danny0405 commented on PR #5436:
URL: https://github.com/apache/hudi/pull/5436#issuecomment-1111662836

   >We need to have different concerns for different table types that can be 
used in different scenarios. As I said above, we should focus more on query 
performance for COW tables, and write performance for MOR tables. 
   
   I don't think so. The difference in write characteristics between COW and 
MOR tables is a fact, but that does not mean we should degrade the current 
performance of either. Actually, most streaming users use COW tables, and they 
expect normal throughput for streaming ingestion. Double writing is totally 
unacceptable for latency and throughput, not to mention the storage cost.
   
   >The write throughput is the main point for MOR. In most cases, we do not 
need to write out extra CDC files. The time at which the CDC files have to be 
generated is when the MOR table writes out the base file, not the log file.
   
   I don't think so. Can you imagine the usage scenarios for consuming the CDC 
change logs? Users expect low end-to-end latency for ETL pipelines. Based on 
real use cases, your solution for MOR tables is not even practical.
   I would vote a firm -1 on this.
   
   > Hudi transaction is managed by timeline. Failure to write CDC files or 
data files should not complete the commit correctly
   
   You may need to read my concern again to answer my question.
   
   >The management about log files is as usual. Only CDC files, we need to 
consider to clean them in time by the clean service.
   
   You need to give more details here. Actually, I don't think the current 
cleaning service for data files should also take care of cleaning the CDC log 
files. That is too heavy for the component; at the least, we should add a new, 
separate cleaning service so that we do not mess up the existing cleaning 
strategy.


