danny0405 commented on PR #5436: URL: https://github.com/apache/hudi/pull/5436#issuecomment-1111662836
> We need to have different concerns for different table types that are used in different scenarios. As I said above, we should focus more on query performance for COW tables, and write performance for MOR tables.

I don't think so. The difference in write characteristics between COW and MOR tables is a fact, but that does not mean we should degrade the current performance of both. Actually, most streaming users use COW tables, and they expect normal throughput for streaming ingestion. Double writing is totally unacceptable for latency and throughput, not to mention the storage cost.

> The write throughput is the main point for MOR. In most cases, we do not need to write out extra CDC files. The CDC files have to be generated when the MOR table writes out the base file, not log files.

I don't think so. Can you imagine the use scenarios for consuming the CDC change logs? Users expect low end-to-end latency for ETL pipelines. Based on real use cases, your solution for the MOR table is not even practical. I would vote a firm -1 on this.

> Hudi transactions are managed by the timeline. Failure to write CDC files or data files should not let the commit complete successfully.

You may need to read my concern again to answer my question.

> Log files are managed as usual. Only for CDC files do we need to make sure the clean service cleans them in time.

You need to give more details here. Actually, I don't think the current cleaning service for data files should also take care of cleaning the CDC log files; that is too heavy for the component. At the very least, we should build a separate cleaning service so we don't mess up the existing cleaning strategy.
