[PR] [HUDI-8966] RFC-88: New Schema/DataType/Expression Abstractions [hudi]

via GitHub Wed, 12 Feb 2025 20:57:50 -0800


geserdugarov commented on PR #12795:
URL: https://github.com/apache/hudi/pull/12795#issuecomment-2655479129


   > > If we eliminate the unnecessary Avro ser/de, append mode looks useless; we need only the upsert and bulk insert write styles. Am I right?
   > 
   > Hmm, I think append mode is not only about performance. The key difference is that it does no deduplication, which suits log data ingestion scenarios.
   
   I think @Alowator's point is that the usual pipeline, the upper one in this diagram:
   https://miro.com/app/board/uXjVLsCpf48=/
   could be used instead of a separate "APPEND MODE" to increase code reuse.
   
   Namely, the block named "SIMPLE, INMEMORY, RECORD_INDEX, HBASE, etc." could be used, but with "index_bootstrap" skipped in the preceding operators. And in `StreamWriteFunction` we could decide whether or not to call `deduplicateRecords`:
   
https://github.com/apache/hudi/blob/7cea766caf556821c875d507f05a3fabe117884a/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/StreamWriteFunction.java#L453
   depending on the operation type.
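
A minimal sketch of that idea, assuming a single write path that deduplicates buffered records only for upsert-style operations. The names `WriteSketch`, `Operation`, `Rec`, `deduplicate`, and `prepareForWrite` are hypothetical and are not Hudi's actual classes or the real `StreamWriteFunction` logic:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WriteSketch {
    // Hypothetical stand-in for the write operation type (not Hudi's WriteOperationType).
    enum Operation { UPSERT, INSERT }

    // Hypothetical buffered record: a key, an ordering value, and a payload.
    static final class Rec {
        final String key;
        final long orderingValue;
        final String payload;

        Rec(String key, long orderingValue, String payload) {
            this.key = key;
            this.orderingValue = orderingValue;
            this.payload = payload;
        }
    }

    // Keep one record per key: the one with the highest ordering value
    // (on ties, the record that arrived later wins).
    static List<Rec> deduplicate(List<Rec> buffer) {
        Map<String, Rec> latest = new LinkedHashMap<>();
        for (Rec r : buffer) {
            latest.merge(r.key, r, (a, b) -> b.orderingValue >= a.orderingValue ? b : a);
        }
        return new ArrayList<>(latest.values());
    }

    // Shared write path: deduplicate for UPSERT, pass records through
    // unchanged for INSERT (append-like behaviour).
    static List<Rec> prepareForWrite(List<Rec> buffer, Operation op) {
        return op == Operation.UPSERT ? deduplicate(buffer) : new ArrayList<>(buffer);
    }
}
```

With this shape, append-like ingestion would reuse the same operator and differ only in the operation type passed to `prepareForWrite`.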



Re: [PR] [HUDI-8966] RFC-88: New Schema/DataType/Expression Abstractions [hudi]