Alowator commented on code in PR #12960:
URL: https://github.com/apache/hudi/pull/12960#discussion_r1991281674


##########
rfc/rfc-87/rfc-87.md:
##########
@@ -24,9 +24,329 @@
 ## Approvers
 
 - @danny0405
-- @xiarixiaoyao
-- @yuzhaojing
+- @cshuo
 
 ## Status: Claim
 
-JIRA: [HUDI-8934](https://issues.apache.org/jira/browse/HUDI-8934)
+Umbrella ticket: [HUDI-9075](https://issues.apache.org/jira/browse/HUDI-9075)
+
+## Abstract
+
+Building on RFC-84, which removed Avro from Flink’s pre-write operators, RFC-87 eliminates Avro from the write path to improve performance.
+Current writes suffer from excessive Avro serialization/deserialization and in-memory storage of List<HoodieRecord>, causing high GC overhead.
+This RFC replaces DataBucket’s list storage with Flink’s BinaryInMemorySortBuffer, enabling efficient sorting and iterator-based writes.
+HoodieLogBlock is also refactored to separate deserialization from buffering.
+Precombine field deduplication will now occur after sorting.

Review Comment:
   From my benchmarks, performance doesn't degrade; the sorting came 
essentially for free. I will retest it and provide benchmark results so we 
can come to a decision. Sorting performance depends on the average record 
size. I will provide results for the TPC-H lineitem table.
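To illustrate why deduplication by the precombine field fits naturally after sorting, here is a minimal sketch. It assumes a hypothetical `Rec` type (record key, precombine value, payload) and uses a plain `java.util.List` sort as a stand-in for Flink's `BinaryInMemorySortBuffer`; it is not the RFC's implementation. Once records are sorted by key, duplicates are adjacent, so a single pass keeps the record with the largest precombine value per key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortThenDedup {
    // Hypothetical record shape (assumption): record key, precombine value, payload.
    record Rec(String key, long precombine, String payload) {}

    // Sorts by record key, then collapses each run of equal keys into the record
    // with the largest precombine value; on ties, the later record in sorted order wins.
    static List<Rec> sortAndDedup(List<Rec> input) {
        List<Rec> sorted = new ArrayList<>(input);
        // List.sort is stable, so equal keys keep their original arrival order.
        sorted.sort(Comparator.comparing(Rec::key));
        List<Rec> out = new ArrayList<>();
        Rec current = null;
        for (Rec r : sorted) {
            if (current == null || !current.key().equals(r.key())) {
                // New key run: emit the winner of the previous run.
                if (current != null) out.add(current);
                current = r;
            } else if (r.precombine() >= current.precombine()) {
                // Same key: keep the record with the higher precombine value.
                current = r;
            }
        }
        if (current != null) out.add(current);
        return out;
    }

    public static void main(String[] args) {
        List<Rec> in = List.of(
            new Rec("k2", 5, "a"),
            new Rec("k1", 1, "b"),
            new Rec("k2", 7, "c"),
            new Rec("k1", 3, "d"));
        for (Rec r : sortAndDedup(in)) {
            System.out.println(r.key() + " " + r.precombine() + " " + r.payload());
        }
    }
}
```

Running `main` prints `k1 3 d` followed by `k2 7 c`. The design point is that the sort already groups duplicates, so deduplication becomes a linear scan without a per-key hash map; the overall cost is then dominated by the sort itself, which is consistent with the observation above that sorting cost depends on the average record size.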


