Alexey Kudinkin created HUDI-5633:
-------------------------------------

             Summary: Fixing HoodieSparkRecord performance bottlenecks
                 Key: HUDI-5633
                 URL: https://issues.apache.org/jira/browse/HUDI-5633
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: Alexey Kudinkin
            Assignee: Alexey Kudinkin


The current HoodieSparkRecord implementation has the following issues:
 # It rewrites records using `rewriteRecord` and `rewriteRecordWithNewSchema`, 
which traverse the schema for every record. Instead, we should traverse the 
schema only once and produce a transformer that creates the new record 
directly from the old one.
 # Records are currently copied for every Executor, even for the Simple one, 
which does not buffer any records and therefore doesn't require records to 
be copied.
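
The two points above can be illustrated with a minimal, self-contained sketch. 
It is not Hudi code: the schema is modeled as a plain list of field names and 
a record as an `Object[]`, and every name in it (`RecordRewriteSketch`, 
`rewritePerRecord`, `compileRewriter`, `prepareForExecutor`) is hypothetical. 
It only shows the general idea of resolving field positions once and reusing 
the resulting transformer, and of copying a record only when the executor 
buffers it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical sketch; schemas are lists of field names, records are Object[].
final class RecordRewriteSketch {

    // Per-record approach: every call looks up each target field in the
    // source schema again, so the schema is traversed once per record.
    static Object[] rewritePerRecord(Object[] record,
                                     List<String> sourceSchema,
                                     List<String> targetSchema) {
        Object[] out = new Object[targetSchema.size()];
        for (int i = 0; i < out.length; i++) {
            int pos = sourceSchema.indexOf(targetSchema.get(i));
            out[i] = pos >= 0 ? record[pos] : null; // missing field -> null
        }
        return out;
    }

    // One-time approach: traverse both schemas once, remember the source
    // position of each target field, and return a transformer that only
    // performs positional copies.
    static UnaryOperator<Object[]> compileRewriter(List<String> sourceSchema,
                                                   List<String> targetSchema) {
        final int[] sourcePositions = new int[targetSchema.size()];
        for (int i = 0; i < sourcePositions.length; i++) {
            sourcePositions[i] = sourceSchema.indexOf(targetSchema.get(i));
        }
        return record -> {
            Object[] out = new Object[sourcePositions.length];
            for (int i = 0; i < out.length; i++) {
                int pos = sourcePositions[i];
                out[i] = pos >= 0 ? record[pos] : null;
            }
            return out;
        };
    }

    // Copy only when the executor holds on to records (assumed flag);
    // a non-buffering executor can pass the same instance through.
    static Object[] prepareForExecutor(Object[] record, boolean executorBuffersRecords) {
        return executorBuffersRecords ? record.clone() : record;
    }

    public static void main(String[] args) {
        List<String> source = Arrays.asList("id", "name", "ts");
        List<String> target = Arrays.asList("ts", "id", "comment");

        // Built once, then applied to every record.
        UnaryOperator<Object[]> rewriter = compileRewriter(source, target);
        Object[] rewritten = rewriter.apply(new Object[] {1, "a", 100L});
        System.out.println(Arrays.toString(rewritten)); // prints [100, 1, null]
    }
}
```

In this sketch the per-record cost of the compiled transformer is a single 
array allocation plus positional copies, while the name lookups happen only 
when the transformer is built.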



--
This message was sent by Atlassian Jira
(v8.20.10#820010)