Alexey Kudinkin created HUDI-5633:
-------------------------------------
Summary: Fixing HoodieSparkRecord performance bottlenecks
Key: HUDI-5633
URL: https://issues.apache.org/jira/browse/HUDI-5633
Project: Apache Hudi
Issue Type: Bug
Reporter: Alexey Kudinkin
Assignee: Alexey Kudinkin
There are currently the following issues with the HoodieSparkRecord
implementation:
# It rewrites records using `rewriteRecord` and `rewriteRecordWithNewSchema`,
which traverse the schema for every record. Instead, we should traverse the
schema only once and produce a transformer that directly creates the new
record from the old one.
# Records are currently copied for every Executor, even for the Simple one,
which does not buffer any records and therefore does not require them to be
copied.
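The first point (traverse the schema once, then reuse a transformer for every record) can be sketched as follows. This is a minimal illustration under simplifying assumptions, not Hudi's implementation: schemas are modelled as ordered lists of field names, records as `Object[]`, and `buildTransformer` is a hypothetical helper. Nested fields and type conversions are ignored.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

class SchemaTransformerSketch {

    // Hypothetical helper: builds a transformer once per (oldSchema, newSchema) pair.
    // Schemas are simplified to ordered lists of field names; records to Object[].
    static UnaryOperator<Object[]> buildTransformer(List<String> oldSchema, List<String> newSchema) {
        Map<String, Integer> oldPositions = new HashMap<>();
        for (int i = 0; i < oldSchema.size(); i++) {
            oldPositions.put(oldSchema.get(i), i);
        }
        // The schema traversal happens here, exactly once: for every target field,
        // resolve its source position, or -1 if the field is new (filled with null).
        int[] sourceIndex = new int[newSchema.size()];
        for (int i = 0; i < newSchema.size(); i++) {
            sourceIndex[i] = oldPositions.getOrDefault(newSchema.get(i), -1);
        }
        // Per-record work is now a flat positional copy with no schema lookups.
        return oldRecord -> {
            Object[] newRecord = new Object[sourceIndex.length];
            for (int i = 0; i < sourceIndex.length; i++) {
                newRecord[i] = sourceIndex[i] >= 0 ? oldRecord[sourceIndex[i]] : null;
            }
            return newRecord;
        };
    }

    public static void main(String[] args) {
        List<String> oldSchema = List.of("id", "name", "ts");
        List<String> newSchema = List.of("ts", "id", "email");
        // Built once, then applied to every record.
        UnaryOperator<Object[]> transformer = buildTransformer(oldSchema, newSchema);

        List<Object[]> records = List.of(new Object[]{1, "a", 100L}, new Object[]{2, "b", 200L});
        for (Object[] r : records) {
            System.out.println(Arrays.toString(transformer.apply(r)));
        }
    }
}
```

Running `main` prints `[100, 1, null]` and `[200, 2, null]`. The design point is that the cost of resolving field positions is paid once per schema pair rather than once per record; the per-record closure only reads and writes array slots.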
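The second point can be illustrated with a hypothetical sketch. It assumes, which the issue does not state, that copies are needed at all because the upstream iterator may reuse a single mutable record object between calls, so any executor that holds references past the next call would otherwise observe overwritten data. `ExecutorType`, `requiresCopy`, and `MutableRecord` are illustrative names, not Hudi classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class ExecutorCopySketch {

    // Hypothetical mutable record standing in for a row object that the
    // upstream iterator reuses between next() calls.
    static final class MutableRecord {
        int value;

        MutableRecord copy() {
            MutableRecord c = new MutableRecord();
            c.value = value;
            return c;
        }
    }

    // Hypothetical executor kinds; these names are illustrative, not Hudi config values.
    enum ExecutorType { SIMPLE, BUFFERING }

    // Copying is only needed when the executor holds records past the producer's next() call.
    static boolean requiresCopy(ExecutorType type) {
        return type == ExecutorType.BUFFERING;
    }

    // Producer that reuses ONE MutableRecord instance for every element it emits.
    static Iterator<MutableRecord> reusingProducer(int n) {
        MutableRecord shared = new MutableRecord();
        return new Iterator<>() {
            private int emitted = 0;

            @Override
            public boolean hasNext() {
                return emitted < n;
            }

            @Override
            public MutableRecord next() {
                shared.value = emitted++;
                return shared;
            }
        };
    }

    // Runs n records through the given executor kind and returns the values the consumer saw.
    static List<Integer> run(ExecutorType type, int n, boolean copy) {
        Iterator<MutableRecord> input = reusingProducer(n);
        List<Integer> consumed = new ArrayList<>();
        List<MutableRecord> buffer = new ArrayList<>();
        while (input.hasNext()) {
            MutableRecord r = input.next();
            MutableRecord handed = copy ? r.copy() : r;
            if (type == ExecutorType.SIMPLE) {
                consumed.add(handed.value); // consumed before the producer advances
            } else {
                buffer.add(handed);         // consumed later, after the producer has moved on
            }
        }
        for (MutableRecord r : buffer) {
            consumed.add(r.value);
        }
        return consumed;
    }

    public static void main(String[] args) {
        System.out.println(run(ExecutorType.SIMPLE, 3, requiresCopy(ExecutorType.SIMPLE)));
        System.out.println(run(ExecutorType.BUFFERING, 3, requiresCopy(ExecutorType.BUFFERING)));
        System.out.println(run(ExecutorType.BUFFERING, 3, false));
    }
}
```

Running `main` prints `[0, 1, 2]` for the simple executor without copying, `[0, 1, 2]` for the buffering executor with copying, and `[2, 2, 2]` for the buffering executor without copying, which is the aliasing problem the copy guards against. Only the buffering case pays for the copy.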
--
This message was sent by Atlassian Jira
(v8.20.10#820010)