Alexey Kudinkin created HUDI-4249:
-------------------------------------
Summary: Fix in-memory HoodieData implementations to operate lazily
Key: HUDI-4249
URL: https://issues.apache.org/jira/browse/HUDI-4249
Project: Apache Hudi
Issue Type: Bug
Reporter: Alexey Kudinkin
Assignee: Alexey Kudinkin
Fix For: 0.12.0
Currently, both `HoodieListData` and `HoodieMapPairData` operate eagerly on
their payloads, meaning that each transformation is applied immediately.
This has the following performance drawbacks:
# The full transformation is always executed, regardless of whether the whole
sequence is ever consumed, potentially wasting a considerable amount of compute.
# It can also cause OOMs when the sequence is larger than available memory
(where the caller might be relying on the assumption that it is performing
stream processing).
Instead, they should be rebased to hold `Stream`s internally and provide semantics
close to those of Spark's RDD container.
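
The difference between the two approaches can be illustrated with a minimal sketch using plain `java.util.stream.Stream`. This is not Hudi's actual code: the class `LazyDataSketch` and its methods `eagerMap`, `lazyMap`, `eagerCalls` and `lazyCalls` are hypothetical names introduced only for this example. It counts how many times the mapping function runs when the caller consumes only the first few elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; this is NOT Hudi's HoodieListData.
final class LazyDataSketch {

    // Eager variant: the transformation runs over every element right away
    // and the full result is materialized in memory.
    static <T, R> List<R> eagerMap(List<T> input, Function<? super T, ? extends R> fn) {
        List<R> out = new ArrayList<>(input.size());
        for (T item : input) {
            out.add(fn.apply(item));
        }
        return out;
    }

    // Lazy variant: Stream.map() only records the transformation; nothing
    // runs until a terminal operation pulls elements through the pipeline.
    static <T, R> Stream<R> lazyMap(Stream<T> input, Function<? super T, ? extends R> fn) {
        return input.map(fn);
    }

    // Number of times the mapping function runs when only `take` results
    // are consumed from the eagerly mapped list.
    static int eagerCalls(List<Integer> data, int take) {
        AtomicInteger calls = new AtomicInteger();
        List<Integer> mapped = eagerMap(data, x -> {
            calls.incrementAndGet();
            return x * 2;
        });
        mapped.stream().limit(take).collect(Collectors.toList());
        return calls.get();
    }

    // Same measurement for the lazily mapped (sequential) stream.
    static int lazyCalls(List<Integer> data, int take) {
        AtomicInteger calls = new AtomicInteger();
        Stream<Integer> mapped = lazyMap(data.stream(), x -> {
            calls.incrementAndGet();
            return x * 2;
        });
        mapped.limit(take).collect(Collectors.toList());
        return calls.get();
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println("eager calls: " + eagerCalls(data, 2));
        System.out.println("lazy calls:  " + lazyCalls(data, 2));
    }
}
```

With five input elements and only two consumed, the eager variant invokes the function five times, while the sequential lazy stream invokes it twice. One design consequence worth noting: a Java `Stream` can be consumed only once, so a container built on streams would need some way to re-create the stream if it is meant to be traversed repeatedly, as an RDD can be.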
--
This message was sent by Atlassian Jira
(v8.20.7#820007)