sivabalan narayanan created HUDI-9019:
-----------------------------------------
Summary: Support writes using spark dataframe end to end
Key: HUDI-9019
URL: https://issues.apache.org/jira/browse/HUDI-9019
Project: Apache Hudi
Issue Type: Improvement
Reporter: sivabalan narayanan
Assignee: sivabalan narayanan
We want to support Spark writes end to end using DataFrames, without
converting them to Avro records.
This opens up a lot of opportunities for Hudi:
* This will bring Hudi close to direct Parquet writes for straightforward
immutable use-cases. Also for mutable use-cases, it will increase
* For mutable use-cases, we anticipate a 10 to 20% improvement over the
RDD-based write client implementation.
* We can leverage Spark optimizations that kick in only with DataFrames.
* RAPIDS, vectorized reading, etc. can speed up writes with Hudi once we move
to end-to-end DataFrame writes.