[jira] [Created] (HUDI-9019) Support writes using spark dataframe end to end

sivabalan narayanan (Jira) Thu, 13 Feb 2025 14:45:47 -0800

sivabalan narayanan created HUDI-9019:
-----------------------------------------


             Summary: Support writes using spark dataframe end to end 
                 Key: HUDI-9019
                 URL: https://issues.apache.org/jira/browse/HUDI-9019
             Project: Apache Hudi
          Issue Type: Improvement
            Reporter: sivabalan narayanan
            Assignee: sivabalan narayanan


We want to support Spark writes end to end using DataFrames, without 
converting the rows to Avro records.

 

This opens up a lot of opportunities for Hudi:
 * It brings Hudi close to direct parquet writes for straightforward 
immutable use-cases.
 * For mutable use-cases, we anticipate a 10 to 20% improvement over the 
RDD-based write client implementation.
 * We can leverage Spark optimizations that kick in only with DataFrames.
 * RAPIDS, vectorized reading, etc. can speed up writes with Hudi once we 
move to end-to-end DataFrame writes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
