sivabalan narayanan created HUDI-4835:
-----------------------------------------

             Summary: Speed up reading of files from S3 in S3EventsIncrSource
                 Key: HUDI-4835
                 URL: https://issues.apache.org/jira/browse/HUDI-4835
             Project: Apache Hudi
          Issue Type: Improvement
          Components: deltastreamer
            Reporter: sivabalan narayanan


In S3EventsIncrSource, we load a DataFrame of N files with a single
dataframeReader.load(files[]) call. We can speed up reading the S3 files by
leveraging spark.parallelize() to distribute the file list across executors.
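
As a rough illustration of the idea only (this is not Hudi code and does not use
the Spark API): rather than passing the whole file list to one loader call, the
list is split across workers, each worker reads its own files, and the results
are combined. The sketch below uses plain Python threads in place of Spark
executors and local temp files in place of S3 objects; read_records,
load_files_in_parallel, and demo are hypothetical names chosen for this example.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def read_records(path):
    """Read one file and return its non-empty lines as records."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


def load_files_in_parallel(paths, max_workers=8):
    """Read every path concurrently; records come back in input-path order."""
    if not paths:
        return []
    # Assumption: threads stand in for Spark executors in this sketch.
    workers = min(max_workers, len(paths))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_file = pool.map(read_records, paths)  # map() preserves input order
        return [record for records in per_file for record in records]


def demo():
    """Write four small files to a temp dir (stand-ins for S3 objects) and load them."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i in range(4):
            path = Path(tmp) / f"part-{i}.txt"
            path.write_text(f"file{i}-row0\nfile{i}-row1\n", encoding="utf-8")
            paths.append(str(path))
        return load_files_in_parallel(paths)
```

Calling demo() returns the eight rows from the four files. In a Spark job the
same shape would presumably be a parallelized collection of paths with a
per-path read function, but the exact calls are left open here.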

 

Ref issue: https://github.com/apache/hudi/issues/5952



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
