pvillard31 commented on PR #7893: URL: https://github.com/apache/nifi/pull/7893#issuecomment-1771591915
I did the below test:
- Large Parquet file with about 50M records -> ConvertRecord (Parquet Reader / Parquet Writer). File size: 944MB. Duration: 7'01
- Same file -> CalculateParquetOffsets (1M records split) -> same ConvertRecord with 8 concurrent tasks. The CalculateParquetOffsets immediately generates 50 flow files (CLONE). Duration: 4'40

I was expecting the second case to be much faster given that we use 8 concurrent threads. Any idea? Thoughts?
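For reference, a quick sketch of the arithmetic behind the comparison (the durations are the ones reported above; the "ideal" figure just assumes perfect scaling across the 8 concurrent tasks):

```python
# Compare observed vs. ideal speedup for the two runs described above.
baseline_s = 7 * 60 + 1   # 7'01 -> single ConvertRecord over the whole file
split_s = 4 * 60 + 40     # 4'40 -> CalculateParquetOffsets + 8 concurrent tasks

speedup = baseline_s / split_s
print(f"observed speedup: {speedup:.2f}x (ideal with 8 tasks: up to 8x)")
```

So the split pipeline is only about 1.5x faster, well short of the roughly 8x one might naively expect, which is the gap the question is about.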
