pvillard31 commented on PR #7893: URL: https://github.com/apache/nifi/pull/7893#issuecomment-1771591915
I did the below test:
- Large Parquet file with about 50M records -> ConvertRecord (Parquet Reader / Parquet Writer). File size: 944MB. Duration: 7'01
- Same file -> CalculateParquetOffsets (1M records split) -> same ConvertRecord with 8 concurrent tasks. The CalculateParquetOffsets immediately generates 50 flow files (CLONE). Duration: 4'40

I was expecting the second case to be much faster given that we use 8 concurrent threads. Any idea? Thoughts?
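For reference, a quick sketch of the arithmetic behind the comparison (the durations are the ones reported above; the "ideal" figure just assumes perfect scaling across the 8 concurrent tasks):

```python
# Compare observed vs. ideal speedup for the two runs described above.
baseline_s = 7 * 60 + 1   # 7'01 -> single ConvertRecord over the whole file
split_s = 4 * 60 + 40     # 4'40 -> CalculateParquetOffsets + 8 concurrent tasks

speedup = baseline_s / split_s
print(f"observed speedup: {speedup:.2f}x (ideal with 8 tasks: up to 8x)")
```

So the split pipeline is only about 1.5x faster, well short of the roughly 8x one might naively expect, which is the gap the question is about.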
