stevenzwu commented on pull request #2867: URL: https://github.com/apache/iceberg/pull/2867#issuecomment-892178217
I share the same concern as @rdblue. It seems to me that this impl basically has a single committer task/thread that reads all rows from a CombinedScanTask (files batched by BaseRewriteDataFilesAction) and writes them out. How is that different from just configuring the StreamFileWriter with a parallelism of 1? If we make it a truly parallel rewrite/compaction action, I am a little concerned about the complexity we would be adding to the Flink streaming ingestion path.
