Samrat002 commented on PR #27026: URL: https://github.com/apache/flink/pull/27026#issuecomment-4777493675
From our internal experience, we initially started by upgrading Hadoop to 3.x and leveraging SDK v2; as you can see, I opened the Jira and shared the first form of the patch. Before we merge this patch, I think the following details need to be added so that the behaviour of the upgrade, and the transitive changes it brings in, are better understood.

1. A good number of Flink jobs use both flink-s3-fs-presto and flink-s3-fs-hadoop: one for checkpointing and the other for better performance on filesystem writes. Has this patch been tested for such scenarios? @ctrlaltdilj, is it possible for you to add/share metrics of the TM JVM profile?

2. With the current implementation, two SDKs share one TaskManager JVM (Presto for checkpoints + Hadoop for the file sink deployment). Observations of JVM-level resource utilisation on long-running jobs are absent.

3. @ctrlaltdilj, can you showcase an upgrade/restore scenario across the version boundary? For example, a savepoint written by the old (v1) build and restored by the new (v2) build.

4. hadoop-aws 3.4 ships S3A prefetching / analytics-accelerator. This means a changed read pattern, more memory per stream, and a different (often higher) GET volume, which can lead to S3 cost surprises and new throttling for big table scans/batch reads.

@snuyanzin @gaborgsomogyi @Poorvankbhatia (viz)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
