Re: Driver takes long time to finish once job ends
What are the cores and memory settings of the driver?

On Wed, 23 Nov 2022, 12:56 Pralabh Kumar, wrote:
> How many cores and how much memory are you running the driver with?
>
> On Tue, 22 Nov 2022, 21:00 Nikhil Goyal, wrote:
>> Hi folks,
>> We are running a job on our on-prem cluster on K8s but writing the output
>> to S3. We noticed that all the executors finish in < 1h but the driver
>> takes another 5h to finish.
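The driver sizing asked about above is set at submission time. A minimal sketch of the relevant spark-submit flags for a Kubernetes deployment (the values and the API-server address are placeholders, not recommendations):

```shell
# Illustrative driver resource settings for Spark on Kubernetes.
# <k8s-apiserver> and the resource values are placeholders; tune for your workload.
spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --conf spark.driver.cores=4 \
  --conf spark.driver.memory=8g \
  --conf spark.kubernetes.driver.limit.cores=4 \
  ...
```

An undersized driver mostly shows up as GC pressure or slow scheduling, though; it would not by itself explain a multi-hour gap after all tasks finish.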
Re: Driver takes long time to finish once job ends
How many cores and how much memory are you running the driver with?

On Tue, 22 Nov 2022, 21:00 Nikhil Goyal, wrote:
> Hi folks,
> We are running a job on our on-prem cluster on K8s but writing the output
> to S3. We noticed that all the executors finish in < 1h but the driver
> takes another 5h to finish.
Re: EXT: Driver takes long time to finish once job ends
Hi Nikhil,

You might be using the v1 file output commit protocol. This article explains the difference between mapreduce.fileoutputcommitter.algorithm.version=1 and 2 using a sample wordcount job: http://www.openkb.info/2019/04/what-is-difference-between.html

Regards,
Vibhor.

From: Nikhil Goyal
Sent: Tuesday, November 22, 2022 9:00 PM
To: spark users
Subject: EXT: Driver takes long time to finish once job ends

> Hi folks,
> We are running a job on our on-prem cluster on K8s but writing the output
> to S3. We noticed that all the executors finish in < 1h but the driver
> takes another 5h to finish.
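If the v1 commit protocol is indeed in play, the v2 algorithm mentioned in the linked article can be selected with a single Hadoop conf passed through Spark. A hedged sketch of the flag (note the trade-off: v2 commits task output directly to the destination, which is faster but not atomic on job failure, and on S3 neither algorithm avoids copy-based renames entirely):

```shell
# Switch the Hadoop FileOutputCommitter from v1 (default) to v2.
# v2 moves task output during task commit instead of a single serial
# job-commit pass on the driver, at the cost of weaker failure semantics.
spark-submit \
  --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 \
  ...
```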
Driver takes long time to finish once job ends
Hi folks,
We are running a job on our on-prem cluster on K8s but writing the output to S3. We noticed that all the executors finish in < 1h but the driver takes another 5h to finish. Logs:

22/11/22 02:08:29 INFO BlockManagerInfo: Removed broadcast_3_piece0 on 10.42.145.11:39001 in memory (size: 7.3 KiB, free: 9.4 GiB)
22/11/22 *02:08:29* INFO BlockManagerInfo: Removed broadcast_3_piece0 on 10.42.137.10:33425 in memory (size: 7.3 KiB, free: 9.4 GiB)
22/11/22 *04:57:46* INFO FileFormatWriter: Write Job 4f0051fc-dda9-457f-a072-26311fd5e132 committed.
22/11/22 04:57:46 INFO FileFormatWriter: Finished processing stats for write job 4f0051fc-dda9-457f-a072-26311fd5e132.
22/11/22 04:57:47 INFO FileUtils: Creating directory if it doesn't exist: s3://rbx.usr/masked/dw_pii/creator_analytics_user_universe_first_playsession_dc_ngoyal/ds=2022-10-21
22/11/22 04:57:48 INFO SessionState: Could not get hdfsEncryptionShim, it is only applicable to hdfs filesystem.
22/11/22 *04:57:48* INFO SessionState: Could not get hdfsEncryptionShim, it is only applicable to hdfs filesystem.
22/11/22 *07:20:20* WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
22/11/22 07:20:22 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/11/22 07:20:22 INFO MemoryStore: MemoryStore cleared
22/11/22 07:20:22 INFO BlockManager: BlockManager stopped
22/11/22 07:20:22 INFO BlockManagerMaster: BlockManagerMaster stopped
22/11/22 07:20:22 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/11/22 07:20:22 INFO SparkContext: Successfully stopped SparkContext
22/11/22 07:20:22 INFO ShutdownHookManager: Shutdown hook called
22/11/22 07:20:22 INFO ShutdownHookManager: Deleting directory /tmp/spark-d9aa302f-86f2-4668-9c01-07b3e71cba82
22/11/22 07:20:22 INFO ShutdownHookManager: Deleting directory /var/data/spark-5295849e-a0f3-4355-9a6a-b510616aefaa/spark-43772336-8c86-4e2b-839e-97b2442b2959
22/11/22 07:20:22 INFO MetricsSystemImpl: Stopping s3a-file-system metrics system...
22/11/22 07:20:22 INFO MetricsSystemImpl: s3a-file-system metrics system stopped.
22/11/22 07:20:22 INFO MetricsSystemImpl: s3a-file-system metrics system shutdown complete.

Seems like the job is taking time to write to S3. Any idea how to fix this issue?

Thanks
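One reading of the timestamps above (an inference, not confirmed in the thread): the final 04:57 to 07:20 gap comes after the Hive-side FileUtils step, where committed output is moved into its final S3 location, and a "rename" on S3 is really a copy plus delete performed serially. Since the log's MetricsSystemImpl lines show the S3A filesystem is in use, the S3A committers, which upload data directly to the destination and avoid the rename, may help. A hedged sketch of the standard configuration (requires the spark-hadoop-cloud module on the classpath and an s3a:// output path):

```shell
# Enable the S3A "magic" committer so job commit completes multipart
# uploads instead of copying files between S3 prefixes.
spark-submit \
  --conf spark.hadoop.fs.s3a.committer.name=magic \
  --conf spark.hadoop.fs.s3a.committer.magic.enabled=true \
  --conf spark.sql.sources.commitProtocolClass=org.apache.spark.internal.io.cloud.PathOutputCommitProtocol \
  --conf spark.sql.parquet.output.committer.class=org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter \
  ...
```

Note that when writing through Hive into a partitioned table, the Hive staging-to-final move can still happen regardless of the committer, so it is worth confirming with a driver thread dump (Spark UI > Executors > driver > Thread Dump) where the time is actually spent.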