Re: Driver takes long time to finish once job ends
What are the cores and memory settings of the driver?

On Wed, 23 Nov 2022, 12:56 Pralabh Kumar, wrote:
> How many cores and how much memory are you running the driver with?
>
> On Tue, 22 Nov 2022, 21:00 Nikhil Goyal, wrote:
>> Hi folks,
>> We are running a job on our on-prem cluster on K8s but writing the output
>> to S3. We noticed that all the executors finish in < 1h but the driver
>> takes another 5h to finish.
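The driver sizing asked about above is set at submission time. A minimal sketch of the relevant spark-submit flags for a Kubernetes deployment (the values and the API-server address are placeholders, not recommendations):

```shell
# Illustrative driver resource settings for Spark on Kubernetes.
# <k8s-apiserver> and the resource values are placeholders; tune for your workload.
spark-submit \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --conf spark.driver.cores=4 \
  --conf spark.driver.memory=8g \
  --conf spark.kubernetes.driver.limit.cores=4 \
  ...
```

An undersized driver mostly shows up as GC pressure or slow scheduling, though; it would not by itself explain a multi-hour gap after all tasks finish.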
Re: Driver takes long time to finish once job ends
How many cores and how much memory are you running the driver with?

On Tue, 22 Nov 2022, 21:00 Nikhil Goyal, wrote:
> Hi folks,
> We are running a job on our on-prem cluster on K8s but writing the output
> to S3. We noticed that all the executors finish in < 1h but the driver
> takes another 5h to finish.
Re: EXT: Driver takes long time to finish once job ends
Hi Nikhil,

You might be using the v1 file output commit protocol. This article explains the difference between mapreduce.fileoutputcommitter.algorithm.version=1 and 2 using a sample wordcount job: http://www.openkb.info/2019/04/what-is-difference-between.html

Regards,
Vibhor.

From: Nikhil Goyal
Sent: Tuesday, November 22, 2022 9:00 PM
To: spark users
Subject: EXT: Driver takes long time to finish once job ends

> Hi folks,
> We are running a job on our on-prem cluster on K8s but writing the output
> to S3. We noticed that all the executors finish in < 1h but the driver
> takes another 5h to finish.
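If the v1 commit protocol is indeed in play, the v2 algorithm mentioned in the linked article can be selected with a single Hadoop conf passed through Spark. A hedged sketch of the flag (note the trade-off: v2 commits task output directly to the destination, which is faster but not atomic on job failure, and on S3 neither algorithm avoids copy-based renames entirely):

```shell
# Switch the Hadoop FileOutputCommitter from v1 (default) to v2.
# v2 moves task output during task commit instead of a single serial
# job-commit pass on the driver, at the cost of weaker failure semantics.
spark-submit \
  --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2 \
  ...
```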
Driver takes long time to finish once job ends
Hi folks,
We are running a job on our on-prem cluster on K8s but writing the output to S3. We noticed that all the executors finish in < 1h but the driver takes another 5h to finish. Logs:

22/11/22 02:08:29 INFO BlockManagerInfo: Removed broadcast_3_piece0 on 10.42.145.11:39001 in memory (size: 7.3 KiB, free: 9.4 GiB)
22/11/22 *02:08:29* INFO BlockManagerInfo: Removed broadcast_3_piece0 on 10.42.137.10:33425 in memory (size: 7.3 KiB, free: 9.4 GiB)
22/11/22 *04:57:46* INFO FileFormatWriter: Write Job 4f0051fc-dda9-457f-a072-26311fd5e132 committed.
22/11/22 04:57:46 INFO FileFormatWriter: Finished processing stats for write job 4f0051fc-dda9-457f-a072-26311fd5e132.
22/11/22 04:57:47 INFO FileUtils: Creating directory if it doesn't exist: s3://rbx.usr/masked/dw_pii/creator_analytics_user_universe_first_playsession_dc_ngoyal/ds=2022-10-21
22/11/22 04:57:48 INFO SessionState: Could not get hdfsEncryptionShim, it is only applicable to hdfs filesystem.
22/11/22 *04:57:48* INFO SessionState: Could not get hdfsEncryptionShim, it is only applicable to hdfs filesystem.
22/11/22 *07:20:20* WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
22/11/22 07:20:22 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/11/22 07:20:22 INFO MemoryStore: MemoryStore cleared
22/11/22 07:20:22 INFO BlockManager: BlockManager stopped
22/11/22 07:20:22 INFO BlockManagerMaster: BlockManagerMaster stopped
22/11/22 07:20:22 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/11/22 07:20:22 INFO SparkContext: Successfully stopped SparkContext
22/11/22 07:20:22 INFO ShutdownHookManager: Shutdown hook called
22/11/22 07:20:22 INFO ShutdownHookManager: Deleting directory /tmp/spark-d9aa302f-86f2-4668-9c01-07b3e71cba82
22/11/22 07:20:22 INFO ShutdownHookManager: Deleting directory /var/data/spark-5295849e-a0f3-4355-9a6a-b510616aefaa/spark-43772336-8c86-4e2b-839e-97b2442b2959
22/11/22 07:20:22 INFO MetricsSystemImpl: Stopping s3a-file-system metrics system...
22/11/22 07:20:22 INFO MetricsSystemImpl: s3a-file-system metrics system stopped.
22/11/22 07:20:22 INFO MetricsSystemImpl: s3a-file-system metrics system shutdown complete.

Seems like the job is taking time to write to S3. Any idea how to fix this issue?

Thanks
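One reading of the timestamps above (an inference, not confirmed in the thread): the final 04:57 to 07:20 gap comes after the Hive-side FileUtils step, where committed output is moved into its final S3 location, and a "rename" on S3 is really a copy plus delete performed serially. Since the log's MetricsSystemImpl lines show the S3A filesystem is in use, the S3A committers, which upload data directly to the destination and avoid the rename, may help. A hedged sketch of the standard configuration (requires the spark-hadoop-cloud module on the classpath and an s3a:// output path):

```shell
# Enable the S3A "magic" committer so job commit completes multipart
# uploads instead of copying files between S3 prefixes.
spark-submit \
  --conf spark.hadoop.fs.s3a.committer.name=magic \
  --conf spark.hadoop.fs.s3a.committer.magic.enabled=true \
  --conf spark.sql.sources.commitProtocolClass=org.apache.spark.internal.io.cloud.PathOutputCommitProtocol \
  --conf spark.sql.parquet.output.committer.class=org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter \
  ...
```

Note that when writing through Hive into a partitioned table, the Hive staging-to-final move can still happen regardless of the committer, so it is worth confirming with a driver thread dump (Spark UI > Executors > driver > Thread Dump) where the time is actually spent.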