Re: Driver takes long time to finish once job ends

2022-11-22 Thread Pralabh Kumar
What are the cores and memory settings of the driver?

On Wed, 23 Nov 2022, 12:56 Pralabh Kumar,  wrote:

> How many cores and how much memory are you running the driver with?
>
> On Tue, 22 Nov 2022, 21:00 Nikhil Goyal,  wrote:
>
>> Hi folks,
>> We are running a job on our on-prem Kubernetes cluster but writing the output
>> to S3. We noticed that all the executors finish in < 1h, but the driver
>> takes another 5h to finish. Logs:
>>
>> [log excerpt trimmed; the full log appears in the original message at the end of this thread]
>>
>> Seems like the job is taking time to write to S3. Any idea how to fix this 
>> issue?
>>
>> Thanks
>>
>>
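
For readers hitting the same question: driver resources are fixed at submit time through spark.driver.cores and spark.driver.memory. A minimal Scala sketch for confirming what a running driver actually got, assuming an active SparkSession named spark (as in spark-shell); the fallback strings are the documented defaults for these keys:

    // Print the driver's configured resources; if a key was never set
    // explicitly, spark.conf.get returns the supplied default string.
    println("driver memory: " + spark.conf.get("spark.driver.memory", "1g (default)"))
    println("driver cores:  " + spark.conf.get("spark.driver.cores", "1 (default)"))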


Re: Driver takes long time to finish once job ends

2022-11-22 Thread Pralabh Kumar
How many cores and how much memory are you running the driver with?

On Tue, 22 Nov 2022, 21:00 Nikhil Goyal,  wrote:

> Hi folks,
> We are running a job on our on-prem Kubernetes cluster but writing the output
> to S3. We noticed that all the executors finish in < 1h, but the driver
> takes another 5h to finish. Logs:
>
> [log excerpt trimmed; the full log appears in the original message at the end of this thread]
>
> Seems like the job is taking time to write to S3. Any idea how to fix this 
> issue?
>
> Thanks
>
>


Re: EXT: Driver takes long time to finish once job ends

2022-11-22 Thread Vibhor Gupta
Hi Nikhil,

You might be using the v1 file output commit protocol. Under v1 the driver renames every task's output into place serially at job commit, and on S3 each rename is a copy, so this step alone can take hours.
http://www.openkb.info/2019/04/what-is-difference-between.html

Regards,
Vibhor.
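
If v1 is indeed in use, here is a minimal sketch of switching to the v2 algorithm when building the session (the app name is illustrative; spark.hadoop.* entries are forwarded into the underlying Hadoop Configuration):

    import org.apache.spark.sql.SparkSession

    // With algorithm version 2, each task moves its output into place at
    // task commit, instead of the driver renaming every file serially at
    // job commit (the v1 behaviour that can keep the driver busy for
    // hours on S3, where a rename is really a copy).
    val spark = SparkSession.builder()
      .appName("s3-write-job")  // illustrative name
      .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
      .getOrCreate()

The same key can also be passed to spark-submit with --conf. Note that v2 makes committed task output visible before the whole job commits, and even v2 still pays for copy-based renames on S3; the Hadoop S3A committers avoid renames entirely and may be worth evaluating as well.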


From: Nikhil Goyal 
Sent: Tuesday, November 22, 2022 9:00 PM
To: user@spark.apache.org
Subject: EXT: Driver takes long time to finish once job ends

Hi folks,
We are running a job on our on-prem Kubernetes cluster but writing the output to
S3. We noticed that all the executors finish in < 1h, but the driver takes
another 5h to finish. Logs:

[log excerpt trimmed; the full log appears in the original message below]

Seems like the job is taking time to write to S3. Any idea how to fix this 
issue?

Thanks


Driver takes long time to finish once job ends

2022-11-22 Thread Nikhil Goyal
Hi folks,
We are running a job on our on-prem Kubernetes cluster but writing the output
to S3. We noticed that all the executors finish in < 1h, but the driver
takes another 5h to finish. Logs:

22/11/22 02:08:29 INFO BlockManagerInfo: Removed broadcast_3_piece0 on
10.42.145.11:39001 in memory (size: 7.3 KiB, free: 9.4 GiB)
22/11/22 *02:08:29* INFO BlockManagerInfo: Removed broadcast_3_piece0
on 10.42.137.10:33425 in memory (size: 7.3 KiB, free: 9.4 GiB)
22/11/22 *04:57:46* INFO FileFormatWriter: Write Job
4f0051fc-dda9-457f-a072-26311fd5e132 committed.
22/11/22 04:57:46 INFO FileFormatWriter: Finished processing stats for
write job 4f0051fc-dda9-457f-a072-26311fd5e132.
22/11/22 04:57:47 INFO FileUtils: Creating directory if it doesn't
exist: 
s3://rbx.usr/masked/dw_pii/creator_analytics_user_universe_first_playsession_dc_ngoyal/ds=2022-10-21
22/11/22 04:57:48 INFO SessionState: Could not get hdfsEncryptionShim,
it is only applicable to hdfs filesystem.
22/11/22 *04:57:48* INFO SessionState: Could not get
hdfsEncryptionShim, it is only applicable to hdfs filesystem.
22/11/22 *07:20:20* WARN ExecutorPodsWatchSnapshotSource: Kubernetes
client has been closed (this is expected if the application is
shutting down.)
22/11/22 07:20:22 INFO MapOutputTrackerMasterEndpoint:
MapOutputTrackerMasterEndpoint stopped!
22/11/22 07:20:22 INFO MemoryStore: MemoryStore cleared
22/11/22 07:20:22 INFO BlockManager: BlockManager stopped
22/11/22 07:20:22 INFO BlockManagerMaster: BlockManagerMaster stopped
22/11/22 07:20:22 INFO
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint:
OutputCommitCoordinator stopped!
22/11/22 07:20:22 INFO SparkContext: Successfully stopped SparkContext
22/11/22 07:20:22 INFO ShutdownHookManager: Shutdown hook called
22/11/22 07:20:22 INFO ShutdownHookManager: Deleting directory
/tmp/spark-d9aa302f-86f2-4668-9c01-07b3e71cba82
22/11/22 07:20:22 INFO ShutdownHookManager: Deleting directory
/var/data/spark-5295849e-a0f3-4355-9a6a-b510616aefaa/spark-43772336-8c86-4e2b-839e-97b2442b2959
22/11/22 07:20:22 INFO MetricsSystemImpl: Stopping s3a-file-system
metrics system...
22/11/22 07:20:22 INFO MetricsSystemImpl: s3a-file-system metrics
system stopped.
22/11/22 07:20:22 INFO MetricsSystemImpl: s3a-file-system metrics
system shutdown complete.

Seems like the job is taking time to write to S3. Any idea how to fix
this issue?

Thanks


RE: Re: [Spark Sql] Global Setting for Case-Insensitive String Compare

2022-11-22 Thread Patrick Tucci

Thanks. How would I go about formally submitting a feature request for this?

On 2022/11/21 23:47:16 Andrew Melo wrote:
> I think this is the right place, just a hard question :) As far as I
> know, there's no "case insensitive flag", so YMMV
>
> On Mon, Nov 21, 2022 at 5:40 PM Patrick Tucci  wrote:
> >
> > Is this the wrong list for this type of question?
> >
> > On 2022/11/12 16:34:48 Patrick Tucci wrote:
> > > Hello,
> > >
> > > Is there a way to set string comparisons to be case-insensitive
> > globally? I
> > > understand LOWER() can be used, but my codebase contains 27k lines of SQL
> > > and many string comparisons. I would need to apply LOWER() to each string
> > > literal in the code base. I'd also need to change all the ETL/import code
> > > to apply LOWER() to each string value on import.
> > >
> > > Current behavior:
> > >
> > > SELECT 'ABC' = 'abc';
> > > false
> > > Time taken: 5.466 seconds, Fetched 1 row(s)
> > >
> > > SELECT 'ABC' IN ('AbC', 'abc');
> > > false
> > > Time taken: 5.498 seconds, Fetched 1 row(s)
> > >
> > > SELECT 'ABC' like 'Ab%'
> > > false
> > > Time taken: 5.439 seconds, Fetched 1 row(s)
> > >
> > > Desired behavior would be true for all of the above with the proposed
> > > case-insensitive flag set.
> > >
> > > Thanks,
> > >
> > > Patrick
> > >
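
Feature requests for Spark are filed as JIRA issues in the SPARK project at issues.apache.org/jira. In the meantime, the portable workaround is the LOWER() normalization discussed above, applied to both sides of every comparison. A minimal, self-contained Scala sketch; the data and column name are illustrative:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, lit, lower}

    val spark = SparkSession.builder()
      .appName("ci-compare")  // illustrative
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("ABC", "abc", "xyz").toDF("name")

    // Case-insensitive equality: lower-case both sides of the comparison.
    df.filter(lower(col("name")) === lower(lit("AbC"))).show()

    // The same normalization covers the =, IN, and LIKE examples above:
    spark.sql("SELECT lower('ABC') IN (lower('AbC'), lower('abc'))").show()  // true
    spark.sql("SELECT lower('ABC') LIKE lower('Ab%')").show()               // true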

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org