[ 
https://issues.apache.org/jira/browse/SPARK-26284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347496#comment-17347496
 ] 

Steve Loughran commented on SPARK-26284:
----------------------------------------

# s3:// URLs mean you are using EMR? If so, take it up with them. Or, if you 
are using a very old version of Hadoop, move up to a newer build with s3a. It 
still won't fix this problem, but you will be on supported code.

Now, please look at the comment above yours and reread it, especially the bit 
where I explain why this is WONTFIX. Thanks.

> Spark History server object vs file storage behavior difference
> ---------------------------------------------------------------
>
>                 Key: SPARK-26284
>                 URL: https://issues.apache.org/jira/browse/SPARK-26284
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Damien Doucet-Girard
>            Priority: Minor
>
> I am using the spark history server in order to view running/complete jobs on 
> spark using the kubernetes scheduling backend introduced in 2.3.0. Using a 
> local file path in both {{spark.eventLog.dir}} and 
> {{spark.history.fs.logDirectory}}, I have no issue seeing both incomplete and 
> completed tasks, with {{.inprogress}} files being flushed regularly. However, 
> when using an {{s3a://}} path, it seems the calls to flush the file 
> ([https://github.com/apache/spark/blob/dd518a196c2d40ae48034b8b0950d1c8045c02ed/core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala#L152-L154])
> don't actually upload the file to s3. Due to this, I am unable to see 
> currently incomplete tasks using an s3a path.
> From the behavior I've observed, it only uploads on completion of the task 
> (hadoop 2.7) or upon the log file filling up the block size set for s3a 
> {{spark.hadoop.fs.s3a.multipart.size}} (hadoop 3.0.0). Is this intended 
> behavior?
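
The behavior described in the issue can be illustrated with a toy model. This is a minimal sketch, not the S3A implementation: the class name BufferedObjectStream, the dict standing in for a bucket, and the part_size parameter (playing the role of the multipart size setting) are all hypothetical. It assumes an object store where data only becomes a visible object once the upload is completed, so flush() has nothing it can publish.

```python
class BufferedObjectStream:
    """Toy model of an object-store output stream (hypothetical, not S3A code)."""

    def __init__(self, store, key, part_size):
        self.store = store          # plain dict standing in for the bucket
        self.key = key
        self.part_size = part_size  # assumed analogue of the multipart size
        self.buffer = bytearray()   # bytes written but not yet sent
        self.uploaded_parts = []    # parts sent, not yet visible as an object

    def write(self, data):
        self.buffer.extend(data)
        # Send a part each time the buffer reaches the part size.
        while len(self.buffer) >= self.part_size:
            self.uploaded_parts.append(bytes(self.buffer[:self.part_size]))
            del self.buffer[:self.part_size]

    def flush(self):
        # Unlike a local file, there is nothing to sync here: the object
        # does not exist under its key until close() completes the upload.
        pass

    def close(self):
        # Send the remaining bytes and publish the whole object at once.
        self.uploaded_parts.append(bytes(self.buffer))
        self.buffer.clear()
        self.store[self.key] = b"".join(self.uploaded_parts)
```

In this model a reader polling the bucket, such as a history server, sees no object for the key until close() runs, however often flush() is called in between.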



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
