[ https://issues.apache.org/jira/browse/HADOOP-16546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17290862#comment-17290862 ]

Steve Loughran commented on HADOOP-16546:
-----------------------------------------

The problem here is that the DTs are collected at job submission, but the 
committers don't get instantiated until setupJob, which happens on the AM/Spark 
driver. So we can't do this at all.

Moot for the magic committer; for the staging committers we just have to tell 
people "use the cluster FS, or add the staging FS as an extra path in the config 
options which allow that" (Spark has an option for this; MR doesn't, AFAIK).
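For illustration only, a rough sketch of that workaround on the Spark side. The 
property name is from memory (it has been renamed across Spark releases) and 
hdfs://staging-cluster:8020/ is a made-up staging filesystem; check both against 
your deployment:

    spark-submit \
      --conf spark.yarn.access.hadoopFileSystems=hdfs://staging-cluster:8020/ \
      ...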

> make sure staging committers collect DTs for the staging FS
> -----------------------------------------------------------
>
>                 Key: HADOOP-16546
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16546
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.2.0
>            Reporter: Steve Loughran
>            Priority: Major
>
> This is not a problem I've seen in the wild, but I've now encountered Hive 
> doing something like this.
> We need to (somehow) make sure that the staging committers collect DTs for 
> the staging dir FS. If this is the default FS, or the same as a source or dest 
> FS, it is handled elsewhere; otherwise we need to add the staging FS.
> I don't see an easy way to do this, but we could add a new method to 
> PathOutputCommitter to collect DTs; FileOutputFormat could invoke this 
> alongside its existing collection of tokens for the output FS. The base 
> implementation would be a no-op, obviously.
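To make the description's proposal concrete, here is a very rough sketch of what 
such a hook might look like. PathOutputCommitter has no such method today, and 
the class names and the "yarn" renewer below are hypothetical placeholders (and, 
per the comment above, the committer would not even exist at token-collection 
time):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.Credentials;

    // Hypothetical hook on the committer base class; not a real Hadoop API.
    public abstract class PathOutputCommitterSketch {
      public void collectDelegationTokens(Credentials credentials, Configuration conf)
          throws IOException {
        // Base implementation: no-op, as proposed in the description.
      }
    }

    // A staging committer would override it to pull tokens for the staging dir's FS.
    class StagingCommitterSketch extends PathOutputCommitterSketch {
      private final Path stagingDir;

      StagingCommitterSketch(Path stagingDir) {
        this.stagingDir = stagingDir;
      }

      @Override
      public void collectDelegationTokens(Credentials credentials, Configuration conf)
          throws IOException {
        FileSystem stagingFs = stagingDir.getFileSystem(conf);
        // "yarn" as renewer is a placeholder; a real caller would take it from job config.
        stagingFs.addDelegationTokens("yarn", credentials);
      }
    }

FileOutputFormat would then call this alongside its existing token collection 
for the output path.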



