[jira] [Comment Edited] (HADOOP-19091) Add support for Tez to MagicS3GuardCommitter

Venkatasubrahmanian Narayanan (Jira) Mon, 04 Mar 2024 10:24:04 -0800


    [ 
https://issues.apache.org/jira/browse/HADOOP-19091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823282#comment-17823282
 ]


Venkatasubrahmanian Narayanan edited comment on HADOOP-19091 at 3/4/24 6:23 PM:
--------------------------------------------------------------------------------

[~srahman] While that's true, the problem is that the jobAttemptPath itself is 
different between task and AM. If you look at my previous message, you'll see 
the difference is in the "directory" name of the jobAttemptPath:

(spaces inserted below because Jira doesn't like double underscore)

Task: s3a://hive-east-1-bucket/emblembasic/ _ _ magic/ _ _ magic/job-job_17089738741890_0073/

vs

AM: s3a://hive-east-1-bucket/emblembasic/ _ _ magic/ _ _ magic/job-job_1708973874189_0073/

The extra 0 in the task's JobID means the task and the AM look at different 
"directories", so the AM doesn't find what the tasks wrote.
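
To make the mismatch concrete, here is a minimal Java sketch that rebuilds the 
two paths from the job IDs quoted above. It is not the committer's actual code: 
the class name and the jobDir/clusterTimestamp helpers are hypothetical, and 
the path layout is simply copied from the two paths in this comment.

```java
final class JobAttemptPathDemo {
    // Hypothetical helper: reproduces the layout of the paths quoted above.
    // This is an illustration, not how MagicS3GuardCommitter builds its paths.
    static String jobDir(String outputPath, String jobId) {
        return outputPath + "/__magic/__magic/job-" + jobId + "/";
    }

    // Splits "job_<clusterTimestamp>_<sequence>" and returns the timestamp part.
    static String clusterTimestamp(String jobId) {
        String[] parts = jobId.split("_");
        if (parts.length != 3 || !parts[0].equals("job")) {
            throw new IllegalArgumentException("Unexpected job ID: " + jobId);
        }
        return parts[1];
    }

    public static void main(String[] args) {
        String out = "s3a://hive-east-1-bucket/emblembasic";
        String taskJobId = "job_17089738741890_0073"; // as seen by the task
        String amJobId = "job_1708973874189_0073";    // as seen by the AM

        String taskDir = jobDir(out, taskJobId);
        String amDir = jobDir(out, amJobId);

        System.out.println("Task: " + taskDir);
        System.out.println("AM:   " + amDir);
        System.out.println("Task timestamp: " + clusterTimestamp(taskJobId));
        System.out.println("AM timestamp:   " + clusterTimestamp(amJobId));
        System.out.println("Same directory? " + taskDir.equals(amDir));
    }
}
```

The sequence part (0073) is identical in both IDs; only the timestamp part 
differs, and that alone is enough to make the two directory names unequal.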

 

There isn't a stacktrace per se - the commitJob op just doesn't find any 
pending data to commit, so it goes straight to the cleanup() code (where, since 
my tests are with Hadoop 3.3.3, it just looks under __magic and finds 
everything to be deleted).
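
As a rough illustration of why nothing fails loudly, the toy model below 
treats the bucket as an in-memory list of keys. Everything in it is an 
assumption made for the sketch: the class and method names are hypothetical, 
the .pendingset file name is made up, and the real committer goes through the 
S3A filesystem rather than a list. It only shows that a listing under the AM's 
directory comes back empty while a cleanup under __magic still removes the 
task's file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class PendingsetLookupDemo {
    // Returns the keys under dirPrefix that end in ".pendingset".
    static List<String> listPendingsets(List<String> store, String dirPrefix) {
        return store.stream()
                .filter(k -> k.startsWith(dirPrefix) && k.endsWith(".pendingset"))
                .collect(Collectors.toList());
    }

    // Toy cleanup: removes every key under prefix and returns how many were removed.
    static int deleteUnder(List<String> store, String prefix) {
        int before = store.size();
        store.removeIf(k -> k.startsWith(prefix));
        return before - store.size();
    }

    public static void main(String[] args) {
        String magic = "s3a://hive-east-1-bucket/emblembasic/__magic/";
        String taskDir = magic + "__magic/job-job_17089738741890_0073/";
        String amDir = magic + "__magic/job-job_1708973874189_0073/";

        List<String> store = new ArrayList<>();
        // Hypothetical file name; the task writes under *its* job directory.
        store.add(taskDir + "task_0000.pendingset");

        // The AM lists under its own job directory and sees nothing to commit.
        List<String> visibleToAm = listPendingsets(store, amDir);
        System.out.println("pendingsets visible to AM: " + visibleToAm.size());

        // Cleanup works on the parent __magic prefix, so it still removes the file.
        System.out.println("keys deleted by cleanup: " + deleteUnder(store, magic));
    }
}
```

An empty listing is not an error in this model, which matches the behaviour 
described above: the job "succeeds" with nothing committed.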


was (Author: vnarayanan7):
[~srahman] While that's true, the problem is that the jobAttemptPath itself is 
different between task and AM. If you look at my previous message, you'll see 
the difference is in the "directory" name of the jobAttemptPath:

 

Task: 
s3a://hive-east-1-bucket/emblembasic/__magic/__magic/job-job_17089738741890_0073/

vs

AM: 
s3a://hive-east-1-bucket/emblembasic/__magic/__magic/job-job_1708973874189_0073/

The extra 0 in the task's JobID means the task and the AM look at different 
"directories", so the AM doesn't find what the tasks wrote.

 

There isn't a stacktrace per se - the commitJob op just doesn't find any 
pending data to commit, so it goes straight to the cleanup() code (where, since 
my tests are with Hadoop 3.3.3, it just looks under __magic and finds 
everything to be deleted).

> Add support for Tez to MagicS3GuardCommitter
> --------------------------------------------
>
>                 Key: HADOOP-19091
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19091
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>    Affects Versions: 3.3.6
>         Environment: Pig 0.17/Hive 3.1.3 with Hadoop 3.3.3 on AWS EMR 6.12.0
>            Reporter: Venkatasubrahmanian Narayanan
>            Assignee: Venkatasubrahmanian Narayanan
>            Priority: Major
>         Attachments: 0001-AWS-Hive-Changes.patch, 
> 0002-HIVE-27698-Backport-of-HIVE-22398-Remove-legacy-code.patch, 
> HADOOP-19091-HIVE-WIP.patch
>
>
> The MagicS3GuardCommitter assumes that the JobID of the task is the same as 
> that of the job's application master when writing/reading the .pendingset 
> file. This assumption is not valid when running with Tez, which creates 
> slightly different JobIDs for tasks and the application master.
>  
> While the MagicS3GuardCommitter is intended only for MRv2, it mostly works 
> fine with an MRv1 wrapper with Hive/Pig (with some minor changes to Hive) run 
> in MR mode. This issue only crops up when running queries with the Tez 
> execution engine. I can upload a patch to Hive 3.1 to reproduce this error on 
> EMR if needed.
>  
> Fixing this will probably require work from both Tez and Hadoop; I wanted to 
> start a discussion here so we can figure out how exactly to go about this.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
