[
https://issues.apache.org/jira/browse/HADOOP-16632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16944624#comment-16944624
]
Steve Loughran commented on HADOOP-16632:
-----------------------------------------
This test is failing because the successful job should have deleted the __magic
directory, but the assertion finds the output of a task attempt there. That is
potentially a sign of something going very wrong.
As it is, all is good except for the cleanup, which is beyond our control.
For some reason this task was speculatively executed, and the second attempt
`_0003_m_000008_1` was not committed before the job completed and the AM
terminated. When the worker requested permission to commit, it couldn't connect
over the (now broken) RPC channel, retried a bit and gave up.
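For reference, the worker-side commit handshake amounts to roughly the
following (a simplified sketch of what org.apache.hadoop.mapred.Task does; the
real code adds progress reporting, retry and interrupt handling):
{code:java}
// Sketch only: ask the committer whether there is anything to commit,
// tell the AM a commit is pending, then poll for permission. The
// commitPending() RPC is the call which failed here with the
// EOFException once the AM had terminated.
void commitIfNeeded(TaskUmbilicalProtocol umbilical,
    TaskAttemptID taskId, TaskStatus status,
    OutputCommitter committer, TaskAttemptContext context)
    throws IOException, InterruptedException {
  if (committer.needsTaskCommit(context)) {
    umbilical.commitPending(taskId, status);
    while (!umbilical.canCommit(taskId)) {
      Thread.sleep(1000);            // poll until the AM grants the commit
    }
    committer.commitTask(context);
  }
}
{code}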
h3. task attempt_1570197469968_0003_m_000008_1
{code:java}
2019-10-04 15:02:34,237 INFO [main]
org.apache.hadoop.fs.s3a.commit.magic.MagicCommitTracker: File
s3a://hwdev-steve-ireland-new/fork-0001/test/ITestS3ACommitterMRJob-execute-magic/__magic/app-attempt-0001/tasks/attempt_1570197469968_0003_m_000008_1/__base/part-m-00008
is written as magic file to path
fork-0001/test/ITestS3ACommitterMRJob-execute-magic/part-m-00008
2019-10-04 15:02:35,717 INFO [main]
org.apache.hadoop.fs.s3a.commit.magic.MagicCommitTracker: Uncommitted data
pending to file
s3a://hwdev-steve-ireland-new/fork-0001/test/ITestS3ACommitterMRJob-execute-magic/__magic/app-attempt-0001/tasks/attempt_1570197469968_0003_m_000008_1/__base/part-m-00008;
commit metadata for 1 parts in
fork-0001/test/ITestS3ACommitterMRJob-execute-magic/__magic/app-attempt-0001/tasks/attempt_1570197469968_0003_m_000008_1/__base/part-m-00008.pending.
sixe: 4690 byte(s)
2019-10-04 15:02:36,877 INFO [main]
org.apache.hadoop.fs.s3a.S3ABlockOutputStream: File
fork-0001/test/ITestS3ACommitterMRJob-execute-magic/part-m-00008 will be
visible when the job is committed
2019-10-04 15:02:36,888 INFO [main] org.apache.hadoop.mapred.Task:
Task:attempt_1570197469968_0003_m_000008_1 is done. And is in the process of
committing
2019-10-04 15:02:36,889 INFO [main]
org.apache.hadoop.fs.s3a.commit.magic.MagicS3GuardCommitter: Starting:
needsTaskCommit task attempt_1570197469968_0003_m_000008_1
2019-10-04 15:02:39,118 INFO [main]
org.apache.hadoop.fs.s3a.commit.magic.MagicS3GuardCommitter: needsTaskCommit
task attempt_1570197469968_0003_m_000008_1: duration 0:02.230s
2019-10-04 15:02:39,124 WARN [main] org.apache.hadoop.mapred.Task: Failure
sending commit pending: java.io.EOFException: End of File Exception between
local host is: "HW13176-2.local/192.168.1.6"; destination host is:
"localhost":57166; : java.io.EOFException; For more details see:
http://wiki.apache.org/hadoop/EOFException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:837)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:791)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1557)
at org.apache.hadoop.ipc.Client.call(Client.java:1499)
at org.apache.hadoop.ipc.Client.call(Client.java:1396)
at
org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:251)
at com.sun.proxy.$Proxy8.commitPending(Unknown Source)
at org.apache.hadoop.mapred.Task.done(Task.java:1253)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:351)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:178)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1891)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:172)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at
org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1872)
at
org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1182)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1078)
{code}
h3. task attempt_1570197469968_0003_m_000008_1
And it looks like MR task failure is a no-warning System.exit(-1) call -the
workers do not get a chance to do any cleanup themselves.
{code:java}
2019-10-04 15:03:39,315 WARN [communication thread]
org.apache.hadoop.mapred.Task: Last retry, killing
attempt_1570197469968_0003_m_000008_1
{code}
Another attempt at the same task had already been committed. Thus: the
output of this attempt was not needed - all is good.
h3. AM
{code:java}
2019-10-04 15:02:29,487 [AsyncDispatcher event handler] INFO
impl.TaskAttemptImpl (TaskAttemptImpl.java:handle(1383)) -
attempt_1570197469968_0003_m_000008_0 TaskAttempt Transitioned from
COMMIT_PENDING to SUCCESS_FINISHING_CONTAINER
2019-10-04 15:02:29,488 [AsyncDispatcher event handler] INFO impl.TaskImpl
(TaskImpl.java:sendTaskSucceededEvents(759)) - Task succeeded with attempt
attempt_1570197469968_0003_m_000008_0
2019-10-04 15:02:29,488 [AsyncDispatcher event handler] INFO impl.TaskImpl
(TaskImpl.java:transition(963)) - Issuing kill to other attempt
attempt_1570197469968_0003_m_000008_1
2019-10-04 15:02:29,488 [AsyncDispatcher event handler] INFO impl.TaskImpl
(TaskImpl.java:handle(665)) - task_1570197469968_0003_m_000008 Task
Transitioned from RUNNING to SUCCEEDED
2019-10-04 15:02:29,489 [AsyncDispatcher event handler] INFO impl.JobImpl
(JobImpl.java:transition(1979)) - Num completed Tasks: 9
2019-10-04 15:02:29,490 [AsyncDispatcher event handler] INFO
impl.TaskAttemptImpl (TaskAttemptImpl.java:handle(1383)) -
attempt_1570197469968_0003_m_000008_1 TaskAttempt Transitioned from RUNNING to
KILL_CONTAINER_CLEANUP
2019-10-04 15:02:29,490 [ContainerLauncher #1] INFO
launcher.ContainerLauncherImpl (ContainerLauncherImpl.java:run(382)) -
Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container
container_1570197469968_0003_01_000014 taskAttempt
attempt_1570197469968_0003_m_000008_1
2019-10-04 15:02:29,490 [ContainerLauncher #1] INFO
launcher.ContainerLauncherImpl (ContainerLauncherImpl.java:kill(208)) - KILLING
attempt_1570197469968_0003_m_000008_1
{code}
But it left its output directory behind. Which is interesting, as the
MagicS3GuardCommitter task abort code did delete the directory (see the sketch
after the log below):
h3. AM
{code:java}
2019-10-04 15:02:29,502 [AsyncDispatcher event handler] INFO
impl.TaskAttemptImpl (TaskAttemptImpl.java:handle(1383)) -
attempt_1570197469968_0003_m_000008_1 TaskAttempt Transitioned from
KILL_CONTAINER_CLEANUP to KILL_TASK_CLEANUP
2019-10-04 15:02:29,503 [CommitterEvent Processor #2] INFO
commit.CommitterEventHandler (CommitterEventHandler.java:run(232)) - Processing
the event EventType: TASK_ABORT
2019-10-04 15:02:29,504 [CommitterEvent Processor #2] INFO
magic.MagicS3GuardCommitter (DurationInfo.java:<init>(72)) - Starting: Abort
task attempt_1570197469968_0003_m_000008_1
2019-10-04 15:02:29,513 [IPC Server handler 25 on default port 57166] INFO
mapred.TaskAttemptListenerImpl (TaskAttemptListenerImpl.java:statusUpdate(413))
- Progress of TaskAttempt attempt_1570197469968_0003_m_000009_0 is : 1.0
2019-10-04 15:02:29,516 [IPC Server handler 4 on default port 57166] INFO
mapred.TaskAttemptListenerImpl (TaskAttemptListenerImpl.java:done(273)) - Done
acknowledgment from attempt_1570197469968_0003_m_000009_0
2019-10-04 15:02:29,516 [AsyncDispatcher event handler] INFO
impl.TaskAttemptImpl (TaskAttemptImpl.java:handle(1383)) -
attempt_1570197469968_0003_m_000009_0 TaskAttempt Transitioned from
COMMIT_PENDING to SUCCESS_FINISHING_CONTAINER
2019-10-04 15:02:29,517 [AsyncDispatcher event handler] INFO impl.TaskImpl
(TaskImpl.java:sendTaskSucceededEvents(759)) - Task succeeded with attempt
attempt_1570197469968_0003_m_000009_0
2019-10-04 15:02:29,520 [AsyncDispatcher event handler] INFO impl.TaskImpl
(TaskImpl.java:handle(665)) - task_1570197469968_0003_m_000009 Task
Transitioned from RUNNING to SUCCEEDED
2019-10
{code}
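That abort path amounts to roughly this (a simplified sketch;
{{abortPendingUploadsUnderPath}} is a hypothetical stand-in for the real
commit-operations helper):
{code:java}
// Sketch only: abort any multipart uploads recorded in .pending files
// under the attempt's directory, then delete the directory itself.
public void abortTask(TaskAttemptContext context) throws IOException {
  Path attemptPath = getTaskAttemptPath(context);
  FileSystem fs = attemptPath.getFileSystem(context.getConfiguration());
  try {
    // every .pending file already written is an uncommitted upload
    abortPendingUploadsUnderPath(attemptPath);   // hypothetical helper
  } finally {
    fs.delete(attemptPath, true);   // removes __magic/.../attempt_*/
  }
}
{code}
Here the attempt's .pending file did not yet exist when the abort ran, so
there was nothing for it to find.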
And in job teardown, everything under __magic is destroyed, which is logged via
the DDB operations trace:
{code:java}
2019-10-04 15:02:34,923 [CommitterEvent Processor #3] INFO
commit.AbstractS3ACommitter (DurationInfo.java:close(87)) - Aborting all
pending commits under
s3a://hwdev-steve-ireland-new/fork-0001/test/ITestS3ACommitterMRJob-execute-magic:
duration 0:00.245s
2019-10-04 15:02:34,923 [CommitterEvent Processor #3] INFO
commit.AbstractS3ACommitter (DurationInfo.java:close(87)) - Cleanup job
job_1570197469968_0003: duration 0:00.245s
2019-10-04 15:02:34,924 [CommitterEvent Processor #3] INFO
commit.AbstractS3ACommitter (HadoopExecutors.java:shutdown(118)) - Gracefully
shutting down executor service. Waiting max 30 SECONDS
2019-10-04 15:02:34,924 [CommitterEvent Processor #3] INFO
commit.AbstractS3ACommitter (HadoopExecutors.java:shutdown(129)) - Succesfully
shutdown executor service
2019-10-04 15:02:34,925 [CommitterEvent Processor #3] INFO
magic.MagicS3GuardCommitter (DurationInfo.java:<init>(72)) - Starting: Deleting
magic directory
s3a://hwdev-steve-ireland-new/fork-0001/test/ITestS3ACommitterMRJob-execute-magic/__magic
2019-10-04 15:02:35,492 [s3a-transfer-shared-pool3-t3] DEBUG s3guard.Operations
(DynamoDBMetadataStore.java:logPut(2432)) - #(Delete-0009) TOMBSTONE
s3a:///hwdev-steve-ireland-new/fork-0001/test/ITestS3ACommitterMRJob-execute-magic/__magic/app-attempt-0001/task_1570197469968_0003_m_000000.pendingset
2019-10-04 15:02:35,492 [s3a-transfer-shared-pool3-t4] DEBUG s3guard.Operations
(DynamoDBMetadataStore.java:logPut(2432)) - #(Delete-0009) TOMBSTONE
s3a:///hwdev-steve-ireland-new/fork-0001/test/ITestS3ACommitterMRJob-execute-magic/__magic/app-attempt-0001/task_1570197469968_0003_m_000001.pendingset
2019-10-04 15:02:35,492 [s3a-transfer-shared-pool3-t5] DEBUG s3guard.Operations
(DynamoDBMetadataStore.java:logPut(2432)) - #(Delete-0009) TOMBSTONE
s3a:///hwdev-steve-ireland-new/fork-0001/test/ITestS3ACommitterMRJob-execute-magic/__magic/app-attempt-0001/task_1570197469968_0003_m_000002.pendingset
2019-10-04 15:02:35,493 [s3a-transfer-shared-pool3-t6] DEBUG s3guard.Operations
(DynamoDBMetadataStore.java:logPut(2432)) - #(Delete-0009) TOMBSTONE
s3a:///hwdev-steve-ireland-new/fork-0001/test/ITestS3ACommitterMRJob-execute-magic/__magic/app-attempt-0001/task_1570197469968_0003_m_000003.pendingset
2019-10-04 15:02:35,493 [s3a-transfer-shared-pool3-t7] DEBUG s3guard.Operations
(DynamoDBMetadataStore.java:logPut(2432)) - #(Delete-0009) TOMBSTONE
s3a:///hwdev-steve-ireland-new/fork-0001/test/ITestS3ACommitterMRJob-execute-magic/__magic/app-attempt-0001/task_1570197469968_0003_m_000004.pendingset
2019-10-04 15:02:35,493 [s3a-transfer-shared-pool3-t8] DEBUG s3guard.Operations
(DynamoDBMetadataStore.java:logPut(2432)) - #(Delete-0009) TOMBSTONE
s3a:///hwdev-steve-ireland-new/fork-0001/test/ITestS3ACommitterMRJob-execute-magic/__magic/app-attempt-0001/task_1570197469968_0003_m_000005.pendingset
2019-10-04 15:02:35,493 [s3a-transfer-shared-pool3-t9] DEBUG s3guard.Operations
(DynamoDBMetadataStore.java:logPut(2432)) - #(Delete-0009) TOMBSTONE
s3a:///hwdev-steve-ireland-new/fork-0001/test/ITestS3ACommitterMRJob-execute-magic/__magic/app-attempt-0001/task_1570197469968_0003_m_000006.pendingset
2019-10-04 15:02:35,495 [s3a-transfer-shared-pool3-t10] DEBUG
s3guard.Operations (DynamoDBMetadataStore.java:logPut(2432)) - #(Delete-0009)
TOMBSTONE
s3a:///hwdev-steve-ireland-new/fork-0001/test/ITestS3ACommitterMRJob-execute-magic/__magic/app-attempt-0001/task_1570197469968_0003_m_000007.pendingset
2019-10-04 15:02:35,496 [s3a-transfer-shared-pool3-t11] DEBUG
s3guard.Operations (DynamoDBMetadataStore.java:logPut(2432)) - #(Delete-0009)
TOMBSTONE
s3a:///hwdev-steve-ireland-new/fork-0001/test/ITestS3ACommitterMRJob-execute-magic/__magic/app-attempt-0001/task_1570197469968_0003_m_000008.pendingset
2019-10-04 15:02:35,497 [s3a-transfer-shared-pool3-t12] DEBUG
s3guard.Operations (DynamoDBMetadataStore.java:logPut(2432)) - #(Delete-0009)
TOMBSTONE
s3a:///hwdev-steve-ireland-new/fork-0001/test/ITestS3ACommitterMRJob-execute-magic/__magic/app-attempt-0001/task_1570197469968_0003_m_000009.pendingset
2019-10-04 15:02:35,497 [s3a-transfer-shared-pool3-t13] DEBUG
s3guard.Operations (DynamoDBMetadataStore.java:logPut(2432)) - #(Delete-0009)
TOMBSTONE
s3a:///hwdev-steve-ireland-new/fork-0001/test/ITestS3ACommitterMRJob-execute-magic/__magic/app-attempt-0001/tasks/attempt_1570197469968_0003_m_000008_1
2019-10-04 15:02:35,844 [s3a-transfer-shared-pool3-t14] DEBUG
s3guard.Operations (DynamoDBMetadataStore.java:logPut(2432)) - #(Delete-0009)
TOMBSTONE
s3a:///hwdev-steve-ireland-new/fork-0001/test/ITestS3ACommitterMRJob-execute-magic/__magic
2019-10-04 15:02:35,913 [s3a-transfer-shared-pool3-t15] DEBUG
s3guard.Operations (DynamoDBMetadataStore.java:logPut(2432)) - #(Delete-0009)
TOMBSTONE
s3a:///hwdev-steve-ireland-new/fork-0001/test/ITestS3ACommitterMRJob-execute-magic/__magic/app-attempt-0001
2019-10-04 15:02:35,981 [s3a-transfer-shared-pool3-t16] DEBUG
s3guard.Operations (DynamoDBMetadataStore.java:logPut(2432)) - #(Delete-0009)
TOMBSTONE
s3a:///hwdev-steve-ireland-new/fork-0001/test/ITestS3ACommitterMRJob-execute-magic/__magic/app-attempt-0001/tasks
2019-10-04 15:02:36,050 [CommitterEvent Processor #3] INFO
magic.MagicS3GuardCommitter (DurationInfo.java:close(87)) - Deleting magic
directory
s3a://hwdev-steve-ireland-new/fork-0001/test/ITestS3ACommitterMRJob-execute-magic/__magic:
duration 0:01.125s
2019-10-04 15:02:36,058 [CommitterEvent Processor #3] INFO
commit.AbstractS3ACommitter (DurationInfo.java:close(87)) - Task committer
attempt_1570197469968_0003_m_000000_0: commitJob(job_1570197469968_0003):
duration 0:05.989s
2019-10-04 15:02:36,107 [AsyncDispatcher event handler] INFO impl.JobImpl
(JobImpl.java:logJobHistoryFinishedEvent(1104)) - Calling handler for
JobFinishedEvent
2019-10-04 15:02:36,108 [AsyncDispatcher event handler] INFO impl.JobImpl
(JobImpl.java:handle(1020)) - job_1570197469968_0003Job Transitioned from
COMMITTING to SUCCEEDED
{code}
So why is it there?
Because the job completed (and did its cleanup) *before the task finished
its write*. It was only when the write finished and the file was closed that
the .pending file came into existence … which is why the assertion found it.
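The sequencing matters because the magic output stream only persists its
commit metadata at stream close. A stripped-down sketch of that close-time
behaviour (signature and field names approximate, based on
MagicCommitTracker):
{code:java}
// Sketch only: rather than completing the multipart upload, save the
// commit metadata as a .pending file. This PUT happens at close() -
// here, after the job committer had already purged __magic.
public boolean aboutToComplete(String uploadId,
    List<PartETag> parts, long bytesWritten) throws IOException {
  SinglePendingCommit commit = new SinglePendingCommit();
  commit.setUploadId(uploadId);
  commit.setDestinationKey(destKey);        // final path, outside __magic
  commit.bindCommitData(parts);             // part etags for a later commit
  commit.save(fs, pendingPartPath, true);   // writes the .pending file
  return false;  // false: leave the multipart upload uncompleted
}
{code}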
We now have a trace of what happens when a task is partitioned and carries on
its execution after the job completes. And it does what is required: none of
its output becomes manifest in the destination. The .pending file was never
promoted into the .pendingset file of all commits from that job, and the
pending uploads were not committed.
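That promotion is the task-commit step this attempt never reached; roughly (a
sketch with approximate names, using the PendingSet/SinglePendingCommit
classes from org.apache.hadoop.fs.s3a.commit.files):
{code:java}
// Sketch only: task commit gathers the attempt's .pending files into a
// single PendingSet and saves it where job commit will look for it.
PendingSet pendingSet = new PendingSet();
for (FileStatus st : fs.listStatus(attemptPath, PENDING_FILTER)) {
  // PENDING_FILTER: hypothetical PathFilter matching "*.pending"
  pendingSet.add(SinglePendingCommit.load(fs, st.getPath()));
}
pendingSet.save(fs,
    new Path(jobAttemptPath, taskId + ".pendingset"), true);
{code}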
There is one possible (and literal) cost to this: if the pending uploads of
the task are not aborted, then users will be billed for the outstanding data
until the uploads are explicitly terminated (the {{hadoop s3guard uploads}}
command; the config option to do this on S3A startup) or implicitly cleaned up
by a lifecycle rule which aborts all pending uploads older than a certain age
(which is what we recommend). I did invoke {{s3guard uploads}} to see what it
found -I couldn't see anything... But the tests may be doing some extra
cleanup.
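For completeness, the recommended lifecycle rule can be set up like this (a
sketch using the AWS SDK for Java v1; the bucket name and one-day age are
illustrative):
{code:java}
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.AbortIncompleteMultipartUpload;
import com.amazonaws.services.s3.model.BucketLifecycleConfiguration;
import com.amazonaws.services.s3.model.lifecycle.LifecycleFilter;
import java.util.Arrays;

public class AbortStaleUploads {
  public static void main(String[] args) {
    // Abort incomplete multipart uploads one day after initiation,
    // across the whole bucket (empty LifecycleFilter = no restriction).
    BucketLifecycleConfiguration.Rule rule =
        new BucketLifecycleConfiguration.Rule()
            .withId("abort-stale-uploads")
            .withFilter(new LifecycleFilter())
            .withAbortIncompleteMultipartUpload(
                new AbortIncompleteMultipartUpload()
                    .withDaysAfterInitiation(1))
            .withStatus(BucketLifecycleConfiguration.ENABLED);

    AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
    s3.setBucketLifecycleConfiguration("hwdev-steve-ireland-new",
        new BucketLifecycleConfiguration().withRules(Arrays.asList(rule)));
  }
}
{code}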
Anyway, I view this failure of the test as a chance to look at the logs and see
what happens on a task partition. And yes, we are good: no data from the
uncommitted task was manifest in the final job. I am happy.
The only thing we could look at is: could we do more in the task to clean up?
That is: why didn't it get told to clean up its directory?
I don't think the MR protocol calls any worker-side abort operation before it
exits: there is nothing we can do.
It will be interesting to see what the other execution engines do here.
> Partitioned S3A magic committers can leave pending files under __magic, maybe
> uploads
> -------------------------------------------------------------------------------------
>
> Key: HADOOP-16632
> URL: https://issues.apache.org/jira/browse/HADOOP-16632
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.2.1, 3.1.3
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Minor
>
> Partitioned S3A magic committers can leave pending files, maybe upload data
> This surfaced in an assertion failure on a parallel test run.
> I thought it was actually a test failure, but with HADOOP-16207 all the docs
> are preserved in the local FS and I can understand what happened.
> h3. Junit process
> {code}
> [INFO]
> [ERROR] Failures:
> [ERROR]
> ITestS3ACommitterMRJob.test_200_execute:344->customPostExecutionValidation:356
> Expected a java.io.FileNotFoundException to be thrown, but got the result: :
> "Found magic dir which should have been deleted at
> S3AFileStatus{path=s3a://hwdev-steve-ireland-new/fork-0001/test/ITestS3ACommitterMRJob-execute-magic/__magic;
> isDirectory=true; modification_time=0; access_time=0; owner=stevel;
> group=stevel; permission=rwxrwxrwx; isSymlink=false; hasAcl=false;
> isEncrypted=true; isErasureCoded=false} isEmptyDirectory=UNKNOWN eTag=null
> versionId=null
> [s3a://hwdev-steve-ireland-new/fork-0001/test/ITestS3ACommitterMRJob-execute-magic/__magic/app-attempt-0001/tasks/attempt_1570197469968_0003_m_000008_1/__base/part-m-00008
> s3a://hwdev-steve-ireland-new/fork-0001/test/ITestS3ACommitterMRJob-execute-magic/__magic/app-attempt-0001/tasks/attempt_1570197469968_0003_m_000008_1/__base/part-m-00008.pending
> {code}
> Full details to follow in the comment as they are, well, detailed.
>
> Key point: AM-side job and task cleanup can happen before the worker task
> finishes its writes. This will result in files under __magic. It may result
> in pending uploads too -but only if the write began after the AM job cleanup
> did a list + abort of all pending uploads under the destination directory