[ https://issues.apache.org/jira/browse/HADOOP-15634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558627#comment-16558627 ]
Steve Loughran commented on HADOOP-15634:
-----------------------------------------
# The EMR team gets to field their problems, as they've forked the code.
# That said: yes. LocalDirAllocator uses local directories. They are used in
many places in Hadoop, often through different config options for different
parts of the code.
# S3 is not a filesystem. You can't write directly to it, so the S3 connectors
all save data into blocks before upload. Where to? The local HDD, via the
LocalDirAllocator. S3 isn't going to work as a local directory.
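To make the third point concrete, here is a minimal Python sketch of block buffering, assuming a simplified connector. `DiskBlockUploader` and its `upload` callback are hypothetical names, not the S3A classes, and the callback stands in for uploading one part of a multipart upload. What it shows is that every block exists as a file in `local_dir` before it goes anywhere near S3, so `local_dir` has to be a real directory on local disk.

```python
import os
import tempfile
from typing import Callable


class DiskBlockUploader:
    """Hypothetical sketch: buffer a stream into fixed-size block files in a
    local directory and hand each full block to `upload`."""

    def __init__(self, local_dir: str, block_size: int,
                 upload: Callable[[int, str], None]):
        if block_size <= 0:
            raise ValueError("block_size must be positive")
        self.local_dir = local_dir
        self.block_size = block_size
        self.upload = upload  # stand-in for one multipart-upload part PUT
        self.block_no = 0
        self._open_block()

    def _open_block(self) -> None:
        # Each block is a real file on the local filesystem.
        self.block_no += 1
        fd, self.block_path = tempfile.mkstemp(
            prefix=f"block-{self.block_no}-", dir=self.local_dir)
        self.block_file = os.fdopen(fd, "wb")
        self.written = 0

    def write(self, data: bytes) -> None:
        while data:
            room = self.block_size - self.written
            chunk, data = data[:room], data[room:]
            self.block_file.write(chunk)
            self.written += len(chunk)
            if self.written == self.block_size:
                self._flush_block()
                self._open_block()

    def _flush_block(self) -> None:
        self.block_file.close()
        self.upload(self.block_no, self.block_path)
        os.remove(self.block_path)  # local space is freed only after upload

    def close(self) -> None:
        if self.written:
            self._flush_block()
        else:  # trailing empty block: nothing to upload
            self.block_file.close()
            os.remove(self.block_path)
```

Passing something like "s3://bucket/tmp" as `local_dir` fails at the first `mkstemp` call, because to the local filesystem it is just a relative path that does not exist.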
I'm afraid you will have to accept that when code wants a local FS directory it
means it and that you have to provide enough local storage for running
applications.
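A hypothetical preflight check along these lines could catch the misconfiguration before jobs run. `check_local_dirs` is an illustrative helper, not a Hadoop or YARN tool: it takes a comma-separated value such as the one given to yarn.nodemanager.local-dirs and flags entries with a non-local scheme (such as s3://), entries that don't exist, and entries with less free space than a threshold you choose for your workload. It assumes POSIX-style paths.

```python
import shutil
from typing import Dict
from urllib.parse import urlparse


def check_local_dirs(dirs_csv: str, min_free_bytes: int) -> Dict[str, str]:
    """Classify each comma-separated entry as "ok", "not local",
    "unusable" or "low space" (hypothetical helper, POSIX paths assumed)."""
    status: Dict[str, str] = {}
    for entry in (e.strip() for e in dirs_csv.split(",")):
        if not entry:
            continue
        parsed = urlparse(entry)
        if parsed.scheme not in ("", "file"):
            # s3://, s3a://, hdfs://... cannot back a local directory.
            status[entry] = "not local"
            continue
        path = parsed.path if parsed.scheme == "file" else entry
        try:
            free = shutil.disk_usage(path).free
        except OSError:  # missing directory, permission problem, ...
            status[entry] = "unusable"
            continue
        status[entry] = "ok" if free >= min_free_bytes else "low space"
    return status
```

A setting where every entry comes back "not local", as with the S3 value in the report below, leaves nothing for a local-directory allocator to choose from.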
That said: if Tez isn't cleaning up, that's an issue to take up with the EMR
and Tez teams. For the latter, re-open this JIRA and move it to the Tez project,
but first come up with some evidence that Tez doesn't clean up.
Closing as INVALID. Sorry.
> LocalDirAllocator using up local nonDFS when set to S3
> ------------------------------------------------------
>
> Key: HADOOP-15634
> URL: https://issues.apache.org/jira/browse/HADOOP-15634
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 2.8.3
> Environment: EMR-5.15, Hadoop-2.8.3, Hive-2.3.3, Tez-0.8.4, Beeline.
> Target table is defined for ACID transactions with location on S3.
> Insert source table is on S3.
> Reporter: Phani Kondapalli
> Priority: Blocker
>
> Manually modified yarn-site.xml from within EMR: set
> yarn.nodemanager.local-dirs to point to S3 and reloaded the services on the
> Master and Core nodes. The disk seemed to stay intact, but hdfs dfsadmin -report
> showed non-DFS usage, and finally the job failed with the error below.
> Error: org.apache.hive.service.cli.HiveSQLException: Error while processing
> statement: FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1,
> vertexId=vertex_1532581073633_0001_2_00, diagnostics=[Task failed,
> taskId=task_1532581073633_0001_2_00_000898, diagnostics=[TaskAttempt 0
> failed, info=[Error: Error while running task ( failure ) :
> attempt_1532581073633_0001_2_00_000898_0:org.apache.hadoop.util.DiskChecker$DiskErrorException:
> Could not find any valid local directory for
> output/attempt_1532581073633_0001_2_00_000898_0_10013_1/file.out
> at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:441)
> at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:151)
> at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)
> at org.apache.tez.runtime.library.common.task.local.output.TezTaskOutputFiles.getSpillFileForWrite(TezTaskOutputFiles.java:207)
> at org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.spill(PipelinedSorter.java:545)
> ...
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)