[ https://issues.apache.org/jira/browse/HADOOP-15634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558627#comment-16558627 ]

Steve Loughran commented on HADOOP-15634:
-----------------------------------------

# The EMR team gets to field their problems, as they've forked the code.
# That said: yes, LocalDirAllocator uses local directories. They are used in 
many places in Hadoop, often under different config options for different 
bits of the code.
# S3 is not a filesystem. You can't write directly to it, so the S3 connectors 
all buffer data into blocks before upload. Where to? The local HDD, via the 
LocalDirAllocator (see the sketch below). S3 isn't going to work here.
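
For context, here is a minimal sketch of how connector-style code asks 
LocalDirAllocator for local scratch space; the buffer-dir key and mount 
points are illustrative, not taken from this report:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.LocalDirAllocator;
import org.apache.hadoop.fs.Path;

public class LocalBufferSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical local mounts. The allocator only understands local
    // filesystem paths; an s3:// or s3a:// URI here is not usable.
    conf.set("fs.s3a.buffer.dir", "/mnt/tmp1,/mnt/tmp2");

    // LocalDirAllocator rotates across the configured directories, skipping
    // any that are unwritable or lack space. If none qualify it throws
    // DiskChecker$DiskErrorException ("Could not find any valid local
    // directory"), the same error shown in the stack trace below.
    LocalDirAllocator alloc = new LocalDirAllocator("fs.s3a.buffer.dir");
    Path scratch = alloc.getLocalPathForWrite("block-0001.tmp",
        128L * 1024 * 1024, conf);
    System.out.println("Buffering upload block at " + scratch);
  }
}
{code}

The Tez spill path in the reporter's stack trace goes through the same 
getLocalPathForWrite() call, just driven by a different directory config, 
which is why pointing that config at S3 cannot work.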

I'm afraid you will have to accept that when code asks for a local FS 
directory it means exactly that, and that you have to provision enough local 
storage for the applications you run.
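
As a rough illustration of that sizing advice (the mount points below are 
hypothetical, not from this cluster), you can sanity-check how much room the 
configured local directories actually have:

{code:java}
import java.io.File;

public class LocalDirSpace {
  public static void main(String[] args) {
    // Hypothetical local scratch mounts; each running application needs
    // enough free space here for its spills and upload buffers.
    for (String dir : "/mnt/yarn/local,/mnt1/yarn/local".split(",")) {
      File f = new File(dir);
      System.out.printf("%s: %.1f GB usable%n", dir, f.getUsableSpace() / 1e9);
    }
  }
}
{code}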

That said: if Tez isn't cleaning up, that's an issue to take up with the EMR 
and Tez teams. For the latter, re-open this JIRA and move it to the Tez 
project. But first, come up with some evidence that they don't clean up.

Closing as INVALID. Sorry



> LocalDirAllocator using up local nonDFS when set to S3
> ------------------------------------------------------
>
>                 Key: HADOOP-15634
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15634
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>    Affects Versions: 2.8.3
>         Environment: EMR-5.15, Hadoop-2.8.3, Hive-2.3.3, Tez-0.8.4, Beeline. 
> Target table is defined for ACID transactions with location on S3. 
> Insert source table is on S3. 
>            Reporter: Phani Kondapalli
>            Priority: Blocker
>
> Manually modified the yarn-site.xml from within the EMR cluster, set the 
> param yarn.nodemanager.local-dirs to point to S3, and reloaded the services 
> on the Master and Core nodes. The disks seemed to stay intact, but hdfs 
> dfsadmin -report showed nonDFS usage, and it finally failed with the error 
> below.
> Error: org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, 
> vertexId=vertex_1532581073633_0001_2_00, diagnostics=[Task failed, 
> taskId=task_1532581073633_0001_2_00_000898, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Error while running task ( failure ) : 
> attempt_1532581073633_0001_2_00_000898_0:org.apache.hadoop.util.DiskChecker$DiskErrorException:
>  Could not find any valid local directory for 
> output/attempt_1532581073633_0001_2_00_000898_0_10013_1/file.out
>  at 
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:441)
>  at 
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:151)
>  at 
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)
>  at 
> org.apache.tez.runtime.library.common.task.local.output.TezTaskOutputFiles.getSpillFileForWrite(TezTaskOutputFiles.java:207)
>  at 
> org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.spill(PipelinedSorter.java:545)
> ...



