[
https://issues.apache.org/jira/browse/HADOOP-15634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558954#comment-16558954
]
Aaron Fabbri commented on HADOOP-15634:
---------------------------------------
{quote}why do we need local disk.
{quote}
For future reference: check out the
[documentation|https://hadoop.apache.org/docs/current3/hadoop-aws/tools/hadoop-aws/index.html]
for S3A, in particular the section "How S3A Writes to S3". (The open source S3A
connector also supports in-memory buffering, and the doc offers some
numbers for calculating how much memory this would require, which is the main
downside.) As Steve said, this only applies to the Apache Hadoop code. EMR's
S3 client is not open source.
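
To give a feel for why memory is the downside of in-memory buffering, here is a
rough back-of-the-envelope sketch. It assumes each open output stream can hold
up to fs.s3a.fast.upload.active.blocks blocks of fs.s3a.multipart.size bytes in
memory at once. That is a simplification for illustration, not the exact
accounting in the S3A docs, and the function name and example numbers below are
hypothetical.

```python
MIB = 1024 * 1024


def estimate_buffer_bytes(open_streams, active_blocks, block_size_bytes):
    """Rough upper bound on memory held by in-memory S3A upload buffers.

    Assumption (not taken from the S3A docs verbatim): every open stream
    holds at most `active_blocks` blocks of `block_size_bytes` each.
    """
    if open_streams < 0 or active_blocks < 0 or block_size_bytes < 0:
        raise ValueError("all arguments must be non-negative")
    return open_streams * active_blocks * block_size_bytes


# Hypothetical workload: 10 concurrent writers, 4 active blocks per stream,
# 100 MiB blocks.
estimate = estimate_buffer_bytes(
    open_streams=10, active_blocks=4, block_size_bytes=100 * MIB
)
print(f"{estimate / MIB:.0f} MiB")  # prints "4000 MiB"
```

With numbers like these the buffers alone could approach 4 GB of heap, which is
why buffering to local disk is usually the safer choice for large writes.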
> LocalDirAllocator using up local nonDFS when set to S3
> ------------------------------------------------------
>
> Key: HADOOP-15634
> URL: https://issues.apache.org/jira/browse/HADOOP-15634
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 2.8.3
> Environment: EMR-5.15, Hadoop-2.8.3, Hive-2.3.3, Tez-0.8.4, Beeline.
> Target table is defined for ACID transactions with location on S3.
> Insert source table is on S3.
> Reporter: Phani Kondapalli
> Priority: Blocker
>
> Manually modified yarn-site.xml on the EMR cluster, setting the param
> yarn.nodemanager.local-dirs to point to S3, then reloaded the services on the
> Master and Core nodes. Disk seemed to stay intact, but hdfs dfsadmin -report
> showed nonDFS usage, and finally the job failed with the error below.
> Error: org.apache.hive.service.cli.HiveSQLException: Error while processing
> statement: FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1,
> vertexId=vertex_1532581073633_0001_2_00, diagnostics=[Task failed,
> taskId=task_1532581073633_0001_2_00_000898, diagnostics=[TaskAttempt 0
> failed, info=[Error: Error while running task ( failure ) :
> attempt_1532581073633_0001_2_00_000898_0:org.apache.hadoop.util.DiskChecker$DiskErrorException:
> Could not find any valid local directory for
> output/attempt_1532581073633_0001_2_00_000898_0_10013_1/file.out
> at
> org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:441)
> at
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:151)
> at
> org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:132)
> at
> org.apache.tez.runtime.library.common.task.local.output.TezTaskOutputFiles.getSpillFileForWrite(TezTaskOutputFiles.java:207)
> at
> org.apache.tez.runtime.library.common.sort.impl.PipelinedSorter.spill(PipelinedSorter.java:545)
> ...
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)