Tarek Abouzeid created MAPREDUCE-7206:
-----------------------------------------
Summary: ShuffleHandler cannot access file.out
Key: MAPREDUCE-7206
URL: https://issues.apache.org/jira/browse/MAPREDUCE-7206
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 3.1.1
Environment: HDP 3.1 (3.1.0.0-78)
Reporter: Tarek Abouzeid
I am running HDP 3.1 (3.1.0.0-78) with 10 data nodes, and the Hive execution
engine is Tez. When I run a query, I get this error:
{code:java}
ERROR : FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex re-running, vertexName=Map
1, vertexId=vertex_1557754551780_1091_2_00Vertex re-running, vertexName=Map 1,
vertexId=vertex_1557754551780_1091_2_00Vertex re-running, vertexName=Map 1,
vertexId=vertex_1557754551780_1091_2_00Vertex failed, vertexName=Map 1,
vertexId=vertex_1557754551780_1091_2_00, diagnostics=[Vertex
vertex_1557754551780_1091_2_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE,
Vertex vertex_1557754551780_1091_2_00 [Map 1] failed as task
task_1557754551780_1091_2_00_000001 failed after vertex succeeded.]DAG did not
succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
INFO : Completed executing
command(queryId=hive_20190516161715_09090e6d-e513-4fcc-9c96-0b48e9b43822); Time
taken: 17.935 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 2
from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex re-running,
vertexName=Map 1, vertexId=vertex_1557754551780_1091_2_00Vertex re-running,
vertexName=Map 1, vertexId=vertex_1557754551780_1091_2_00Vertex re-running,
vertexName=Map 1, vertexId=vertex_1557754551780_1091_2_00Vertex failed,
vertexName=Map 1, vertexId=vertex_1557754551780_1091_2_00, diagnostics=[Vertex
vertex_1557754551780_1091_2_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE,
Vertex vertex_1557754551780_1091_2_00 [Map 1] failed as task
task_1557754551780_1091_2_00_000001 failed after vertex succeeded.]DAG did not
succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
(state=08S01,code=2)
{code}
I traced the logs for one failing run, application
*application_1557754551780_1091*, and checked the NodeManager logs on the
affected node:
{code:java}
2019-05-16 16:19:05,801 INFO mapred.ShuffleHandler
(ShuffleHandler.java:sendMapOutput(1268)) -
/var/lib/hadoop/yarn/local/usercache/hive/appcache/application_1557754551780_1091/output/attempt_1557754551780_1091_2_00_000000_0_10003/file.out
not found
2019-05-16 16:19:05,818 INFO mapred.ShuffleHandler
(ShuffleHandler.java:sendMapOutput(1268)) -
/var/lib/hadoop/yarn/local/usercache/hive/appcache/application_1557754551780_1091/output/attempt_1557754551780_1091_2_00_000000_0_10003/file.out
not found
2019-05-16 16:19:05,821 INFO mapred.ShuffleHandler
(ShuffleHandler.java:sendMapOutput(1268)) -
/var/lib/hadoop/yarn/local/usercache/hive/appcache/application_1557754551780_1091/output/attempt_1557754551780_1091_2_00_000000_0_10003/file.out
not found
2019-05-16 16:19:05,822 INFO mapred.ShuffleHandler
(ShuffleHandler.java:sendMapOutput(1268)) -
/var/lib/hadoop/yarn/local/usercache/hive/appcache/application_1557754551780_1091/output/attempt_1557754551780_1091_2_00_000000_0_10003/file.out
not found
2019-05-16 16:19:05,824 INFO mapred.ShuffleHandler
(ShuffleHandler.java:sendMapOutput(1268)) -
/var/lib/hadoop/yarn/local/usercache/hive/appcache/application_1557754551780_1091/output/attempt_1557754551780_1091_2_00_000000_0_10003/file.out
not found
2019-05-16 16:19:05,826 INFO mapred.ShuffleHandler
(ShuffleHandler.java:sendMapOutput(1268)) -
/var/lib/hadoop/yarn/local/usercache/hive/appcache/application_1557754551780_1091/output/attempt_1557754551780_1091_2_00_000000_0_10003/file.out
not found
{code}
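To see whether the repeated messages all point at the same map output, the NodeManager log can be summarized with a short script. This is only a sketch: it assumes each ShuffleHandler record sits on a single line in the real log file (the lines above are wrapped by the mail client), and the function name `missing_outputs` is hypothetical.

```python
import re
from collections import Counter

# Matches ShuffleHandler "not found" records.
# Assumption: one record per line in the NodeManager log file.
NOT_FOUND = re.compile(
    r"mapred\.ShuffleHandler \(ShuffleHandler\.java:sendMapOutput\(\d+\)\) - "
    r"(?P<path>\S+) not found"
)


def missing_outputs(lines):
    """Count "not found" records per map output path."""
    counts = Counter()
    for line in lines:
        match = NOT_FOUND.search(line)
        if match:
            counts[match.group("path")] += 1
    return counts
```

Passing the lines of the log file to `missing_outputs` gives one counter entry per distinct path, so a single entry with a high count would mean that every failed fetch targets the same attempt directory.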
I checked the path where the map output is written (
*/var/lib/hadoop/yarn/local/usercache/hive/appcache/application_1557754551780_1091/output/attempt_1557754551780_1091_2_00_000000_0_10003*
):
{code:java}
drwx--x---. 3 hive hadoop 16 May 16 16:16 filecache
drwxr-s---. 3 hive hadoop 60 May 16 16:16 output
{code}
Inside the output directory:
{code:java}
-rw-------. 1 hive hadoop 28 May 16 16:17 file.out
-rw-r-----. 1 hive hadoop 32 May 16 16:17 file.out.index
{code}
So *file.out* is readable only by its owner (hive), not by other users in the
same group. I switched to the yarn user, tried to open the file, and got
permission denied.
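The same check can be scripted instead of reading the `ls -l` output by eye. The following is a minimal sketch using only the Python standard library; the helper name `readable_by` is hypothetical, and it inspects only the classic mode bits, so ACLs or SELinux labels (the trailing dot in the listing) are not taken into account.

```python
import os
import stat


def readable_by(path):
    """Return which permission classes (owner, group, other) may read path."""
    mode = os.stat(path).st_mode
    return {
        "owner": bool(mode & stat.S_IRUSR),
        "group": bool(mode & stat.S_IRGRP),
        "other": bool(mode & stat.S_IROTH),
    }
```

For a file with mode `-rw-------`, as shown for file.out, the "group" entry is False. For `-rw-r-----`, as shown for file.out.index, it is True.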