Chris Drome created HIVE-14344:
----------------------------------

             Summary: Intermittent failures caused by leaking delegation tokens
                 Key: HIVE-14344
                 URL: https://issues.apache.org/jira/browse/HIVE-14344
             Project: Hive
          Issue Type: Bug
          Components: Tez
    Affects Versions: 2.1.0, 1.2.1
            Reporter: Chris Drome
            Assignee: Chris Drome


We have experienced random job failures caused by leaking delegation tokens. 
The Tez child task will fail because it is attempting to read from the 
delegation tokens directory of a different (related) task.

Failure results in the following type of stack trace:

{noformat}
2016-07-21 16:57:18,061 [FATAL] [TezChild] |tez.ReduceRecordSource|: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row (tag=0) 
        at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:370)
        at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:292)
        at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:249)
        at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:148)
        at 
org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:137)
        at 
org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:362)
        at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:179)
        at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:171)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1738)
        at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:171)
        at 
org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:167)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.RuntimeException: java.io.IOException: Exception reading 
file:/grid/4/tmp/yarn-local/usercache/.../appcache/application_1468602386465_489814/container_e02_1468602386465_489814_01_000001/container_tokens
        at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:237)
        at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:74)
        at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.genUniqueJoinObject(CommonJoinOperator.java:650)
        at 
org.apache.hadoop.hive.ql.exec.CommonJoinOperator.checkAndGenObject(CommonJoinOperator.java:756)
        at 
org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinObject(CommonMergeJoinOperator.java:316)
        at 
org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:279)
        at 
org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.joinOneGroup(CommonMergeJoinOperator.java:272)
        at 
org.apache.hadoop.hive.ql.exec.CommonMergeJoinOperator.process(CommonMergeJoinOperator.java:258)
        at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:361)
        ... 17 more
Caused by: java.lang.RuntimeException: java.io.IOException: Exception reading 
file:/grid/4/tmp/yarn-local/usercache/.../appcache/application_1468602386465_489814/container_e02_1468602386465_489814_01_000001/container_tokens
        at 
org.apache.hadoop.mapreduce.security.TokenCache.mergeBinaryTokens(TokenCache.java:141)
        at 
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:119)
        at 
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
        at 
org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
        at 
org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:206)
        at 
org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:45)
        at 
org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
        at 
org.apache.hadoop.hive.ql.exec.persistence.RowContainer.first(RowContainer.java:222)
        ... 25 more
Caused by: java.io.IOException: Exception reading 
file:/grid/4/tmp/yarn-local/usercache/.../appcache/application_1468602386465_489814/container_e02_1468602386465_489814_01_000001/container_tokens
        at 
org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:175)
        at 
org.apache.hadoop.mapreduce.security.TokenCache.mergeBinaryTokens(TokenCache.java:136)
        ... 32 more
Caused by: java.io.FileNotFoundException: File 
file:/grid/4/tmp/yarn-local/usercache/.../appcache/application_1468602386465_489814/container_e02_1468602386465_489814_01_000001/container_tokens
 does not exist
        at 
org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
        at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
        at 
org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
        at 
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
        at 
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:142)
        at 
org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:346)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:768)
        at 
org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:170)
        ... 33 more
{noformat}

The application that failed was {{application_1468602386465_489844}} while 
complaining about 
{{appcache/application_1468602386465_489814/container_e02_1468602386465_489814_01_000001/container_tokens}}.

This seems to only manifest via HiveAction through Oozie.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to