[ 
https://issues.apache.org/jira/browse/TEZ-2587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609265#comment-14609265
 ] 

Hitesh Shah edited comment on TEZ-2587 at 6/30/15 11:41 PM:
------------------------------------------------------------

For Tez tasks within a YARN container, the object registry could be used to 
share data at the task level. This is not supported today, but the registry 
could be enhanced to support it. This is doable for YARN containers because 
only one task runs in a container at any point in time. 

For LLAP-like daemons, could the context be enhanced to provide a way to share 
more framework-specific info via the ExecutionContext concept? 


was (Author: hitesh):
For Tez tasks within a YARN container, the object registry could be used to 
share data at the task level. This is not supported today, but the registry 
could be enhanced to support it. 

For LLAP-like daemons, could the context be enhanced to provide a way to share 
more framework-specific info via the ExecutionContext concept? 

> Tez should provide attemptId (or some other ways of linking multiple threads 
> for the same task)
> -----------------------------------------------------------------------------------------------
>
>                 Key: TEZ-2587
>                 URL: https://issues.apache.org/jira/browse/TEZ-2587
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Siddharth Seth
>
> There are at least 2 threads calling Hive code for every task. Thread #1:
> {noformat}
>       at org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:303)
>       at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:189)
>       at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:131)
>       at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(TezGroupedSplitsInputFormat.java:97)
>       at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(MRReaderMapred.java:152)
>       at org.apache.tez.mapreduce.lib.MRReaderMapred.<init>(MRReaderMapred.java:73)
>       at org.apache.tez.mapreduce.input.MultiMRInput.initFromEvent(MultiMRInput.java:177)
>       at org.apache.tez.mapreduce.input.MultiMRInput.handleEvents(MultiMRInput.java:146)
>       at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.handleEvent(LogicalIOProcessorRuntimeTask.java:650)
>       at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.access$600(LogicalIOProcessorRuntimeTask.java:103)
>       at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask$1.runInternal(LogicalIOProcessorRuntimeTask.java:720)
>       at org.apache.tez.common.RunnableWithNdc.run(RunnableWithNdc.java:35)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Thread #2:
> {noformat}
>       at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
>       at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:349)
>       at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:71)
>       at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:60)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>       at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:60)
>       at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:35)
>       at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Right now, there's no way for these threads to communicate with each other or 
> share data.
> While the code called from the processor has access to some context objects, 
> the input thread doesn't have access to anything.
> Hive used globals to work around that; however, this is ugly and no longer 
> works if multiple tasks run in the same process.
> There should be some way for the threads to talk: either the IO thread should 
> have access to ProcessorContext somehow, or both threads should have attemptId 
> added to the supplied conf. It might then be possible to add a global method 
> to get ProcessorContext by attemptId; if not, we can arrange our own ugly 
> globals by attemptId.
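
For illustration only, the "globals by attemptId" fallback mentioned in the 
description could look roughly like the sketch below. Everything in it is an 
assumption: AttemptScopedRegistry is a hypothetical helper, not a Tez or Hive 
class, and it presumes that both the processor thread and the input thread can 
already obtain the same attempt ID string (for example from the supplied conf), 
which is exactly what this issue asks Tez to provide.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper (not part of Tez or Hive): a process-wide map from
 * attempt ID to state shared by all threads working on that attempt.
 */
final class AttemptScopedRegistry {
    // Outer key: attempt ID string. Inner map: named values for that attempt.
    private static final Map<String, Map<String, Object>> STATE =
            new ConcurrentHashMap<>();

    private AttemptScopedRegistry() {}

    /** Stores a non-null value for the attempt, creating its map on first use. */
    static void put(String attemptId, String key, Object value) {
        STATE.computeIfAbsent(attemptId, id -> new ConcurrentHashMap<>())
             .put(key, value);
    }

    /** Returns the value stored for this attempt and key, or null if absent. */
    static Object get(String attemptId, String key) {
        Map<String, Object> perAttempt = STATE.get(attemptId);
        return perAttempt == null ? null : perAttempt.get(key);
    }

    /** Drops all state for an attempt; meant to be called when it finishes. */
    static void clear(String attemptId) {
        STATE.remove(attemptId);
    }
}
```

Keying by attempt ID keeps concurrently running tasks in the same process from 
seeing each other's state, which plain globals cannot do. The cost is that 
someone has to call clear() when an attempt ends, or the map leaks.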



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
