[ 
https://issues.apache.org/jira/browse/TEZ-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15511035#comment-15511035
 ] 

Hitesh Shah commented on TEZ-3439:
----------------------------------

[~hugotsao] Thanks for the patch. I think the validation logic has several 
bugs that need to be fixed. 

  - Consider the case where lhs is bigger than rhs. Today, the loop breaks out 
as soon as rhs hits EOF, instead of continuing to loop through lhs to count all 
the extra keys in lhs.
  - Next, if rhs > lhs, then outside the loop, shouldn't the check be whether 
rhs is not yet at EOF, and if so, loop through rhs to count all the keys 
missing from lhs? 
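
To make the two cases above concrete, here is a rough sketch of the shape I 
have in mind. It is not the JoinValidate code: it uses plain java.util.Iterator 
over sorted, non-null, duplicate-free keys instead of the Tez KeyValuesReader, 
and the class and method names (SortedKeyDiff, countMismatches, advance) are 
made up for illustration. The point is that each side is advanced only while 
it still has data, and whichever side is left over is drained after the main 
loop.

```java
import java.util.Arrays;
import java.util.Iterator;

public class SortedKeyDiff {

    /**
     * Walks two sorted key streams in step and returns
     * {keys only in lhs, keys only in rhs}.
     * Assumption: keys are non-null, so null can serve as the EOF marker.
     */
    static <K extends Comparable<K>> long[] countMismatches(Iterator<K> lhs, Iterator<K> rhs) {
        long onlyInLhs = 0;
        long onlyInRhs = 0;
        K l = advance(lhs);
        K r = advance(rhs);

        // Main loop: runs only while BOTH sides still have a current key.
        while (l != null && r != null) {
            int cmp = l.compareTo(r);
            if (cmp == 0) {
                l = advance(lhs);
                r = advance(rhs);
            } else if (cmp < 0) {
                onlyInLhs++;          // key present in lhs, absent from rhs
                l = advance(lhs);
            } else {
                onlyInRhs++;          // key present in rhs, absent from lhs
                r = advance(rhs);
            }
        }

        // lhs bigger than rhs: keep going through lhs and count the extras.
        while (l != null) {
            onlyInLhs++;
            l = advance(lhs);
        }
        // rhs bigger than lhs: rhs is not at EOF yet, count keys missing from lhs.
        while (r != null) {
            onlyInRhs++;
            r = advance(rhs);
        }
        return new long[] {onlyInLhs, onlyInRhs};
    }

    /** Returns the next key, or null at EOF; never reads past the end. */
    private static <K> K advance(Iterator<K> it) {
        return it.hasNext() ? it.next() : null;
    }

    public static void main(String[] args) {
        long[] diff = countMismatches(
            Arrays.asList("a", "b", "c", "d").iterator(),
            Arrays.asList("b", "c").iterator());
        System.out.println(diff[0] + " extra in lhs, " + diff[1] + " missing from lhs");
    }
}
```

With the real readers, the equivalent of advance() would have to remember 
that a reader already returned false and not call next() on it again, since 
that is what triggers the IOException in the report.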

Would you mind providing a patch that addresses the above instead of the 
current quick fix? 


> Tez joinvalidate example failed when first input argument size is bigger than 
> the second
> ----------------------------------------------------------------------------------------
>
>                 Key: TEZ-3439
>                 URL: https://issues.apache.org/jira/browse/TEZ-3439
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Hui Cao
>            Assignee: Hui Cao
>         Attachments: TEZ-3439.1.patch
>
>
> When running joinvalidate from the Tez examples jar with the command
> {{"hadoop jar tez-examples-<version>.jar joinvalidate <input1> <input2>"}},
> an IOException is thrown if <input1> is larger than <input2>.
> {noformat}
> 16/09/21 00:07:53 INFO examples.JoinValidate: DAG diagnostics: [Vertex 
> failed, vertexName=joinvalidate, vertexId=vertex_1473073428528_0031_1_02, 
> diagnostics=[Task failed, taskId=task_1473073428528_0031_1_02_000000, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : attempt_1473073428528_0031_1_02_000000_0:java.io.IOException: 
> Please check if you are invoking moveToNext() even after it returned false.
>       at 
> org.apache.tez.runtime.library.common.ValuesIterator.hasCompletedProcessing(ValuesIterator.java:221)
>       at 
> org.apache.tez.runtime.library.common.ValuesIterator.moveToNext(ValuesIterator.java:103)
>       at 
> org.apache.tez.runtime.library.input.OrderedGroupedKVInput$OrderedGroupedKeyValuesReader.next(OrderedGroupedKVInput.java:321)
>       at 
> org.apache.tez.examples.JoinValidate$JoinValidateProcessor.run(JoinValidate.java:254)
>       at 
> org.apache.tez.runtime.library.processor.SimpleProcessor.run(SimpleProcessor.java:53)
>       at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>       at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>       at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>       at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>       at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>       at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
