[ 
https://issues.apache.org/jira/browse/HIVE-18284?focusedWorklogId=506534&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-506534
 ]

ASF GitHub Bot logged work on HIVE-18284:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 30/Oct/20 05:18
            Start Date: 30/Oct/20 05:18
    Worklog Time Spent: 10m 
      Work Description: shameersss1 commented on a change in pull request #1400:
URL: https://github.com/apache/hive/pull/1400#discussion_r514873327



##########
File path: itests/src/test/resources/testconfiguration.properties
##########
@@ -6,6 +6,7 @@ minimr.query.files=\
 
 # Queries ran by both MiniLlapLocal and MiniTez
 minitez.query.files.shared=\
+  dynpart_sort_optimization_distribute_by.q,\

Review comment:
       For some reason, the issue is not reproducible with LLAP; hence we are
running this with MiniTez.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 506534)
    Time Spent: 2.5h  (was: 2h 20m)

> NPE when inserting data with 'distribute by' clause with dynpart sort 
> optimization
> ----------------------------------------------------------------------------------
>
>                 Key: HIVE-18284
>                 URL: https://issues.apache.org/jira/browse/HIVE-18284
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 3.0.0, 2.3.1, 2.3.2, 4.0.0, 3.1.1, 3.1.2
>            Reporter: Aki Tanaka
>            Assignee: Syed Shameerur Rahman
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> A NullPointerException occurs when inserting data with a 'distribute by' 
> clause. The following query snippet reproduces the issue:
> *(non-vectorized, non-llap mode)*
> {code:java}
> create table table1 (col1 string, datekey int);
> insert into table1 values ('ROW1', 1), ('ROW2', 2), ('ROW3', 1);
> create table table2 (col1 string) partitioned by (datekey int);
> set hive.vectorized.execution.enabled=false;
> set hive.optimize.sort.dynamic.partition=true;
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> distribute by datekey ;
> {code}
> I could run the insert query without the error if I remove the Distribute By 
> clause or use a Cluster By clause instead; see the sketch below.
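> For reference, here is a sketch of the same insert rewritten with Cluster By, 
> which both distributes and sorts on datekey (assuming the same tables and 
> settings as in the reproducer above):
> {code:java}
> -- Workaround sketch: cluster by groups and orders rows on datekey, so rows
> -- for the same partition reach the FileSinkOperator together.
> insert into table table2
> PARTITION(datekey)
> select col1,
> datekey
> from table1
> cluster by datekey ;
> {code}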
> It seems that the issue happens because Distribute By does not guarantee 
> clustering or sorting properties on the distributed keys.
> FileSinkOperator removes the previous fsp, which might still be re-used when 
> we use Distribute By:
> https://github.com/apache/hive/blob/branch-2.3/ql/src/java/org/apache/hadoop/hive/ql/exec/FileSinkOperator.java#L972
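> As a further note (an assumption based on the issue title and the reproducer 
> settings, not verified here), the NPE should also be avoidable by turning the 
> dynpart sort optimization off:
> {code:java}
> -- Assumed workaround sketch: with the optimization disabled, FileSinkOperator
> -- does not rely on rows arriving grouped by the dynamic-partition key.
> set hive.optimize.sort.dynamic.partition=false;
> {code}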
> The following stack trace is logged.
> {code:java}
> Vertex failed, vertexName=Reducer 2, vertexId=vertex_1513111717879_0056_1_01, 
> diagnostics=[Task failed, taskId=task_1513111717879_0056_1_01_000000, 
> diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( 
> failure ) : 
> attempt_1513111717879_0056_1_01_000000_0:java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>       at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>       at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:168)
>       at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>       at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>       at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
>       at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>       at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>       at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>       at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>       at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row (tag=0) 
> {"key":{},"value":{"_col0":"ROW3","_col1":1}}
>       at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:365)
>       at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:250)
>       at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:317)
>       at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>       ... 14 more
> Caused by: java.lang.NullPointerException
>       at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:762)
>       at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
>       at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:95)
>       at 
> org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:356)
>       ... 17 more
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
