[ 
https://issues.apache.org/jira/browse/HIVE-18412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16319248#comment-16319248
 ] 

Benjamin BONNET commented on HIVE-18412:
----------------------------------------

Hi [~ekoifman],
in our use case we have 2 tables: a raw table named "raw_table" that is 
partitioned by date, and an ACID table named "clean_table" that has the same 
columns as "raw_table".
"clean_table" has 3 buckets.
We then run a cleansing query that deletes from "clean_table" all rows 
that exist in a specified partition of "raw_table". Rows are matched on a 
combination of functional keys (key0, key1, key2 and key3).
The job runs with 3 reducers (enforced by a parameter set before executing 
the query).
Here is what it looks like:
{code}
set mapred.reduce.tasks=3;
DELETE
    FROM clean_table
    WHERE concat(
              CASE WHEN key0 IS NULL THEN '' ELSE CAST(key0 AS STRING) END, '#',
              CASE WHEN key1 IS NULL THEN '' ELSE CAST(key1 AS STRING) END, '#',
              CASE WHEN key2 IS NULL THEN '' ELSE CAST(key2 AS STRING) END, '#',
              CASE WHEN key3 IS NULL THEN '' ELSE CAST(key3 AS STRING) END)
        IN (
             SELECT concat(
                        CASE WHEN clean.key0 IS NULL THEN '' ELSE CAST(clean.key0 AS STRING) END, '#',
                        CASE WHEN clean.key1 IS NULL THEN '' ELSE CAST(clean.key1 AS STRING) END, '#',
                        CASE WHEN clean.key2 IS NULL THEN '' ELSE CAST(clean.key2 AS STRING) END, '#',
                        CASE WHEN clean.key3 IS NULL THEN '' ELSE CAST(clean.key3 AS STRING) END)
             FROM clean_table clean
             LEFT SEMI JOIN (
                 SELECT concat(
                            CASE WHEN key0 IS NULL THEN '' ELSE CAST(key0 AS STRING) END, '#',
                            CASE WHEN key1 IS NULL THEN '' ELSE CAST(key1 AS STRING) END, '#',
                            CASE WHEN key2 IS NULL THEN '' ELSE CAST(key2 AS STRING) END, '#',
                            CASE WHEN key3 IS NULL THEN '' ELSE CAST(key3 AS STRING) END) AS key
                 FROM raw_table
                 WHERE (year='2017' AND month='01' AND day='01'
                        AND INPUT__FILE__NAME like '%20170101%') AND 1=1) raw
             ON concat(
                    CASE WHEN clean.key0 IS NULL THEN '' ELSE CAST(clean.key0 AS STRING) END, '#',
                    CASE WHEN clean.key1 IS NULL THEN '' ELSE CAST(clean.key1 AS STRING) END, '#',
                    CASE WHEN clean.key2 IS NULL THEN '' ELSE CAST(clean.key2 AS STRING) END, '#',
                    CASE WHEN clean.key3 IS NULL THEN '' ELSE CAST(clean.key3 AS STRING) END) = raw.key
            );
{code}
The execution plan confirms that multi-file spray is used to run this query.

> FileSinkOperator throws NullPointerException
> --------------------------------------------
>
>                 Key: HIVE-18412
>                 URL: https://issues.apache.org/jira/browse/HIVE-18412
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Transactions
>         Environment: HDP2.6.1, Hive 1.2.1
>            Reporter: Benjamin BONNET
>            Priority: Blocker
>
> Hi,
> while executing a query (DELETE with a join) on an ACID table, I get a 
> NullPointerException in a reducer.
> See the stack trace below.
> According to the FileSinkOperator source code, it seems that the bucketMap 
> transient field is null.
> In my opinion, the only circumstance in which this field may be null is when 
> the FileSinkOperator involved has been serialized and then deserialized: 
> deserialization leaves that transient reference uninitialized.
> I checked the source code of more recent versions (including Hive 2.x), and 
> in all of them that field may remain uninitialized if FileSinkOperator is 
> serialized and deserialized. So I think this issue may affect any version of 
> Hive.
> ERROR : Vertex failed, vertexName=Reducer 3, vertexId=vertex_1513704146031_77754_2_05, diagnostics=[Task failed, taskId=task_1513704146031_77754_2_05_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":108117,"bucketid":0,"rowid":1114}},"value":{"_col0":"2017","_col1":"10"}}
>         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
>         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
>         at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:347)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:194)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:185)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:185)
>         at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:181)
>         at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":108117,"bucketid":0,"rowid":1114}},"value":{"_col0":"2017","_col1":"10"}}
>         at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:284)
>         at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:266)
>         at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
>         ... 14 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":{"transactionid":108117,"bucketid":0,"rowid":1114}},"value":{"_col0":"2017","_col1":"10"}}
>         at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:352)
>         at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:274)
>         ... 16 more
> Caused by: java.lang.NullPointerException
>         at org.apache.hadoop.hive.ql.exec.FileSinkOperator.findWriterOffset(FileSinkOperator.java:830)
>         at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:758)
>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:841)
>         at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
>         at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:343)
>         ... 17 more
> ], TaskAttempt 1 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: .... etc.
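
The hypothesis in the description above (a transient field left null after a 
serialize/deserialize round trip) can be illustrated with a minimal Java 
sketch. Everything in it is an assumption made for illustration: SinkLike and 
TransientFieldDemo are hypothetical classes, not Hive code, and the round trip 
uses plain java.io serialization, whereas Hive may serialize its plan with a 
different mechanism that behaves differently. It only shows the general Java 
behaviour, plus one possible defensive guard.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an operator holding a transient map; not Hive code.
class SinkLike implements Serializable {
    private static final long serialVersionUID = 1L;

    final String name;
    // Initialised when the constructor runs, but skipped by Java serialization.
    transient Map<Integer, Integer> bucketMap = new HashMap<>();

    SinkLike(String name) {
        this.name = name;
    }

    // Null-safe lookup: rebuilds the map lazily if deserialization left it null.
    int lookup(int bucket) {
        if (bucketMap == null) {
            bucketMap = new HashMap<>();
        }
        return bucketMap.getOrDefault(bucket, -1);
    }
}

class TransientFieldDemo {
    // Serializes the object to a byte array and reads it back.
    static SinkLike roundTrip(SinkLike original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SinkLike) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SinkLike copy = roundTrip(new SinkLike("fs"));
        // The field initializer did not run again, so the transient map is null.
        System.out.println(copy.bucketMap == null); // prints "true"
        // The guarded accessor avoids the NullPointerException.
        System.out.println(copy.lookup(0));         // prints "-1"
    }
}
```

Without the null check in lookup, the call on the deserialized copy would 
throw a NullPointerException, which is the failure mode the description 
suspects in findWriterOffset.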



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)