[ 
https://issues.apache.org/jira/browse/HIVE-16845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16041237#comment-16041237
 ] 

Marta Kuczora commented on HIVE-16845:
--------------------------------------

After investigating the issue I found that it is related to 
[HIVE-15114|https://issues.apache.org/jira/browse/HIVE-15114]

The exception in the *ConditionalResolverMergeFiles.generateActualTasks* method 
happens if there are partitions which should be merged and also there are 
partitions which shouldn't. The exception happens because for the partition 
which shouldn't be merged the move work contains null in the LoadFileWork 
variable. In this move work the LoadTableWork variable is set instead.
{noformat}
MoveWork mvWork = (MoveWork) mvTask.getWork();
LoadFileDesc lfd = mvWork.getLoadFileWork();
Path targetDir = lfd.getTargetDir();
{noformat}

After doing some more digging, here is a very short summary about my findings:
in the GenMapRedUtils.createMRWorkForMergingFiles method the "dummyMv" move 
work is created. In this move work the LoadFileDesc field is set. Then a 
conditional task is created and if the blobstore optimizations are enabled this 
conditional task won't contain the dummyMv task. It will be created from a move 
task which contain only the LoadTableWork. This move task will be passed to the 
ConditionalResolverMergeFiles and because its LoadFileWork variable is null, 
will cause NPE.

> INSERT OVERWRITE a table with dynamic partitions on S3 fails with NPE
> ---------------------------------------------------------------------
>
>                 Key: HIVE-16845
>                 URL: https://issues.apache.org/jira/browse/HIVE-16845
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.1.1
>            Reporter: Marta Kuczora
>            Assignee: Marta Kuczora
>
> *How to reproduce*
> - Create a partitioned table on S3:
> {noformat}
> CREATE EXTERNAL TABLE s3table(user_id string COMMENT '', event_name string 
> COMMENT '') PARTITIONED BY (reported_date string, product_id int) LOCATION 
> 's3a://<bucket name>'; 
> {noformat}
> - Create a temp table:
> {noformat}
> create table tmp_table (id string, name string, date string, pid int) row 
> format delimited fields terminated by '\t' lines terminated by '\n' stored as 
> textfile;
> {noformat}
> - Load the following rows to the tmp table:
> {noformat}
> u1    value1  2017-04-10      10000
> u2    value2  2017-04-10      10000
> u3    value3  2017-04-10      10001
> {noformat}
> - Set the following parameters:
> -- hive.exec.dynamic.partition.mode=nonstrict
> -- mapreduce.input.fileinputformat.split.maxsize=10
> -- hive.blobstore.optimizations.enabled=true
> -- hive.blobstore.use.blobstore.as.scratchdir=false
> -- hive.merge.mapfiles=true
> - Insert the rows from the temp table into the s3 table:
> {noformat}
> INSERT OVERWRITE TABLE s3table
> PARTITION (reported_date, product_id)
> SELECT
>   t.id as user_id,
>   t.name as event_name,
>   t.date as reported_date,
>   t.pid as product_id
> FROM tmp_table t;
> {noformat}
> A NPE will occur with the following stacktrace:
> {noformat}
> 2017-05-08 21:32:50,607 ERROR 
> org.apache.hive.service.cli.operation.Operation: 
> [HiveServer2-Background-Pool: Thread-184028]: Error running hive query: 
> org.apache.hive.service.cli.HiveSQLException: Error while processing 
> statement: FAILED: Execution Error, return code -101 from 
> org.apache.hadoop.hive.ql.exec.ConditionalTask. null
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:400)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:239)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:88)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:293)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
> at 
> org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:306)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.hadoop.hive.ql.plan.ConditionalResolverMergeFiles.generateActualTasks(ConditionalResolverMergeFiles.java:290)
> at 
> org.apache.hadoop.hive.ql.plan.ConditionalResolverMergeFiles.getTasks(ConditionalResolverMergeFiles.java:175)
> at 
> org.apache.hadoop.hive.ql.exec.ConditionalTask.execute(ConditionalTask.java:81)
> at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214)
> at 
> org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
> at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1977)
> at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1690)
> at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1422)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1206)
> at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1201)
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:237)
> ... 11 more 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to