[ 
https://issues.apache.org/jira/browse/DRILL-3579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venki Korukanti updated DRILL-3579:
-----------------------------------
    Attachment: DRILL-3579-1.patch

Handle null (also known as default) partitions in the Hive storage plugin.

The patch also includes:
1) The code that converts partition values from their string form to the 
appropriate type is currently duplicated in HiveRecordReader and 
HivePartitionDescriptor. Refactor it into a common place, HiveUtilities.

2) Add tests covering deserialization of partition values of all supported types.
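
As a rough illustration of the idea (not the code in the attached patch), a 
shared conversion helper could check for the default partition name before 
parsing. The class name, method name, and the set of types handled below are 
hypothetical; only the default name __HIVE_DEFAULT_PARTITION__ (Hive's default 
for hive.exec.default.partition.name) comes from the issue itself.

```java
import java.math.BigDecimal;

public class PartitionValueSketch {
    // Hive's default value for hive.exec.default.partition.name.
    static final String DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__";

    /**
     * Hypothetical helper: converts a partition value from its string form to a
     * Java object of the matching type. Returns null when the value is the
     * default partition name, instead of attempting to parse it.
     */
    static Object convertPartitionValue(String hiveType, String value,
                                        String defaultPartitionName) {
        // The default partition stands for a null partition value.
        if (value == null || value.equals(defaultPartitionName)) {
            return null;
        }
        switch (hiveType) {
            case "int":     return Integer.parseInt(value);
            case "bigint":  return Long.parseLong(value);
            case "double":  return Double.parseDouble(value);
            case "boolean": return Boolean.parseBoolean(value);
            case "decimal": return new BigDecimal(value);
            case "string":  return value;
            default:
                throw new UnsupportedOperationException(
                    "Unsupported partition type: " + hiveType);
        }
    }

    public static void main(String[] args) {
        // A regular partition directory such as id2=20150101.
        System.out.println(
            convertPartitionValue("int", "20150101", DEFAULT_PARTITION_NAME));
        // The directory id2=__HIVE_DEFAULT_PARTITION__ maps to null.
        System.out.println(
            convertPartitionValue("int", DEFAULT_PARTITION_NAME, DEFAULT_PARTITION_NAME));
    }
}
```

Keeping the null check in one shared helper means both the record reader and 
the partition descriptor would treat the default partition the same way.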

> Drill on Hive query fails if partition table has __HIVE_DEFAULT_PARTITION__
> ---------------------------------------------------------------------------
>
>                 Key: DRILL-3579
>                 URL: https://issues.apache.org/jira/browse/DRILL-3579
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Hive
>    Affects Versions: 1.1.0
>         Environment: Drill 1.1 on Hive 1.0
>            Reporter: Hao Zhu
>            Assignee: Venki Korukanti
>            Priority: Critical
>             Fix For: 1.2.0
>
>         Attachments: DRILL-3579-1.patch
>
>
> If a partitioned Hive table contains a __HIVE_DEFAULT_PARTITION__ partition, 
> which Hive creates for null values in the partition column, a Drill query on 
> that table fails.
> Minimal reproduction:
> 1. Hive:
> {code}
> CREATE TABLE h1_testpart2(id INT) PARTITIONED BY(id2 int);
> set hive.exec.dynamic.partition.mode=nonstrict;
> INSERT OVERWRITE TABLE h1_testpart2 PARTITION(id2)
>   SELECT 1 as id1, 20150101 as id2 from h1_passwords limit 1;
> INSERT OVERWRITE TABLE h1_testpart2 PARTITION(id2)
>   SELECT 1 as id1, null as id2 from h1_passwords limit 1;
> {code}
> 2. The filesystem looks like:
> {code}
> h1 h1_testpart2]# ls -altr
> total 2
> drwxrwxrwx 89 mapr mapr 87 Jul 30 00:04 ..
> drwxr-xr-x  2 mapr mapr  1 Jul 30 00:05 id2=20150101
> drwxr-xr-x  2 mapr mapr  1 Jul 30 00:05 id2=__HIVE_DEFAULT_PARTITION__
> drwxr-xr-x  4 mapr mapr  2 Jul 30 00:05 .
> {code}
> 3. The Drill query fails:
> {code}
> select * from h1_testpart2;
> Error: SYSTEM ERROR: NumberFormatException: For input string: "__HIVE_DEFAULT_PARTITION__"
> Fragment 0:0
> [Error Id: 509eb392-db9a-42f3-96ea-fb597425f49f on h1.poc.com:31010]
>   (java.lang.reflect.UndeclaredThrowableException) null
>     org.apache.hadoop.security.UserGroupInformation.doAs():1581
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():136
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():131
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173
>     org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():106
>     org.apache.drill.exec.physical.impl.ImplCreator.getExec():81
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():235
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1142
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():617
>     java.lang.Thread.run():745
>   Caused By (org.apache.drill.common.exceptions.ExecutionSetupException) Failure while initializing HiveRecordReader: For input string: "__HIVE_DEFAULT_PARTITION__"
>     org.apache.drill.exec.store.hive.HiveRecordReader.init():241
>     org.apache.drill.exec.store.hive.HiveRecordReader.<init>():138
>     org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():58
>     org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():34
>     org.apache.drill.exec.physical.impl.ImplCreator$2.run():138
>     org.apache.drill.exec.physical.impl.ImplCreator$2.run():136
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1566
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():136
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():131
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173
>     org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():106
>     org.apache.drill.exec.physical.impl.ImplCreator.getExec():81
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():235
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1142
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():617
>     java.lang.Thread.run():745
>   Caused By (java.lang.NumberFormatException) For input string: "__HIVE_DEFAULT_PARTITION__"
>     java.lang.NumberFormatException.forInputString():65
>     java.lang.Integer.parseInt():580
>     java.lang.Integer.parseInt():615
>     org.apache.drill.exec.store.hive.HiveRecordReader.convertPartitionType():605
>     org.apache.drill.exec.store.hive.HiveRecordReader.init():236
>     org.apache.drill.exec.store.hive.HiveRecordReader.<init>():138
>     org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():58
>     org.apache.drill.exec.store.hive.HiveScanBatchCreator.getBatch():34
>     org.apache.drill.exec.physical.impl.ImplCreator$2.run():138
>     org.apache.drill.exec.physical.impl.ImplCreator$2.run():136
>     java.security.AccessController.doPrivileged():-2
>     javax.security.auth.Subject.doAs():422
>     org.apache.hadoop.security.UserGroupInformation.doAs():1566
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():136
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173
>     org.apache.drill.exec.physical.impl.ImplCreator.getRecordBatch():131
>     org.apache.drill.exec.physical.impl.ImplCreator.getChildren():173
>     org.apache.drill.exec.physical.impl.ImplCreator.getRootExec():106
>     org.apache.drill.exec.physical.impl.ImplCreator.getExec():81
>     org.apache.drill.exec.work.fragment.FragmentExecutor.run():235
>     org.apache.drill.common.SelfCleaningRunnable.run():38
>     java.util.concurrent.ThreadPoolExecutor.runWorker():1142
>     java.util.concurrent.ThreadPoolExecutor$Worker.run():617
>     java.lang.Thread.run():745 (state=,code=0)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
