[jira] [Updated] (HIVE-17653) Druid storage handler CTAS with boolean type columns fails.

slim bouguerra (JIRA) Fri, 29 Sep 2017 08:17:46 -0700

     [ https://issues.apache.org/jira/browse/HIVE-17653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


slim bouguerra updated HIVE-17653:
----------------------------------
    Attachment: HIVE-17653.patch

This patch adds support for indexing boolean columns.
The only downside is that the column will be indexed as a string dimension in
Druid, while the Hive metastore will still consider it a boolean.
This does not break anything, but because of this mismatch the underlying
column will be treated as a metric (thus no filter push down).
(I think) This annoying behavior can be fixed in the Druid-Calcite rule-based
optimizer.
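To illustrate the type handling described above, here is a minimal Java sketch. It is not the code in HIVE-17653.patch: the class, enum and method names (`BooleanDimensionSketch`, `Role`, `roleFor`, `dimensionValue`) are hypothetical, and routing numeric types to metrics and everything else through a STRING-only dimension check is a simplified assumption about the Druid write path. It shows why a STRING-only dimension check rejects a BOOLEAN column, and how accepting BOOLEAN as a string dimension avoids the failure.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class BooleanDimensionSketch {

    // Hypothetical roles a non-timestamp Hive column can take in a Druid segment.
    enum Role { DIMENSION, METRIC }

    // Hypothetical, simplified mapping from a Hive primitive type name to a Druid role.
    // Numeric types become metrics; other types must pass a string-dimension check.
    // Without the BOOLEAN case below, a boolean column would fall through to the
    // default branch and fail like "Dimension bo does not have STRING type: BOOLEAN".
    static Role roleFor(String column, String hiveType) throws IOException {
        String type = hiveType.toUpperCase(Locale.ROOT);
        switch (type) {
            case "TINYINT":
            case "SMALLINT":
            case "INT":
            case "BIGINT":
            case "FLOAT":
            case "DOUBLE":
                return Role.METRIC;
            case "STRING":
            case "VARCHAR":
            case "CHAR":
            case "BOOLEAN": // assumed patched behavior: indexed as a string dimension
                return Role.DIMENSION;
            default:
                throw new IOException("Dimension " + column + " does not have STRING type: " + type);
        }
    }

    // Renders a Hive value as a Druid dimension value; String.valueOf(Boolean.TRUE) is "true".
    static String dimensionValue(Object value) {
        return value == null ? null : String.valueOf(value);
    }

    public static void main(String[] args) throws IOException {
        // Toy CTAS schema: a boolean column "bo" (as in the stack trace) and a bigint column.
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("bo", "boolean");
        schema.put("cnt", "bigint");
        for (Map.Entry<String, String> column : schema.entrySet()) {
            System.out.println(column.getKey() + " -> " + roleFor(column.getKey(), column.getValue()));
        }
        System.out.println("bo value written to Druid: " + dimensionValue(Boolean.TRUE));
    }
}
```

Running it prints `bo -> DIMENSION`, `cnt -> METRIC` and `bo value written to Druid: true`. Because the metastore still records `bo` as boolean while Druid stores strings, the sketch also makes the type mismatch mentioned above easy to see.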

> Druid storage handler CTAS with boolean type columns fails. 
> ------------------------------------------------------------
>
>                 Key: HIVE-17653
>                 URL: https://issues.apache.org/jira/browse/HIVE-17653
>             Project: Hive
>          Issue Type: Bug
>          Components: Druid integration
>            Reporter: slim bouguerra
>            Assignee: Ashutosh Chauhan
>             Fix For: 3.0.0
>
>         Attachments: HIVE-17653.patch
>
>
> Druid storage handler CTAS fails with the exception below when a Boolean
> column is included.
> A simple workaround is to cast the boolean column to string; the column is
> then indexed as a Druid dimension with the value `true` or `false`.
> {code}
> ERROR : Status: Failed
> ERROR : Vertex failed, vertexName=Reducer 3, vertexId=vertex_1506230948023_0005_9_02, diagnostics=[Task failed, taskId=task_1506230948023_0005_9_02_000003, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1506230948023_0005_9_02_000003_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) (vectorizedVertexNum 2)
>       at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:218)
>       at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:172)
>       at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>       at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>       at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>       at java.security.AccessController.doPrivileged(Native Method)
>       at javax.security.auth.Subject.doAs(Subject.java:422)
>       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
>       at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>       at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>       at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>       at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
>       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) (vectorizedVertexNum 2)
>       at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:406)
>       at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:248)
>       at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:319)
>       at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:189)
>       ... 15 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing vector batch (tag=0) (vectorizedVertexNum 2)
>       at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:492)
>       at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecordVector(ReduceRecordSource.java:397)
>       ... 18 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Dimension bo does not have STRING type: BOOLEAN
>       at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:564)
>       at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:664)
>       at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:101)
>       at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:955)
>       at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:903)
>       at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:145)
>       at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.processVectorGroup(ReduceRecordSource.java:479)
>       ... 19 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: Dimension bo does not have STRING type: BOOLEAN
>       at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:272)
>       at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:609)
>       at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:553)
>       ... 25 more
> Caused by: java.io.IOException: Dimension bo does not have STRING type: BOOLEAN
>       at org.apache.hadoop.hive.druid.io.DruidOutputFormat.getHiveRecordWriter(DruidOutputFormat.java:158)
>       at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:284)
>       at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:269)
>       ... 27 more
> ], TaskAttempt 1 failed, info=[Error: Error while running task ( failure ) : attempt_1506230948023_0005_9_02_000003_1:java.lang.RuntimeException: 
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
