[ https://issues.apache.org/jira/browse/HIVE-9235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276620#comment-14276620 ]
Matt McCline commented on HIVE-9235:
------------------------------------

First issue (vectorization of Parquet): Missing cases in VectorColumnAssignFactory.java's public static VectorColumnAssign[] buildAssigners(VectorizedRowBatch outputBatch, Writable[] writables) for HiveCharWritable, HiveVarcharWritable, DateWritable, and HiveDecimalWritable. Example of the exception this causes:
{noformat}
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Unimplemented vector assigner for writable type class org.apache.hadoop.hive.serde2.io.HiveDecimalWritable
        at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:136)
        at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:49)
        at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:347)
        ... 21 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unimplemented vector assigner for writable type class org.apache.hadoop.hive.serde2.io.HiveDecimalWritable
        at org.apache.hadoop.hive.ql.exec.vector.VectorColumnAssignFactory.buildAssigners(VectorColumnAssignFactory.java:528)
        at org.apache.hadoop.hive.ql.io.parquet.VectorizedParquetInputFormat$VectorizedParquetRecordReader.next(VectorizedParquetInputFormat.java:127)
        ... 23 more
{noformat}
Added code to fix that.
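For context, each missing branch just needs to route the writable value into the matching ColumnVector subclass in the output batch. Below is a rough, hypothetical sketch of what such an assigner looks like for HiveDecimalWritable; the class and method names are illustrative only and are not the actual code in HIVE-9235.01.patch:
{noformat}
import org.apache.hadoop.hive.ql.exec.vector.DecimalColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.hadoop.hive.serde2.io.HiveDecimalWritable;

// Illustrative stand-in for one of the assigners buildAssigners() could not create.
public class DecimalColumnAssignSketch {

  private final DecimalColumnVector outCol;

  public DecimalColumnAssignSketch(VectorizedRowBatch outputBatch, int columnIndex) {
    this.outCol = (DecimalColumnVector) outputBatch.cols[columnIndex];
  }

  /** Copy one HiveDecimalWritable row value into the output decimal column vector. */
  public void assignObjectValue(Object value, int destRow) {
    if (value == null) {
      // Mark the row as null in the column vector.
      outCol.noNulls = false;
      outCol.isNull[destRow] = true;
    } else {
      HiveDecimalWritable writable = (HiveDecimalWritable) value;
      outCol.isNull[destRow] = false;
      outCol.set(destRow, writable.getHiveDecimal());
    }
  }
}
{noformat}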
Then I copied a half dozen vectorized q tests that use ORC tables and tried converting them to use PARQUET, but encountered another issue in *non-vectorized* mode. I was trying to establish base query outputs that I could use to verify the vectorized query output. This indicated that a basic non-vectorized use of the CHAR data type wasn't working for PARQUET:
{noformat}
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:814)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
        at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:493)
        ... 10 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.serde2.io.HiveCharWritable
        at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableHiveCharObjectInspector.copyObject(WritableHiveCharObjectInspector.java:104)
        at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.copyToStandardObject(ObjectInspectorUtils.java:305)
        at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:150)
        at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.deepCopyElements(KeyWrapperFactory.java:142)
        at org.apache.hadoop.hive.ql.exec.KeyWrapperFactory$ListKeyWrapper.copyKey(KeyWrapperFactory.java:119)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.processHashAggr(GroupByOperator.java:827)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:739)
        at org.apache.hadoop.hive.ql.exec.GroupByOperator.processOp(GroupByOperator.java:809)
        ... 16 more
{noformat}
I filed this problem as HIVE-9371: Execution error for Parquet table and GROUP BY involving CHAR data type.

At that point we concluded we should temporarily disable vectorization of PARQUET, since there is only one test and it doesn't provide complete coverage of data types.

FYI: [~hagleitn]

> Turn off Parquet Vectorization until all data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR
> ------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-9235
>                 URL: https://issues.apache.org/jira/browse/HIVE-9235
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-9235.01.patch
>
>
> Title was: Make Parquet Vectorization of these data types work: DECIMAL, DATE, TIMESTAMP, CHAR, and VARCHAR
> Support for doing vector column assign is missing for some data types.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)