[
https://issues.apache.org/jira/browse/HIVE-14154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359927#comment-15359927
]
Gopal V commented on HIVE-14154:
--------------------------------
This is a usability bug - the VectorizedOrcInputFormat is not a complete
general purpose input format.
Possible fix would be to disallow creating tables with that as a the input
format.
Creating tables "STORED AS ORC" should never do that, however the planner will
inject this if the query can be vectorized.
> Select from ORC table stored in VectorizedOrcInputFormat fails
> ---------------------------------------------------------------
>
> Key: HIVE-14154
> URL: https://issues.apache.org/jira/browse/HIVE-14154
> Project: Hive
> Issue Type: Bug
> Components: Hive
> Reporter: Zhu Li
>
> TABLE INFORMATION:
> # col_name data_type comment
>
> c_1m_l bigint
> c_10_l bigint
> c_100_l bigint
> c_1k_l bigint
> c_10k_l bigint
> c_100k_l bigint
> c_1k_r bigint
> c_1m_r bigint
> s_5000_r char(40)
> s_1m_r char(40)
>
> # Detailed Table Information
> Database: default
> Owner: hdfs
> CreateTime: Fri Jul 01 17:32:23 PDT 2016
> LastAccessTime: UNKNOWN
> Protect Mode: None
> Retention: 0
> Location:
> hdfs://mynamenode:8020/apps/hive/warehouse/t_1m_2_orc_vec
> Table Type: MANAGED_TABLE
> Table Parameters:
> COLUMN_STATS_ACCURATE true
> numFiles 8
> numRows 10000000
> rawDataSize 3120000000
> totalSize 105254801
> transient_lastDdlTime 1467419689
> Storage Information
> SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde
> InputFormat:
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat
> OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> Compressed: No
> Num Buckets: -1
> Bucket Columns: []
> Sort Columns: []
> Storage Desc Params:
> serialization.format 1
> QUERY: select * from t_1m_2_orc_vec limit 10;
> ERROR:
> Failed with exception java.io.IOException:java.lang.RuntimeException:
> java.lang.RuntimeException: Failed to load plan: null:
> java.lang.NullPointerException
> Additional information:
> input format is set as
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat
> When I want to read from this table with HCatalog API, I got the following
> error:
> Exception in thread "main" java.lang.RuntimeException:
> java.lang.RuntimeException: Failed to load plan: null:
> java.lang.NullPointerException
> at
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.<init>(VectorizedOrcInputFormat.java:76)
> at
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat.getRecordReader(VectorizedOrcInputFormat.java:156)
> at
> org.apache.hive.hcatalog.mapreduce.HCatRecordReader.createBaseRecordReader(HCatRecordReader.java:116)
> at
> org.apache.hive.hcatalog.mapreduce.HCatRecordReader.initialize(HCatRecordReader.java:91)
> at OrcConnector.main(OrcConnector.java:254)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:144)
> Caused by: java.lang.RuntimeException: Failed to load plan: null:
> java.lang.NullPointerException
> at
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:461)
> at
> org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:300)
> at
> org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatchCtx.init(VectorizedRowBatchCtx.java:171)
> at
> org.apache.hadoop.hive.ql.io.orc.VectorizedOrcInputFormat$VectorizedOrcRecordReader.<init>(VectorizedOrcInputFormat.java:74)
> ... 9 more
> Caused by: java.lang.NullPointerException
> at
> org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:416)
> ... 12 more
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)