[
https://issues.apache.org/jira/browse/HIVE-22438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wenning Ding reassigned HIVE-22438:
-----------------------------------
> Additional comma is added to projection column ids
> --------------------------------------------------
>
> Key: HIVE-22438
> URL: https://issues.apache.org/jira/browse/HIVE-22438
> Project: Hive
> Issue Type: Bug
> Reporter: Wenning Ding
> Assignee: Wenning Ding
> Priority: Major
>
> I ran into this issue when querying a Hudi dataset through Hive.
> To make a Hudi-style table queryable, Hudi implements its own InputFormat
> class and overrides the getRecordReader method. In that method, Hudi
> manually adds several projection column ids and projection column names
> each time getRecordReader is called, like this:
>
> {code:java}
> public RecordReader<NullWritable, ArrayWritable> getRecordReader(final InputSplit split,
>     final JobConf job, final Reporter reporter) throws IOException {
>   // Add the required column name if it is not already projected.
>   if (!job.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR).contains("col_a")) {
>     job.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "col_a");
>   }
>   // Add the matching column id if it is not already projected.
>   if (!job.get(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR).contains("1")) {
>     job.set(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR, "1");
>   }
>   return super.getRecordReader(split, job, reporter);
> }
> {code}
>
> This causes a problem for COUNT(*) or COUNT(1) queries. For these queries
> Hive doesn't need to read any column, so the projection column ids value
> starts out as an empty string.
> The following log excerpt shows the whole sequence.
> {code:java}
> [DEBUG] [TezChild] |split.TezGroupedSplitsInputFormat|: Init record reader
> for index 0 of 2
> [INFO] [TezChild] |realtime.HoodieParquetRealtimeInputFormat|: Before adding
> Hoodie columns, Projections : Ids :
> [INFO] [TezChild] |hadoop.HoodieParquetInputFormat|: After adding Hoodie
> columns, Projections :col_a Ids :1
> [DEBUG] [TezChild] |split.TezGroupedSplitsInputFormat|: Init record reader
> for index 1 of 2
> [INFO] [TezChild] |realtime.HoodieParquetRealtimeInputFormat|: Before adding
> Hoodie columns, Projections :col_a Ids :,1
> {code}
> As we can see, on the second call the projection ids value becomes ",1",
> and that leading comma causes exceptions in the code that later reads it.
>
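To illustrate why a leading comma is harmful, here is a minimal, self-contained sketch. It does not use Hive or Hudi code: the class ProjectionIds and its methods parseStrict, parseLenient, and append are hypothetical names invented for this example. The assumption is that the consumer splits the ids string on commas and parses each token as an integer, so the empty token in front of the comma in ",1" fails to parse. The append method shows one possible way to join id lists without emitting a separator when one side is empty.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hive or Hudi.
final class ProjectionIds {

    // Strict parse: every comma-separated token must be an integer.
    // An empty token (as in ",1" or "") makes Integer.parseInt throw
    // NumberFormatException.
    static List<Integer> parseStrict(String ids) {
        List<Integer> result = new ArrayList<>();
        for (String token : ids.split(",")) {
            result.add(Integer.parseInt(token));
        }
        return result;
    }

    // Lenient parse: skips empty tokens, so ",1" yields [1] and "" yields [].
    static List<Integer> parseLenient(String ids) {
        List<Integer> result = new ArrayList<>();
        if (ids == null) {
            return result;
        }
        for (String token : ids.split(",")) {
            String trimmed = token.trim();
            if (!trimmed.isEmpty()) {
                result.add(Integer.parseInt(trimmed));
            }
        }
        return result;
    }

    // Joins two id lists, adding a comma only when both sides are non-empty.
    static String append(String existing, String extra) {
        if (existing == null || existing.isEmpty()) {
            return extra == null ? "" : extra;
        }
        if (extra == null || extra.isEmpty()) {
            return existing;
        }
        return existing + "," + extra;
    }

    public static void main(String[] args) {
        System.out.println(append("", "1"));      // no separator is added
        System.out.println(parseLenient(",1"));   // empty token is skipped
        try {
            parseStrict(",1");
        } catch (NumberFormatException e) {
            System.out.println("strict parse failed: " + e.getMessage());
        }
    }
}
```

Either guard would avoid the failure under the stated assumption: joining defensively keeps the stored value clean, while lenient parsing tolerates values that are already malformed.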