Wenning Ding created HIVE-22438:
-----------------------------------
Summary: Additional comma is added to projection column ids
Key: HIVE-22438
URL: https://issues.apache.org/jira/browse/HIVE-22438
Project: Hive
Issue Type: Bug
Reporter: Wenning Ding
Assignee: Wenning Ding
I ran into this issue when querying a Hudi dataset through Hive.
To query a Hudi-style table, Hudi implements its own InputFormat class and
overrides the getRecordReader method. In this method, for its own reasons, Hudi
manually adds several projection column ids and projection column names each
time getRecordReader is called, like this:
{code:java}
public RecordReader<NullWritable, ArrayWritable> getRecordReader(final InputSplit split,
    final JobConf job, final Reporter reporter) throws IOException {
  // Set the projected column names if col_a is not already present.
  if (!job.get(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR).contains("col_a")) {
    job.set(ColumnProjectionUtils.READ_COLUMN_NAMES_CONF_STR, "col_a");
  }
  // Set the projected column ids if id 1 is not already present.
  if (!job.get(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR).contains("1")) {
    job.set(ColumnProjectionUtils.READ_COLUMN_IDS_CONF_STR, "1");
  }
  return super.getRecordReader(split, job, reporter);
}
{code}
This causes a problem for COUNT(*) or COUNT(1) queries. For these queries,
Hive doesn't need to read any column, so the projection column ids value is an
empty string.
Here is a log example to show the whole workflow.
{code:java}
[DEBUG] [TezChild] |split.TezGroupedSplitsInputFormat|: Init record reader for index 0 of 2
[INFO] [TezChild] |realtime.HoodieParquetRealtimeInputFormat|: Before adding Hoodie columns, Projections : Ids :
[INFO] [TezChild] |hadoop.HoodieParquetInputFormat|: After adding Hoodie columns, Projections :col_a Ids :1
[DEBUG] [TezChild] |split.TezGroupedSplitsInputFormat|: Init record reader for index 1 of 2
[INFO] [TezChild] |realtime.HoodieParquetRealtimeInputFormat|: Before adding Hoodie columns, Projections :col_a Ids :,1
{code}
As we can see, on the second call the projection ids value becomes ",1", and
that additional leading comma causes exceptions in the code that runs
afterwards.
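
To illustrate how a leading comma like this can arise and why it is harmful, here is a minimal standalone sketch. It is not Hive or Hudi code: the class ProjectionIdsSketch and the helpers naiveAppend, guardedAppend and parseIds are hypothetical names made up for this example, and the strict integer parser is only an assumption about how a downstream consumer might read the id list.

```java
import java.util.ArrayList;
import java.util.List;

class ProjectionIdsSketch {

    /** Naive append: always inserts a comma, even when the existing list is empty. */
    static String naiveAppend(String existing, String id) {
        return existing + "," + id;
    }

    /** Guarded append: no separator for an empty list, and no duplicate ids. */
    static String guardedAppend(String existing, String id) {
        if (existing == null || existing.isEmpty()) {
            return id;
        }
        for (String part : existing.split(",")) {
            if (part.equals(id)) {
                return existing;
            }
        }
        return existing + "," + id;
    }

    /** Strict parser standing in for a hypothetical downstream consumer of the id list. */
    static List<Integer> parseIds(String ids) {
        List<Integer> result = new ArrayList<>();
        if (ids.isEmpty()) {
            return result;
        }
        for (String part : ids.split(",")) {
            // Integer.parseInt throws NumberFormatException on an empty token.
            result.add(Integer.parseInt(part));
        }
        return result;
    }

    public static void main(String[] args) {
        // COUNT(*) case: the id list starts out as an empty string.
        String naive = naiveAppend("", "1");
        String guarded = guardedAppend("", "1");
        System.out.println("naive   = \"" + naive + "\"");
        System.out.println("guarded = \"" + guarded + "\"");
        System.out.println("parsed guarded = " + parseIds(guarded));
        try {
            parseIds(naive);
        } catch (NumberFormatException e) {
            System.out.println("parsing naive value failed: " + e.getMessage());
        }
    }
}
```

The guarded variant checks for an empty existing value before adding a separator, which is one possible way to avoid the stray comma regardless of which side builds the list.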