vinothchandar commented on a change in pull request #674: Upgrade to Hive 2.x,
MOR read query fixes and performance improvement
URL: https://github.com/apache/incubator-hudi/pull/674#discussion_r291869486
##########
File path:
hoodie-hadoop-mr/src/main/java/com/uber/hoodie/hadoop/realtime/HoodieRealtimeInputFormat.java
##########
@@ -67,6 +68,15 @@
public static final int HOODIE_COMMIT_TIME_COL_POS = 0;
public static final int HOODIE_RECORD_KEY_COL_POS = 2;
public static final int HOODIE_PARTITION_PATH_COL_POS = 3;
+ // Track the read column ids and names to be used throughout the execution and lifetime of this task
+ // Needed for Hive on Spark. Our theory is that due to
+ // {@link org.apache.hadoop.hive.ql.io.parquet.ProjectionPusher}
+ // not handling empty list correctly, the ParquetRecordReaderWrapper ends up adding the same column ids multiple
+ // times which ultimately breaks the query.
+ // TODO : Find why RO view works fine but RT doesn't
Review comment:
yikes. file a tracking jira?
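
For whoever picks up the tracking issue, here is a minimal sketch of the symptom the code comment describes: the same column ids being appended more than once to a comma-separated projection list. It assumes the ids live in a Hive property such as `hive.io.file.readcolumn.ids`; that key is an assumption for illustration and is not taken from this diff. `ReadColumnDedup` and `dedupe` are hypothetical names, not part of Hudi or Hive, and this is not the fix applied in the PR.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class ReadColumnDedup {

    // Hypothetical helper (not Hudi or Hive API): collapses repeated entries in a
    // comma-separated list of read column ids while keeping first-seen order.
    static String dedupe(String csv) {
        if (csv == null || csv.isEmpty()) {
            return "";
        }
        Set<String> seen = new LinkedHashSet<>();
        for (String part : csv.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                seen.add(trimmed);
            }
        }
        return String.join(",", seen);
    }

    public static void main(String[] args) {
        // Assumed shape of the broken value: the same ids appended twice.
        System.out.println(dedupe("0,2,3,0,2,3")); // prints 0,2,3
    }
}
```

A sketch like this only masks the duplication; the open question in the TODO (why the RO view works while RT does not) would still need the tracking issue.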