[
https://issues.apache.org/jira/browse/HUDI-151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17273083#comment-17273083
]
sivabalan narayanan commented on HUDI-151:
------------------------------------------
[~nishith29]: can we close this ticket?
> Fix Realtime queries on Hive on Spark engine
> --------------------------------------------
>
> Key: HUDI-151
> URL: https://issues.apache.org/jira/browse/HUDI-151
> Project: Apache Hudi
> Issue Type: Task
> Components: Hive Integration
> Reporter: Nishith Agarwal
> Assignee: Nishith Agarwal
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> ColumnId projections work differently across HoodieInputFormat and
> HoodieRealtimeInputFormat.
> We track the read column ids and names so that they can be used throughout
> the execution and lifetime of a mapper task, which Hive on Spark requires.
> Our theory is that because {@link org.apache.hadoop.hive.ql.io.parquet.ProjectionPusher}
> does not handle an empty list correctly, the ParquetRecordReaderWrapper ends
> up adding the same column ids multiple times, which ultimately breaks the
> query. We need to find out why the RO view works fine but the RT view does not.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)