[
https://issues.apache.org/jira/browse/HUDI-151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Vinoth Chandar updated HUDI-151:
--------------------------------
Component/s: Hive Integration
> Fix Realtime queries on Hive on Spark engine
> --------------------------------------------
>
> Key: HUDI-151
> URL: https://issues.apache.org/jira/browse/HUDI-151
> Project: Apache Hudi (incubating)
> Issue Type: Task
> Components: Hive Integration, Realtime View
> Reporter: Nishith Agarwal
> Assignee: Nishith Agarwal
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> ColumnId projections work differently across HoodieInputFormat and
> HoodieRealtimeInputFormat.
> We track the read column ids and names to be used throughout the execution
> and lifetime of a mapper task, as needed for Hive on Spark. Our theory is that
> because {@link org.apache.hadoop.hive.ql.io.parquet.ProjectionPusher} does not
> handle an empty list correctly, the ParquetRecordReaderWrapper ends up adding
> the same column ids multiple times, which ultimately breaks the query. We need
> to find out why the RO view works fine but the RT view doesn't.
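
The suspected failure mode above can be illustrated with a small standalone sketch. It does not use Hive or Hudi classes. The class ProjectionSketch and its methods appendIds and distinctIds are hypothetical names invented for illustration, and the comma-separated string only stands in for a projection property such as hive.io.file.readcolumn.ids. It shows how appending the same projection twice yields repeated column ids, and one defensive way a reader could tolerate that. It is not the actual fix for this issue.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not part of Hive or Hudi.
class ProjectionSketch {

    // Naive append: adds ids to a comma-separated value without
    // checking what is already present (the suspected behaviour).
    static String appendIds(String existing, List<Integer> ids) {
        StringBuilder sb = new StringBuilder(existing == null ? "" : existing);
        for (int id : ids) {
            if (sb.length() > 0) {
                sb.append(',');
            }
            sb.append(id);
        }
        return sb.toString();
    }

    // Defensive read: skips empty tokens, drops repeated ids,
    // and keeps the order in which ids were first seen.
    static List<Integer> distinctIds(String csv) {
        Set<Integer> seen = new LinkedHashSet<>();
        if (csv != null) {
            for (String token : csv.split(",")) {
                String t = token.trim();
                if (!t.isEmpty()) {
                    seen.add(Integer.parseInt(t));
                }
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        List<Integer> projection = Arrays.asList(0, 2);
        String ids = appendIds("", projection);
        ids = appendIds(ids, projection);      // same projection added again
        System.out.println(ids);               // 0,2,0,2
        System.out.println(distinctIds(ids));  // [0, 2]
    }
}
```

De-duplicating at read time only masks the symptom. The open question in the description, why the RO path does not hit the repeated append while the RT path does, still needs to be answered at the source.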
--
This message was sent by Atlassian Jira
(v8.3.4#803005)