[jira] [Updated] (HUDI-151) Fix Realtime queries on Hive on Spark engine

Vinoth Chandar (Jira) Tue, 24 Dec 2019 13:48:31 -0800


     [ https://issues.apache.org/jira/browse/HUDI-151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Vinoth Chandar updated HUDI-151:
--------------------------------
    Component/s: Hive Integration

> Fix Realtime queries on Hive on Spark engine
> --------------------------------------------
>
>                 Key: HUDI-151
>                 URL: https://issues.apache.org/jira/browse/HUDI-151
>             Project: Apache Hudi (incubating)
>          Issue Type: Task
>          Components: Hive Integration, Realtime View
>            Reporter: Nishith Agarwal
>            Assignee: Nishith Agarwal
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> ColumnId projections work differently in HoodieInputFormat and
> HoodieRealtimeInputFormat.
> We track the read column ids and names so that they can be used throughout
> the execution and lifetime of a mapper task, which Hive on Spark requires.
> Our theory is that because org.apache.hadoop.hive.ql.io.parquet.ProjectionPusher
> does not handle an empty column list correctly, the ParquetRecordReaderWrapper
> ends up adding the same column ids multiple times, which ultimately breaks the
> query. We still need to find out why the RO (read-optimized) view works fine
> but the RT (realtime) view does not.
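
The theory above can be illustrated with a small Java sketch. It is not Hudi's or Hive's actual code. It assumes that read column ids are kept as a comma-separated string, as in Hive's `hive.io.file.readcolumn.ids` property, and that appending ids to an empty value yields a leading comma. Repeating the append then duplicates every id. The helper names `naiveAppend` and `cleanIds` and the ids `2, 0, 3` are hypothetical, and a plain string stands in for the Hadoop `Configuration`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: a plain String stands in for the value of Hive's
// "hive.io.file.readcolumn.ids" property; helper names are illustrative only.
final class ReadColumnIds {

    private ReadColumnIds() {}

    // Naive append: always prefixes a comma, so an empty existing value
    // produces a leading comma, and a second append repeats every id.
    static String naiveAppend(String existing, List<Integer> ids) {
        StringBuilder sb = new StringBuilder(existing == null ? "" : existing);
        for (int id : ids) {
            sb.append(',').append(id);
        }
        return sb.toString();
    }

    // Cleanup: drops empty tokens (e.g. from a leading comma) and removes
    // duplicate ids while keeping the order in which each id first appeared.
    static String cleanIds(String raw) {
        Set<Integer> seen = new LinkedHashSet<>();
        if (raw != null) {
            for (String token : raw.split(",")) {
                String t = token.trim();
                if (!t.isEmpty()) {
                    seen.add(Integer.parseInt(t));
                }
            }
        }
        List<String> out = new ArrayList<>();
        for (int id : seen) {
            out.add(Integer.toString(id));
        }
        return String.join(",", out);
    }

    public static void main(String[] args) {
        List<Integer> requiredIds = Arrays.asList(2, 0, 3); // hypothetical ids
        String once = naiveAppend("", requiredIds);    // ",2,0,3"
        String twice = naiveAppend(once, requiredIds); // ",2,0,3,2,0,3"
        System.out.println(twice);
        System.out.println(cleanIds(twice));           // "2,0,3"
    }
}
```

Deduplicating in first-seen order keeps the requested columns stable no matter how many times the append runs. Whether the eventual fix for this issue takes this approach is not stated here.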



--
This message was sent by Atlassian Jira
(v8.3.4#803005)