[
https://issues.apache.org/jira/browse/HUDI-151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17273083#comment-17273083
]
sivabalan narayanan commented on HUDI-151:
------------------------------------------
[~nishith29]: can we close this ticket?
> Fix Realtime queries on Hive on Spark engine
> --------------------------------------------
>
> Key: HUDI-151
> URL: https://issues.apache.org/jira/browse/HUDI-151
> Project: Apache Hudi
> Issue Type: Task
> Components: Hive Integration
> Reporter: Nishith Agarwal
> Assignee: Nishith Agarwal
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> ColumnId projections work differently across HoodieInputFormat and
> HoodieRealtimeInputFormat.
> We track the read column ids and names so that they can be used throughout
> the execution and lifetime of a mapper task, which Hive on Spark requires.
> Our theory is that because {@link org.apache.hadoop.hive.ql.io.parquet.ProjectionPusher}
> does not handle an empty list correctly, the ParquetRecordReaderWrapper ends
> up adding the same column ids multiple times, which ultimately breaks the
> query. We need to find out why the RO view works fine but the RT view does not.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)