[ https://issues.apache.org/jira/browse/HUDI-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17397803#comment-17397803 ]

Vinoth Chandar commented on HUDI-1363:
--------------------------------------

Thinking about this more, I think we should just invest in a Hudi table 
definition file for BigQuery. 

[https://cloud.google.com/bigquery/external-table-definition#creating_a_table_definition_for_hive-partitioned_data]
 

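For reference, generating such a definition with the bq CLI might look like the sketch below. The bucket, path, and dataset names are hypothetical placeholders, not actual Hudi or project paths:

```shell
# Generate an external table definition for Hive-partitioned Parquet data.
# gs://my-bucket/hudi_table and mydataset.hudi_table are hypothetical names.
bq mkdef \
  --source_format=PARQUET \
  --hive_partitioning_mode=AUTO \
  --hive_partitioning_source_uri_prefix=gs://my-bucket/hudi_table \
  "gs://my-bucket/hudi_table/*.parquet" > table_def.json

# Create the external table from the definition file.
bq mk --external_table_definition=table_def.json mydataset.hudi_table
```

This only defines the table layout; it does not solve the snapshot-isolation issues described below.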
We could attempt to fix the dropping of the partitioning columns to make the 
BigQuery error go away. But in reality, there would be other problems you'd 
encounter if you just treat this as a Parquet table. Specifically, as you do 
these background deletes, partial data from the Hudi transactions would be 
exposed to queries, which can potentially fail the query or expose partial 
files (e.g. with COW, there will be different versions of the same Parquet 
file internally maintained; on Hive, Spark, Presto, Trino, etc., Hudi 
integrates on the query path to ensure we only see the latest committed 
snapshot of the table). Do you think a first-class BigQuery sync mechanism is 
a better approach? Or are you happy living with these other issues in the 
short term (in which case, we can prioritize this issue by itself)?

> Provide Option to drop columns after they are used to generate partition or 
> record keys
> ---------------------------------------------------------------------------------------
>
>                 Key: HUDI-1363
>                 URL: https://issues.apache.org/jira/browse/HUDI-1363
>             Project: Apache Hudi
>          Issue Type: New Feature
>          Components: Writer Core
>            Reporter: Balaji Varadarajan
>            Assignee: liwei
>            Priority: Blocker
>             Fix For: 0.9.0
>
>
> Context: https://github.com/apache/hudi/issues/2213



--
This message was sent by Atlassian Jira
(v8.3.4#803005)