[
https://issues.apache.org/jira/browse/HUDI-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17397803#comment-17397803
]
Vinoth Chandar commented on HUDI-1363:
--------------------------------------
Thinking about this more, I think we should just invest in a Hudi table
definition file for BigQuery:
[https://cloud.google.com/bigquery/external-table-definition#creating_a_table_definition_for_hive-partitioned_data]
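As a rough sketch, the linked approach could look something like the following. The bucket path, dataset, and table names here are hypothetical placeholders, not part of any existing Hudi integration:

```shell
# Hypothetical layout: hive-style partitioned Parquet files written by Hudi
# under gs://my-bucket/hudi_table/<partition>=.../...

# Generate an external table definition; AUTO mode infers partition
# columns from the hive-style directory names under the URI prefix.
bq mkdef --source_format=PARQUET \
  --hive_partitioning_mode=AUTO \
  --hive_partitioning_source_uri_prefix=gs://my-bucket/hudi_table \
  "gs://my-bucket/hudi_table/*.parquet" > hudi_table_def.json

# Create the BigQuery external table from the definition file.
bq mk --external_table_definition=hudi_table_def.json mydataset.hudi_table
```

Note this only solves the partition-column problem; it does not make BigQuery snapshot-aware, which is the point raised below.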
We could attempt to fix the dropping of the partition columns to make the
BigQuery error go away, but in reality there would be other problems if you
just treat this as a plain Parquet table. Specifically, as these background
deletes run, partial data from Hudi transactions would be exposed to queries,
which can fail the query or surface partial files. For example, with
copy-on-write there will be different versions of the same Parquet file
maintained internally; on Hive, Spark, Presto, Trino, etc., Hudi integrates on
the query path to ensure only the latest committed snapshot of the table is
visible. Do you think a first-class BigQuery sync mechanism is a better
approach? Or are you happy living with these other issues in the short term
(in which case, we can prioritize this issue by itself)?
> Provide Option to drop columns after they are used to generate partition or
> record keys
> ---------------------------------------------------------------------------------------
>
> Key: HUDI-1363
> URL: https://issues.apache.org/jira/browse/HUDI-1363
> Project: Apache Hudi
> Issue Type: New Feature
> Components: Writer Core
> Reporter: Balaji Varadarajan
> Assignee: liwei
> Priority: Blocker
> Fix For: 0.9.0
>
>
> Context: https://github.com/apache/hudi/issues/2213
--
This message was sent by Atlassian Jira
(v8.3.4#803005)