[ 
https://issues.apache.org/jira/browse/HUDI-2682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Charlie Briggs updated HUDI-2682:
---------------------------------
    Description: 
When syncing hive schema, new columns added from the source dataset are not 
propagated to the `spark.sql.sources.schema` metadata on the hive table. This 
leads to columns not being available when querying the dataset via spark SQL.

Tested with both the Spark data writer and DeltaStreamer.

The column we observed this on was a struct column, but it seems like it would 
be independent of datatype.

  was:
When syncing hive schema, new columns added from the source dataset are not 
propagated to the `spark.sql.sources.schema` metadata. This leads to columns 
not being available when querying the dataset via spark SQL.

Tested with both spark data writer and deltastreamer). 

The column we observed this on was a struct column, but it seems like it would 
be independent.


> Spark schema not updated with new columns on hive sync
> ------------------------------------------------------
>
>                 Key: HUDI-2682
>                 URL: https://issues.apache.org/jira/browse/HUDI-2682
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: Hive Integration
>    Affects Versions: 0.9.0
>            Reporter: Charlie Briggs
>            Priority: Minor
>
> When syncing hive schema, new columns added from the source dataset are not 
> propagated to the `spark.sql.sources.schema` metadata on the hive table. This 
> leads to columns not being available when querying the dataset via spark SQL.
> Tested with both the Spark data writer and DeltaStreamer.
> The column we observed this on was a struct column, but it seems like it 
> would be independent of datatype.
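
One way to confirm the symptom described above is to compare the Hive column list with the schema Spark keeps in the table properties. The sketch below is not part of Hudi. It assumes the table properties have already been fetched into a plain dict (for example from `SHOW TBLPROPERTIES`), and that Spark stores the schema JSON split across `spark.sql.sources.schema.numParts` and `spark.sql.sources.schema.part.N`, which may differ between Spark versions. The function names and the sample data are hypothetical.

```python
import json


def spark_schema_fields(table_properties):
    """Return top-level field names from the Spark schema stored in table properties.

    Assumption: the schema JSON is split across numbered `part.N` properties,
    with the part count in `numParts`.
    """
    num_parts = int(table_properties.get("spark.sql.sources.schema.numParts", 0))
    schema_json = "".join(
        table_properties[f"spark.sql.sources.schema.part.{i}"]
        for i in range(num_parts)
    )
    if not schema_json:
        return []
    return [field["name"] for field in json.loads(schema_json)["fields"]]


def columns_missing_from_spark_schema(hive_columns, table_properties):
    """List Hive columns that are absent from the Spark schema metadata."""
    spark_fields = {name.lower() for name in spark_schema_fields(table_properties)}
    return [col for col in hive_columns if col.lower() not in spark_fields]


# Hypothetical table state after a sync that added a struct column `address`
# to the Hive columns but left the Spark schema property unchanged.
example_properties = {
    "spark.sql.sources.schema.numParts": "1",
    "spark.sql.sources.schema.part.0": json.dumps(
        {
            "type": "struct",
            "fields": [
                {"name": "id", "type": "string", "nullable": True, "metadata": {}}
            ],
        }
    ),
}
example_hive_columns = ["id", "address"]

missing = columns_missing_from_spark_schema(example_hive_columns, example_properties)
print(missing)  # ['address']
```

A non-empty result would match the behaviour reported here: the column exists in the Hive table but is not listed in the schema that Spark SQL reads from the table properties.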



--
This message was sent by Atlassian Jira
(v8.3.4#803005)