matthiasdg edited a comment on issue #3868:
URL: https://github.com/apache/hudi/issues/3868#issuecomment-953069335


   I got it working by providing a `--spark-datasource` parameter when syncing. What is quite confusing is that this actually **disables** `syncAsSparkDataSourceTable`, since the default value is true: the bare flag simply toggles the default (described here: https://github.com/cbeust/jcommander/issues/378). Maybe it would be better to define an arity of 1 for booleans, so that you have to specify the value explicitly when you provide the flag (cf. https://jcommander.org/)?
   
   So if I sync with `syncAsSparkDataSourceTable` enabled, I can't query my Hive tables with Spark SQL. When can it be used, then? Since it's on by default, and queries like those in the examples on the Hudi website didn't work that way, I feel like I missed some documentation.
   
   Other assumption: since `syncAsSparkDataSourceTable` does not work, I still need to provide `spark.sql.hive.convertMetastoreParquet=false`, right? In that case I run into https://github.com/apache/hudi/issues/2544 when reading from a table where the timestamp option was used, so I am using bigint for now.
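   For reference, that workaround can be passed as a launch-time config; a minimal sketch (the launcher choice is illustrative, the config key is Spark's):

```shell
# Disable Hive's built-in Parquet conversion so Hudi's input format is used
# when querying the synced Hive table through Spark SQL.
spark-shell \
  --conf spark.sql.hive.convertMetastoreParquet=false
```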


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
