[GitHub] [hudi] pengzhiwei2018 commented on pull request #2283: [HUDI-1415] Read Hoodie Table As Spark DataSource Table

GitBox Mon, 08 Mar 2021 18:25:30 -0800


pengzhiwei2018 commented on pull request #2283:
URL: https://github.com/apache/hudi/pull/2283#issuecomment-793280079



   > @pengzhiwei2018 @nsivabalan @vinothchandar
   > 1. Currently, querying Hudi with Spark SQL requires setting
   > "spark.sql.hive.convertMetastoreParquet=false"
   > (https://hudi.apache.org/docs/querying_data.html). This is confusing, and
   > many users may forget it.
   > 2. Having Spark read the table as a datasource has a big advantage: the
   > Hive metastore stays very light, because the partition list and schema do
   > not need to be fetched from it. How does Databricks Delta do this?
   > 3. I suggest persisting the properties to the Hudi metatable. The Hive
   > metastore would persist only the table name and database name. Maybe we
   > can look at how Delta Lake does it.
   
   Agree with 1 and 2. Currently Spark reads Delta as a datasource table too.
   For question 3, Spark currently needs these properties when it reads the
   table metadata from the metastore, so we should persist them there. If we
   persist these properties only to the Hudi metatable, I think some changes
   to the Spark code would be needed to support this.
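
To make "these properties" more concrete, here is a rough sketch of how a datasource table's provider and schema could be encoded as metastore table properties and read back. It is an illustration under assumptions, not code from this PR: the key names follow the convention Spark uses for datasource tables as far as I know, while the `hudi` provider value, the helper names, the part length, and the example schema are all hypothetical.

```python
import json

# Assumption: key names follow Spark's convention for datasource tables
# stored in a Hive metastore; they are not quoted from this thread.
PROVIDER_KEY = "spark.sql.sources.provider"
NUM_PARTS_KEY = "spark.sql.sources.schema.numParts"
PART_KEY_PREFIX = "spark.sql.sources.schema.part."


def datasource_table_properties(schema, provider="hudi", max_part_len=4000):
    """Encode a schema as table properties, split into fixed-size parts."""
    schema_json = json.dumps(schema, separators=(",", ":"))
    parts = [schema_json[i:i + max_part_len]
             for i in range(0, len(schema_json), max_part_len)]
    props = {PROVIDER_KEY: provider, NUM_PARTS_KEY: str(len(parts))}
    for index, part in enumerate(parts):
        props[PART_KEY_PREFIX + str(index)] = part
    return props


def read_schema(props):
    """Reassemble the schema from the properties, as a reader would."""
    num_parts = int(props[NUM_PARTS_KEY])
    schema_json = "".join(props[PART_KEY_PREFIX + str(i)]
                          for i in range(num_parts))
    return json.loads(schema_json)


# Hypothetical two-column schema in a StructType-like JSON layout.
example_schema = {
    "type": "struct",
    "fields": [
        {"name": "id", "type": "long", "nullable": False, "metadata": {}},
        {"name": "ts", "type": "string", "nullable": True, "metadata": {}},
    ],
}

# A small part length is used here only to force the schema to be split.
props = datasource_table_properties(example_schema, max_part_len=64)
for key in sorted(props):
    print(key, "=", props[key])
```

Since the reader only needs the table properties to recover the provider and schema, persisting them in the metastore lets Spark resolve the table without further changes, which is the point made above.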


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
