matthiasdg commented on issue #8713:
URL: https://github.com/apache/hudi/issues/8713#issuecomment-1551439622
@ad1happy2go I have no issue with the incremental query per the page you
linked. However, that example requires passing in the root path of the table. I
was wondering if there is an option like
`spark.read.option(...).table("name_of_table_in_hive")` -> that does not
seem to work. The main reason I'd want this is that I then wouldn't have to
care about the path of the table. I can of course get the path from the Hive
table using
`org.apache.spark.sql.catalyst.catalog.SessionCatalog.getTableMetadata` and
then pass that in, but I was wondering if there is a more direct way.
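For reference, the workaround I mean looks roughly like the sketch below: a small helper assembles Hudi's documented incremental-read options, and the commented-out wiring resolves the base path from the Hive metastore instead of hard-coding it. The helper name, table name, and instant time are illustrative, not from any Hudi API:

```scala
// Sketch only. The option keys are Hudi's documented datasource options;
// incrementalOptions is a hypothetical helper, not part of Hudi or Spark.
def incrementalOptions(beginInstant: String): Map[String, String] = Map(
  "hoodie.datasource.query.type"             -> "incremental",
  "hoodie.datasource.read.begin.instanttime" -> beginInstant
)

// Wiring it up (requires a live SparkSession with Hive support):
//   val tableId  = org.apache.spark.sql.catalyst.TableIdentifier("name_of_table_in_hive")
//   val basePath = spark.sessionState.catalog
//     .getTableMetadata(tableId).location.toString   // resolve path from the metastore
//   val df = spark.read.format("hudi")
//     .options(incrementalOptions("20230501000000")) // illustrative instant time
//     .load(basePath)
```

This still goes catalog-lookup-then-path under the hood; the question is whether Hudi can do that resolution itself via `.table(...)`.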
I understand it's about getting data above a certain instant time, but I would
have thought that is already satisfied if you load the table with an
incremental query whose start instant is `${min.commit.time}`. Are there
cases where this additional filtering is necessary?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]