matthiasdg opened a new issue, #8713:
URL: https://github.com/apache/hudi/issues/8713

   **Describe the problem you faced**
   
Just wondering whether it's possible to do incremental queries using Spark 
with a Hudi table that was synced to the Hive metastore using the HiveSyncTool; 
i.e., something like `sparkSession.table(tableName).select("*").where(...)`.
   I did not find any documentation or FAQ about this.
   
Encountered things like `set 
hoodie.stock_ticks_cow.consume.mode=INCREMENTAL;` in 
`docker/demo/hive-incremental-cow.commands`, but I'm not sure whether that also 
applies here; is this only for queries going through HiveServer? (We're not 
running HiveServer, only the metastore.) Can I somehow pass these as config 
parameters to the Spark session? (I did some experiments, also in Spark SQL, 
but they did not seem to work.)
   
   Also, why is there a `'_hoodie_commit_time' > '${min.commit.time}'` in the 
example query in the file I mentioned? Shouldn't that already be the case 
because the query is incremental?
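   For context, the incremental query in that demo file is roughly of this shape (as I understand it, `${min.commit.time}` is substituted by the caller; my question is why the explicit predicate is needed on top of the incremental consume mode):
   
   ```sql
   -- From docker/demo/hive-incremental-cow.commands (abridged):
   set hoodie.stock_ticks_cow.consume.mode=INCREMENTAL;
   set hoodie.stock_ticks_cow.consume.start.timestamp=${min.commit.time};
   select symbol, ts from stock_ticks_cow
   where _hoodie_commit_time > '${min.commit.time}';
   ```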
   
   
   **Environment Description**
   
   * Hudi version : 0.10
   
   * Spark version : 3.1.2
   
   * Hive version : HMS 3.0
   
   * Hadoop version : 3.2
   
   * Storage (HDFS/S3/GCS..) : ADLS Gen 2
   
   * Running on Docker? (yes/no) : Kubernetes


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
