afeldman1 commented on a change in pull request #1761:
URL: https://github.com/apache/hudi/pull/1761#discussion_r446699556



##########
File path: docs/_docs/2_3_querying_data.md
##########
@@ -136,6 +136,16 @@ The Spark Datasource API is a popular way of authoring Spark ETL pipelines. Hudi
 datasources work (e.g.: `spark.read.parquet`). Both snapshot querying and incremental querying are supported here. Typically, Spark jobs require adding `--jars <path to jar>/hudi-spark-bundle_2.11-<hudi version>.jar` to the classpath of drivers
 and executors. Alternatively, hudi-spark-bundle can also be fetched via the `--packages` option (e.g.: `--packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.3`).
 
+### Snapshot query {#spark-snap-query}
+This method retrieves the latest snapshot of the table as of the current point in time.
+
+```scala
+val hudiSnapshotQueryDF = spark
+     .read
+     .format("org.apache.hudi")
+     .option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY, DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL)
+     .load(tablePath + "/*") // Include "/*" at the end of the path if the table is partitioned
+```

Review comment:
       You are correct, updating. Checking the actual partitioned directories seems like something we should add to the Hudi framework itself.
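As an aside on that last point: until partition-path resolution lives inside Hudi itself, callers follow the convention of appending one `/*` glob segment per partition level to the load path. A minimal sketch of that convention is below; the `globbedPath` helper is hypothetical and not part of Hudi's API.

```scala
// Hypothetical helper illustrating the current caller-side convention:
// append one "/*" glob segment per partition level so that
// spark.read.format("org.apache.hudi").load(...) sees every partition directory.
def globbedPath(basePath: String, partitionDepth: Int): String =
  basePath.stripSuffix("/") + "/*" * partitionDepth

// For a table partitioned by year/month/day (depth 3):
val path = globbedPath("s3://bucket/hudi_table", 3)
// path == "s3://bucket/hudi_table/*/*/*"
```

This is exactly the fragile step the review comment suggests pushing into the framework, since the caller must know the partition depth up front.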




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]