afeldman1 commented on a change in pull request #1761:
URL: https://github.com/apache/hudi/pull/1761#discussion_r446703736
##########
File path: docs/_docs/2_3_querying_data.md
##########
@@ -136,6 +136,16 @@ The Spark Datasource API is a popular way of authoring Spark ETL pipelines. Hudi
 datasources work (e.g: `spark.read.parquet`). Both snapshot querying and incremental querying are supported here. Typically, Spark jobs require adding `--jars <path to jar>/hudi-spark-bundle_2.11-<hudi version>.jar` to the classpath of drivers and executors. Alternatively, hudi-spark-bundle can also be fetched via the `--packages` option (e.g: `--packages org.apache.hudi:hudi-spark-bundle_2.11:0.5.3`).
+### Snapshot query {#spark-snap-query}
+This query type retrieves the table's data as of the present point in time.
+
+```scala
+val hudiSnapshotQueryDF = spark
+  .read
+  .format("org.apache.hudi")
+  .option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY, DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL)
+  .load(tablePath + "/*") // Include "/*" at the end of the path if the table is partitioned

Review comment:
       Well, it would be one fewer, no? One `/*` for each partition level, so `tablePath + "/*/*/*"`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
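To make the glob pattern under discussion concrete, here is a minimal sketch of a snapshot query against a partitioned table. The base path and the three-level partitioning (e.g. `year/month/day`) are assumptions for illustration, not part of the PR; the point is that the load path carries one `/*` per partition level:

```scala
import org.apache.hudi.DataSourceReadOptions
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hudi-snapshot-query-sketch")
  .getOrCreate()

// Hypothetical base path; the table is assumed to be partitioned on
// three columns (e.g. year/month/day), hence three "/*" globs below.
val tablePath = "file:///tmp/hudi/trips"

val snapshotDF = spark
  .read
  .format("org.apache.hudi")
  .option(DataSourceReadOptions.QUERY_TYPE_OPT_KEY, DataSourceReadOptions.QUERY_TYPE_SNAPSHOT_OPT_VAL)
  .load(tablePath + "/*/*/*") // one "/*" glob per partition level
```

Each additional partition level adds one more `/*` to the glob; a table with fewer partition levels drops globs accordingly.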