[
https://issues.apache.org/jira/browse/SPARK-49051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jeff Steinmetz updated SPARK-49051:
-----------------------------------
Affects Version/s: 3.5.1
(was: 3.4.3)
> Provide modifiedAfter and modifiedBefore options when filtering from a stream
> source
> ------------------------------------------------------------------------------------
>
> Key: SPARK-49051
> URL: https://issues.apache.org/jira/browse/SPARK-49051
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.5.1
> Reporter: Jeff Steinmetz
> Priority: Minor
>
> In the following Jira issue
> https://issues.apache.org/jira/browse/SPARK-31962
> two new options, *modifiedBefore* and *modifiedAfter*, for batch reads (for
> example, CSV) were introduced, and eventually merged into version 3.1.1 via
> PR:
> https://issues.apache.org/jira/browse/SPARK-31962
>
> These options were implemented so that batch reads accept them, but
> streaming reads explicitly do not.
> When loading files from a data source as a stream, a file path can likewise
> contain thousands of files, so filtering by modification time is useful in
> both batch and stream use cases. Note: The Databricks "cloudFiles"
> AutoLoader supports these options in a stream.
> [https://docs.databricks.com/en/ingestion/auto-loader/options.html#id20]
>
> {{*Suggested Example Usages*}}
> {{_Start stream with all CSV files modified after date:_}}
> {{spark.readStream.option("modifiedAfter","2020-06-15T05:00:00").option("quote",
> '"').option("escape", '"').csv(source_path)}}
> {{_Start stream with all CSV files modified before date:_}}
> {{spark.readStream.option("modifiedBefore","2020-06-15T05:00:00").option("quote",
> '"').option("escape", '"').csv(source_path)}}
> {{_Start stream with all CSV files modified between two dates:_}}
> {{spark.readStream.option("modifiedAfter","2019-06-15T05:00:00").option("modifiedBefore","2020-06-15T05:00:00").option("quote",
> '"').option("escape", '"').csv(source_path)}}
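
For illustration, the selection rule that the suggested usages imply can be sketched in plain Python, without Spark. The helper name `filter_by_modified_time` and its parameters are hypothetical. The sketch assumes strict comparisons and timestamps interpreted in the local time zone. It is not Spark's implementation, only a model of which files a stream started with these options would be expected to pick up.

```python
import os
from datetime import datetime
from typing import Iterable, List, Optional


def filter_by_modified_time(
    paths: Iterable[str],
    modified_after: Optional[str] = None,
    modified_before: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: keep paths whose modification time lies
    strictly after `modified_after` and strictly before `modified_before`.

    Both bounds are ISO-8601 strings such as "2020-06-15T05:00:00",
    interpreted in the local time zone (an assumption of this sketch).
    A bound left as None is not applied.
    """
    after = (
        datetime.fromisoformat(modified_after).timestamp()
        if modified_after is not None
        else None
    )
    before = (
        datetime.fromisoformat(modified_before).timestamp()
        if modified_before is not None
        else None
    )

    selected = []
    for path in paths:
        mtime = os.path.getmtime(path)  # seconds since the epoch
        if after is not None and mtime <= after:
            continue  # not modified after the lower bound
        if before is not None and mtime >= before:
            continue  # not modified before the upper bound
        selected.append(path)
    return selected
```

Passing both bounds corresponds to the "modified between two dates" usage above; passing only one corresponds to the first two usages.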