[ 
https://issues.apache.org/jira/browse/SPARK-49051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Steinmetz updated SPARK-49051:
-----------------------------------
    Affects Version/s: 3.5.1
                           (was: 3.4.3)

> Provide modifiedAfter and modifiedBefore options when filtering from a stream 
> source
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-49051
>                 URL: https://issues.apache.org/jira/browse/SPARK-49051
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.1
>            Reporter: Jeff Steinmetz
>            Priority: Minor
>
> In the following Jira issue
> https://issues.apache.org/jira/browse/SPARK-31962
> Two new options, *modifiedBefore* and *modifiedAfter*, were introduced for 
> batch reads (for example, CSV) and eventually merged into version 3.1.1 via 
> PR:
> https://issues.apache.org/jira/browse/SPARK-31962
>  
> That change allows these two options on batch reads, but explicitly 
> rejects them on streaming reads.
> When files are loaded from a data source as a stream, the source path can 
> likewise contain thousands of files, so filtering by modification time is 
> as useful for streams as it is for batch reads.  Note:  The Databricks 
> "cloudFiles" Auto Loader supports these options in a stream.  
> [https://docs.databricks.com/en/ingestion/auto-loader/options.html#id20]
>  
> {{*Suggested Example Usages*}}
> {{_Start stream with all CSV files modified after date:_}}
> {{spark.readStream.option("modifiedAfter","2020-06-15T05:00:00").option("quote",
>  '"').option("escape", '"').csv(source_path)}}
> {{_Start stream with all CSV files modified before date:_}}
> {{spark.readStream.option("modifiedBefore","2020-06-15T05:00:00").option("quote",
>  '"').option("escape", '"').csv(source_path)}}
> {{_Start stream with all CSV files modified between two dates:_}}
> {{spark.readStream.option("modifiedAfter","2019-06-15T05:00:00").option("modifiedBefore","2020-06-15T05:00:00").option("quote",
>  '"').option("escape", '"').csv(source_path)}}
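
The intended semantics of the two options can be illustrated without Spark. The following is a minimal sketch, not Spark's implementation: the helper names (parse_utc, filter_by_mtime, demo) are hypothetical, timestamps are assumed to be UTC, and both bounds are assumed to be exclusive. It keeps only the files whose modification time falls between the two timestamps, which is the filtering the issue asks a streaming source to apply before it starts tracking files.

```python
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Iterable, List, Optional


def parse_utc(ts: str) -> datetime:
    """Parse a timestamp such as '2020-06-15T05:00:00' (assumption: UTC)."""
    return datetime.fromisoformat(ts).replace(tzinfo=timezone.utc)


def filter_by_mtime(
    paths: Iterable[Path],
    modified_after: Optional[datetime] = None,
    modified_before: Optional[datetime] = None,
) -> List[Path]:
    """Keep paths modified strictly after/before the given bounds.

    Hypothetical helper; exclusive bounds are an assumption of this sketch.
    """
    kept = []
    for path in paths:
        mtime = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        if modified_after is not None and mtime <= modified_after:
            continue  # too old for the modifiedAfter bound
        if modified_before is not None and mtime >= modified_before:
            continue  # too new for the modifiedBefore bound
        kept.append(path)
    return kept


def demo() -> List[str]:
    """Create three CSV files with known mtimes and filter them."""
    stamps = {
        "old.csv": "2019-01-01T00:00:00",
        "mid.csv": "2020-01-01T00:00:00",
        "new.csv": "2021-01-01T00:00:00",
    }
    with tempfile.TemporaryDirectory() as tmp:
        for name, ts in stamps.items():
            csv_file = Path(tmp) / name
            csv_file.write_text("a,b\n1,2\n")
            epoch = parse_utc(ts).timestamp()
            os.utime(csv_file, (epoch, epoch))  # set access and modification time
        kept = filter_by_mtime(
            sorted(Path(tmp).glob("*.csv")),
            modified_after=parse_utc("2019-06-15T05:00:00"),
            modified_before=parse_utc("2020-06-15T05:00:00"),
        )
        return [p.name for p in kept]


if __name__ == "__main__":
    print(demo())
```

With the bounds from the third usage example above, only the file stamped between the two dates is kept. Passing just one bound corresponds to the first two usage examples.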



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
