[jira] [Created] (SPARK-49051) Provide modifiedAfter and modifiedBefore options when filtering from a stream source

Jeff Steinmetz (Jira) Mon, 29 Jul 2024 14:44:04 -0700

Jeff Steinmetz created SPARK-49051:
--------------------------------------

             Summary: Provide modifiedAfter and modifiedBefore options when filtering from a stream source
                 Key: SPARK-49051
                 URL: https://issues.apache.org/jira/browse/SPARK-49051
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.4.3
            Reporter: Jeff Steinmetz



In the following Jira issue, two new options, *modifiedBefore* and *modifiedAfter*, were introduced for batch reads (for example, CSV) and were eventually merged into version 3.1.1:

https://issues.apache.org/jira/browse/SPARK-31962

 

These options were implemented so that batch reads accept them, but streaming reads explicitly reject them.



When loading files from a data source as a stream, a source path can likewise contain thousands of files, so the need to filter by modification time applies to both batch and stream use cases. Note: the Databricks Auto Loader ("cloudFiles" source) supports these options on a stream.

[https://docs.databricks.com/en/ingestion/auto-loader/options.html#id20]
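To illustrate what supporting these options on a stream could mean, the sketch below models a file stream source that, on each trigger, takes a directory listing, skips files already emitted in earlier batches, and applies the modification-time bounds to the rest. Everything here is hypothetical: `FilteredFileStream`, `next_batch`, `parse_ts`, and the in-memory `{path: mtime}` dict standing in for a directory listing are illustrative names, not Spark's `FileStreamSource` API. UTC and exclusive bounds are assumptions.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional, Set

# Timestamp format documented for Spark's modifiedBefore/modifiedAfter options.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_ts(ts: str) -> float:
    """Parse 'YYYY-MM-DDTHH:mm:ss' as UTC epoch seconds (UTC is an assumption)."""
    return datetime.strptime(ts, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc).timestamp()


class FilteredFileStream:
    """Hypothetical stream source applying modifiedAfter/modifiedBefore bounds."""

    def __init__(self, modified_after: Optional[str] = None,
                 modified_before: Optional[str] = None) -> None:
        self.after = parse_ts(modified_after) if modified_after else None
        self.before = parse_ts(modified_before) if modified_before else None
        self.seen: Set[str] = set()  # files already emitted in earlier batches

    def _in_range(self, mtime: float) -> bool:
        # Exclusive bounds on both sides (assumed, as in the batch case).
        if self.after is not None and mtime <= self.after:
            return False
        if self.before is not None and mtime >= self.before:
            return False
        return True

    def next_batch(self, listing: Dict[str, float]) -> List[str]:
        """Return new, in-range files from a {path: mtime} listing."""
        batch = sorted(
            path for path, mtime in listing.items()
            if path not in self.seen and self._in_range(mtime)
        )
        self.seen.update(batch)
        return batch


# Example: two triggers against a growing directory.
stream = FilteredFileStream(modified_after="2020-06-15T05:00:00")
listing = {
    "old.csv": parse_ts("2020-01-01T00:00:00"),  # before the bound: never emitted
    "a.csv": parse_ts("2020-07-01T00:00:00"),
}
first = stream.next_batch(listing)   # ["a.csv"]
listing["b.csv"] = parse_ts("2020-08-01T00:00:00")
second = stream.next_batch(listing)  # ["b.csv"]; a.csv was already emitted
```

The point of the sketch is that the time filter is applied before any file is read, so old files in a large path are skipped at listing time rather than loaded and discarded.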

 

*Suggested example usages*

_Start a stream with all CSV files modified after a date:_
{{spark.readStream.option("modifiedAfter", "2020-06-15T05:00:00").option("quote", '"').option("escape", '"').csv(source_path)}}

_Start a stream with all CSV files modified before a date:_
{{spark.readStream.option("modifiedBefore", "2020-06-15T05:00:00").option("quote", '"').option("escape", '"').csv(source_path)}}

_Start a stream with all CSV files modified between two dates:_
{{spark.readStream.option("modifiedAfter", "2019-06-15T05:00:00").option("modifiedBefore", "2020-06-15T05:00:00").option("quote", '"').option("escape", '"').csv(source_path)}}




