Prashant Sharma created SPARK-31371:
---------------------------------------
Summary: FileStreamSource: Decide seen files on the checksum,
instead of filename.
Key: SPARK-31371
URL: https://issues.apache.org/jira/browse/SPARK-31371
Project: Spark
Issue Type: Improvement
Components: Structured Streaming
Affects Versions: 2.4.5, 3.0.0
Reporter: Prashant Sharma
At the moment, Structured Streaming's file source ignores updates to a file it
has already processed. However, for reasons beyond our control, software might
update the same file with new data. A case in point is rolling logs, where the
latest log file is always named e.g. log.txt and the rolled logs are named
log-1.txt etc.
So supporting this would not be special-casing; it would serve a genuine use
case.
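
To illustrate the idea, here is a minimal sketch in plain Python of deciding
"seen" by content checksum rather than by filename. It is not Spark's
FileStreamSource code; the names file_checksum, SeenFilesByChecksum and
should_process are hypothetical, and the choice of SHA-256 is an assumption.

```python
import hashlib
from pathlib import Path


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class SeenFilesByChecksum:
    """Hypothetical tracker: a file is 'seen' only while its content is unchanged."""

    def __init__(self):
        # Maps resolved path -> checksum of the last version that was processed.
        self._checksums = {}

    def should_process(self, path):
        """Return True if the path is new or its content changed since last time."""
        key = str(Path(path).resolve())
        checksum = file_checksum(path)
        if self._checksums.get(key) == checksum:
            return False  # same name, same content: already processed
        self._checksums[key] = checksum
        return True  # new file, or same name with updated content
```

With this approach, a log.txt that is rewritten or appended to would be picked
up again, whereas a filename-only check would skip it. Note that the sketch
only detects that the file changed; it would reprocess the whole file rather
than just the new data, and it pays the cost of reading every candidate file.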