[ 
https://issues.apache.org/jira/browse/SPARK-31371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17084223#comment-17084223
 ] 

Gabor Somogyi commented on SPARK-31371:
---------------------------------------

There was a similar feature request before, and the conclusion was not to add 
it. In short: a file must be created atomically, and its content must not 
change afterwards. I couldn't find the Jira or PR; it may have been closed. 
AFAIK [~zsxwing] was involved...
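
For the producer side, the constraint above (create the file atomically, then 
never modify it) could be met roughly as follows. This is only a minimal sketch 
and not anything from Spark itself: publish_atomically, staging_dir and 
watched_dir are hypothetical names, and it assumes both directories live on the 
same filesystem, where a rename is atomic (true for local POSIX filesystems and 
HDFS, generally not for object stores such as S3).

```python
import os
import tempfile


def publish_atomically(data: bytes, staging_dir: str, watched_dir: str,
                       final_name: str) -> str:
    """Write data outside the watched directory, then rename it into place.

    Hypothetical helper: staging_dir and watched_dir are assumed to be on the
    same filesystem so that the rename is a single atomic step.
    """
    os.makedirs(staging_dir, exist_ok=True)
    os.makedirs(watched_dir, exist_ok=True)

    final_path = os.path.join(watched_dir, final_name)
    # Published files must never change, so refuse to reuse a name.
    # (This check is not race-free; it assumes a single writer.)
    if os.path.exists(final_path):
        raise FileExistsError(final_path)

    # Write the full content to a temporary file the stream cannot see.
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # The file appears in the watched directory complete, in one step.
        os.rename(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

For the rolling-log case, this would mean publishing each rolled file (e.g. 
log-1.txt) under its own unique name once it is complete, rather than pointing 
the stream at the live log.txt.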

> FileStreamSource: Decide seen files on the checksum, instead of filename.
> -------------------------------------------------------------------------
>
>                 Key: SPARK-31371
>                 URL: https://issues.apache.org/jira/browse/SPARK-31371
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 2.4.5, 3.0.0
>            Reporter: Prashant Sharma
>            Priority: Major
>
> At the moment, Structured Streaming's file source ignores updates to a file 
> it has already processed. However, for reasons beyond our control, software 
> might update the same file with new data. A case in point is rolling logs, 
> where the latest log file is always e.g. log.txt and the rolled logs are 
> log-1.txt etc. 
> So supporting this would not be special-casing but support for a genuine 
> use case. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

