Jungtaek Lim created SPARK-30281:
------------------------------------

             Summary: 'archive' option in FileStreamSource misses to consider 
partitioned and recursive option
                 Key: SPARK-30281
                 URL: https://issues.apache.org/jira/browse/SPARK-30281
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 3.0.0
            Reporter: Jungtaek Lim


The cleanup option for FileStreamSource was introduced in SPARK-20568.

To simplify verification of the archive path, that change assumed that every 
file FileStreamSource reads meets one of two conditions: 1) its parent 
directory matches the source pattern, or 2) the file itself matches the 
source pattern.

During post-hoc review we found cases that invalidate this assumption: 
partitioned directories and the recursive option. With these, FileStreamSource 
can read arbitrary files in subdirectories beneath the paths that match the 
source pattern, so simply checking the depth of the archive path doesn't work.
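
The gap can be illustrated with a small standalone sketch. This is not Spark 
code: the paths, the helper names and the component-wise glob matching are all 
assumptions made for illustration. It only checks whether a file satisfies the 
two conditions listed above.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def glob_matches(pattern: str, path: str) -> bool:
    """Compare component by component so that '*' never crosses a '/'."""
    pat_parts = PurePosixPath(pattern).parts
    path_parts = PurePosixPath(path).parts
    return len(pat_parts) == len(path_parts) and all(
        fnmatchcase(part, pat) for part, pat in zip(path_parts, pat_parts)
    )


def covered_by_assumption(pattern: str, file_path: str) -> bool:
    """True if the file itself or its parent directory matches the pattern."""
    parent = str(PurePosixPath(file_path).parent)
    return glob_matches(pattern, file_path) or glob_matches(pattern, parent)


# Hypothetical source pattern and input files (not taken from the issue).
SOURCE_PATTERN = "/data/input/*"

flat_file = "/data/input/part-0000.json"
dir_file = "/data/input/2019-12-17/part-0000.json"
# A partitioned layout puts files more than one level below the match.
partitioned_file = "/data/input/year=2019/month=12/part-0000.json"

for f in (flat_file, dir_file, partitioned_file):
    print(f, covered_by_assumption(SOURCE_PATTERN, f))
```

In this sketch the first two files satisfy the assumption, while the file in 
the partitioned layout satisfies neither condition even though such a file can 
still be read, so a depth-only check on the archive path cannot account for it.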

We need to restore the path check logic, though it will not be easy to explain 
to end users.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
