Jungtaek Lim created SPARK-30281:
------------------------------------
Summary: 'archive' option in FileStreamSource misses to consider
partitioned and recursive option
Key: SPARK-30281
URL: https://issues.apache.org/jira/browse/SPARK-30281
Project: Spark
Issue Type: Bug
Components: Structured Streaming
Affects Versions: 3.0.0
Reporter: Jungtaek Lim
The cleanup option for FileStreamSource was introduced in SPARK-20568.
To simplify verification of the archive path, that change assumed that
FileStreamSource only reads files meeting one of two conditions: 1) the
parent directory matches the source pattern, or 2) the file itself matches
the source pattern.
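
As a rough illustration of the two conditions, here is a small Python sketch.
The helper names (glob_matches, is_read_under_assumption) and the example
paths are hypothetical, and the component-wise matcher only approximates
Hadoop glob semantics (fnmatch has no {a,b} alternation). It is not Spark's
actual code.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def glob_matches(pattern: str, path: str) -> bool:
    """Component-wise glob match, so '*' never crosses a '/' boundary."""
    pat_parts = PurePosixPath(pattern).parts
    path_parts = PurePosixPath(path).parts
    return len(pat_parts) == len(path_parts) and all(
        fnmatchcase(part, pat) for part, pat in zip(path_parts, pat_parts)
    )


def is_read_under_assumption(source_pattern: str, file_path: str) -> bool:
    """True if the file satisfies condition 1 (parent matches) or 2 (file matches)."""
    parent = str(PurePosixPath(file_path).parent)
    return glob_matches(source_pattern, parent) or glob_matches(
        source_pattern, file_path
    )


# Hypothetical source pattern and files.
print(is_read_under_assumption("/data/*", "/data/a.txt"))               # file matches
print(is_read_under_assumption("/data/*", "/data/in/a.txt"))            # parent matches
print(is_read_under_assumption("/data/*", "/data/in/year=2019/a.txt"))  # neither
```

Under this assumption no source file lies more than one level below the
pattern, which is what makes a depth-based check on the archive path look
sufficient.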
During post-hoc review we found other cases that invalidate this assumption:
partitioned directories and the recursive option. In these cases,
FileStreamSource can read arbitrary files in subdirectories of the directories
that match the source pattern, so simply checking the depth of the archive
path doesn't work.
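
The failure mode can be sketched in Python as follows. The depth-only check
here is a hypothetical simplification (not Spark's exact condition), and all
paths and function names are made up for illustration. The archived path is
assumed to be the archive directory followed by the original file path.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def glob_matches(pattern: str, path: str) -> bool:
    """Component-wise glob match, so '*' never crosses a '/' boundary."""
    pat_parts = PurePosixPath(pattern).parts
    path_parts = PurePosixPath(path).parts
    return len(pat_parts) == len(path_parts) and all(
        fnmatchcase(part, pat) for part, pat in zip(path_parts, pat_parts)
    )


def depth(path: str) -> int:
    """Number of path components below the root directory."""
    return len(PurePosixPath(path).parts) - 1


def depth_only_check_passes(source_pattern: str, archive_dir: str) -> bool:
    """Hypothetical check: accept any archive dir deeper than the pattern."""
    return depth(archive_dir) > depth(source_pattern)


def is_read_recursively(source_pattern: str, file_path: str) -> bool:
    """With partitioned or recursive reads, any file below a matching
    directory is picked up, however deep it sits."""
    return any(
        glob_matches(source_pattern, str(ancestor))
        for ancestor in PurePosixPath(file_path).parents
    )


source_pattern = "/data/*"
archive_dir = "/data/in/archive/old"
original = "/data/in/year=2019/a.txt"
archived = archive_dir + original  # archive dir + original path (assumption)

# The depth-only check accepts the archive dir, yet the archived file sits
# below "/data/in", which matches the pattern, so it would be read again.
print(depth_only_check_passes(source_pattern, archive_dir))
print(is_read_recursively(source_pattern, archived))
```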
We need to restore the path check logic, though it will not be easy to explain
to end users.
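
One possible shape for such a path check, as a Python sketch only: compare the
archive directory and the source pattern component by component, up to the
shorter of the two, and reject the archive directory if every compared
component matches. The function name and paths are hypothetical, paths are
assumed to be absolute, and fnmatch only approximates Hadoop globs. The actual
fix may differ.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def archive_dir_is_safe(source_pattern: str, archive_dir: str) -> bool:
    """Return False when the archive dir could overlap the source pattern.

    zip() stops at the shorter path, so both paths are effectively
    truncated to their common depth before being compared.
    """
    pat_parts = PurePosixPath(source_pattern).parts
    arc_parts = PurePosixPath(archive_dir).parts
    overlaps = all(
        fnmatchcase(arc, pat) for arc, pat in zip(arc_parts, pat_parts)
    )
    return not overlaps


# Deeper than the pattern but below a matching directory: rejected.
print(archive_dir_is_safe("/data/*", "/data/in/archive/old"))
# Shallower than the pattern, but archived files would land below it: rejected.
print(archive_dir_is_safe("/data/*", "/data"))
# Diverges from the pattern at the first component: accepted.
print(archive_dir_is_safe("/data/*", "/archived/here"))
```

This is stricter than a depth comparison, and the "truncate to the common
depth" rule is the part that is hard to describe to users in one sentence.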