[
https://issues.apache.org/jira/browse/SPARK-30281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Marcelo Masiero Vanzin reassigned SPARK-30281:
----------------------------------------------
Assignee: Jungtaek Lim
> 'archive' option in FileStreamSource misses to consider partitioned and
> recursive option
> ----------------------------------------------------------------------------------------
>
> Key: SPARK-30281
> URL: https://issues.apache.org/jira/browse/SPARK-30281
> Project: Spark
> Issue Type: Bug
> Components: Structured Streaming
> Affects Versions: 3.0.0
> Reporter: Jungtaek Lim
> Assignee: Jungtaek Lim
> Priority: Major
>
> The cleanup option for FileStreamSource was introduced in SPARK-20568.
> To simplify validation of the archive path, that change relied on the
> assumption that FileStreamSource only reads files meeting one of two
> conditions: 1) the parent directory matches the source pattern, or 2) the
> file itself matches the source pattern.
> During post-hoc review we found other cases that invalidate this
> assumption: the partitioned and recursive options. With these options,
> FileStreamSource can read files in arbitrary subdirectories under paths
> that match the source pattern, so simply checking the depth of the archive
> path doesn't work.
> We need to restore the path check logic, even though it will not be easy
> to explain to end users.
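
The difference between the two situations described above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not Spark's actual check: the function names, the example paths, and the use of per-component `fnmatch` globbing are all assumptions made for the sketch.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def components(path):
    """Split an absolute POSIX-style path into its non-root components."""
    return [part for part in PurePosixPath(path).parts if part != "/"]


def matches_nonrecursive(source_pattern, file_path):
    """Hypothetical model of the original assumption: a file is read only
    if the file itself or its parent directory matches the pattern."""
    pattern = components(source_pattern)
    parts = components(file_path)
    for candidate in (parts, parts[:-1]):
        if len(candidate) == len(pattern) and all(
            fnmatchcase(c, p) for c, p in zip(candidate, pattern)
        ):
            return True
    return False


def matches_recursive(source_pattern, file_path):
    """Hypothetical model of partitioned/recursive reads: a file is read
    if the file itself or ANY ancestor directory matches the pattern."""
    pattern = components(source_pattern)
    parts = components(file_path)
    if len(parts) < len(pattern):
        return False
    # Compare only the leading components; deeper levels are unconstrained.
    return all(fnmatchcase(c, p) for c, p in zip(parts, pattern))


# Illustrative paths only (not taken from the issue).
source = "/data/*"
archived = "/data/in/archive/2020/f.txt"

print(matches_nonrecursive(source, archived))  # depth differs, so no match
print(matches_recursive(source, archived))     # an ancestor matches
```

Under the first model the archived file sits too deep to match, so a depth comparison looks sufficient. Under the second model the same file would be picked up again because `/data/in` matches the pattern, which is why the check has to compare paths rather than depths.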