[
https://issues.apache.org/jira/browse/SPARK-5221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jem Tucker updated SPARK-5221:
------------------------------
Priority: Major (was: Minor)
> FileInputDStream "remember window" in certain situations causes files to be
> ignored
> ------------------------------------------------------------------------------------
>
> Key: SPARK-5221
> URL: https://issues.apache.org/jira/browse/SPARK-5221
> Project: Spark
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 1.1.1, 1.2.0
> Reporter: Jem Tucker
>
> When the batch interval is greater than 1 minute, a file can be skipped
> permanently. If a file begins to be moved into the monitored directory just
> before FileInputDStream.findNewFiles() is called, but does not become visible
> until after that call has executed, it is not included in that batch. In the
> following batch the file is ignored as well, because its mod time is less
> than the modTimeIgnoreThreshold. As a result, Spark Streaming ignores data
> that it shouldn't, especially when large files are being moved into the
> directory.
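
The timing described above can be reproduced with a small simulation. This is
only a sketch, not the Spark implementation: find_new_files, FileEvent, and
simulate are hypothetical names, and it assumes the ignore threshold is the
batch time minus a "remember window" that equals one batch interval once the
interval exceeds 1 minute. It also assumes the mod time is stamped when the
move starts, while the file only shows up in listings once the move finishes.

```python
from dataclasses import dataclass


@dataclass
class FileEvent:
    name: str
    mod_time: int    # assumed: stamped when the move starts (seconds)
    visible_at: int  # assumed: when the file first appears in a listing


def find_new_files(files, batch_time, remember_window, already_selected):
    """Hypothetical stand-in for the selection step described in the report."""
    # Assumed form of modTimeIgnoreThreshold: trailing edge of the window.
    threshold = batch_time - remember_window
    selected = []
    for f in files:
        if f.visible_at > batch_time:
            continue  # move not finished yet, so the listing does not show it
        if f.mod_time <= threshold:
            continue  # treated as too old and ignored
        if f.name in already_selected:
            continue  # already picked up by an earlier batch
        selected.append(f.name)
    already_selected.update(selected)
    return selected


def simulate(files, batch_interval, remember_window, num_batches):
    """Run consecutive batches and return the files selected in each one."""
    seen = set()
    return [
        find_new_files(files, b * batch_interval, remember_window, seen)
        for b in range(1, num_batches + 1)
    ]


BATCH = 120  # a 2 minute batch interval, i.e. greater than 1 minute

files = [
    FileEvent("small.log", mod_time=200, visible_at=205),
    # Move starts just before the batch at t=120 and finishes just after it.
    FileEvent("large.log", mod_time=115, visible_at=125),
]

# Window of one batch interval: large.log is never selected.
lost = simulate(files, BATCH, remember_window=BATCH, num_batches=3)

# Window of two batch intervals: large.log is picked up at t=240.
kept = simulate(files, BATCH, remember_window=2 * BATCH, num_batches=3)

print(lost)
print(kept)
```

In this model large.log misses the batch at t=120 because it is not yet
visible, and misses the batch at t=240 because its mod time (115) falls below
the threshold (120). Widening the window in the second run is shown only to
make the cause visible; it is not a fix proposed in this issue.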