[
https://issues.apache.org/jira/browse/SPARK-8605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15760867#comment-15760867
]
Adam Wang edited comment on SPARK-8605 at 12/19/16 11:01 AM:
-------------------------------------------------------------
How about changing FileInputDStream.defaultFilter(path) to fix this bug? Could
we filter out files whose names match patterns such as "*.tmp", "*.COPYING", etc.?
was (Author: adam wang):
How about changing FileInputDStream.defaultFilter(path) to fix this bug? We
could filter out files whose names look like "*.tmp", "*.COPYING", etc.
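A minimal sketch of the kind of name-based filter suggested above, written as
plain Python so it runs without Spark. The pattern list and the function name
accept_path are hypothetical, and "*._COPYING_" follows the suffix given in
the issue description rather than the "*.COPYING" spelling in the comment.
Wiring such a predicate into Spark is not shown; as far as I know, the Scala
StreamingContext.fileStream API has an overload that takes a Path => Boolean
filter, which is where a check like this would go.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# Hypothetical exclusion patterns (assumptions, not Spark defaults):
#   "*.tmp"        temporary files
#   "*._COPYING_"  files still being copied into HDFS (suffix from the issue text)
#   ".*"           hidden files, i.e. names starting with a dot
EXCLUDE_PATTERNS = ("*.tmp", "*._COPYING_", ".*")


def accept_path(path, exclude_patterns=EXCLUDE_PATTERNS):
    """Return True if the file at `path` should be processed.

    Only the final path component is matched, so directory names
    do not influence the decision.
    """
    name = basename(path.rstrip("/"))
    return not any(fnmatchcase(name, pattern) for pattern in exclude_patterns)


if __name__ == "__main__":
    for candidate in (
        "/data/in/file_name",
        "/data/in/file_name._COPYING_",
        "/data/in/part.tmp",
    ):
        print(candidate, accept_path(candidate))
```

Glob patterns are used here instead of a full regex only to keep the sketch
short; the same predicate could be built on re.search if regex support is
what is wanted.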
> Exclude files in StreamingContext.textFileStream(directory)
> ------------------------------------------------------------
>
> Key: SPARK-8605
> URL: https://issues.apache.org/jira/browse/SPARK-8605
> Project: Spark
> Issue Type: Improvement
> Components: DStreams
> Reporter: Noel Vo
> Labels: streaming, streaming_api
>
> Currently, Spark Streaming can monitor a directory and process newly added
> files. This causes a bug when large files are copied into the directory.
> For example, in HDFS, a file that is still being copied is named
> file_name._COPYING_. Spark picks up that file and starts processing it.
> However, once the copy finishes, the file is renamed to file_name, which
> causes a FileDoesNotExist error. It would be great if we could exclude
> files in the directory using a regex.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)