[ 
https://issues.apache.org/jira/browse/FLINK-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek closed FLINK-6993.
-----------------------------------
       Resolution: Not A Problem
    Fix Version/s:     (was: 1.3.2)

{{FileInputFormat}} has a method {{acceptFile(FileStatus)}} that is meant to be 
overridden for custom file-skipping logic. (By default, it skips files whose 
names start with "_" or ".", as Hadoop does.) {{BucketingSink}} can be configured 
to change the prefixes (and suffixes) of the files it writes, so a workaround 
is to change the file prefix.
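
For illustration, a minimal sketch of the override approach, assuming the Flink 1.3 batch Scala API. The class name {{UnderscoreTextInputFormat}} and the object name are hypothetical; the path is the one from the report. Because the override replaces the default check entirely, it also accepts directories whose names start with "_".

```scala
import org.apache.flink.api.java.io.TextInputFormat
import org.apache.flink.api.scala._
import org.apache.flink.configuration.Configuration
import org.apache.flink.core.fs.{FileStatus, Path}

// Hypothetical subclass: accepts names starting with "_" but still
// skips hidden entries whose names start with ".".
class UnderscoreTextInputFormat(path: Path) extends TextInputFormat(path) {
  override def acceptFile(fileStatus: FileStatus): Boolean = {
    val name = fileStatus.getPath.getName
    !name.startsWith(".")
  }
}

object ReadUnderscoreFiles {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Enable recursive enumeration, as in the original report.
    val parameters = new Configuration
    parameters.setBoolean("recursive.file.enumeration", true)

    val dir = "file:///Users/data"
    // readFile takes a custom FileInputFormat instead of the default text format.
    val lines = env
      .readFile(new UnderscoreTextInputFormat(new Path(dir)), dir)
      .withParameters(parameters)

    lines.print()
  }
}
```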

Please re-open if you have a concrete plan for changing this; otherwise I would 
say it's working as intended.
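
The prefix workaround mentioned above might look roughly like the following sketch, assuming the {{BucketingSink}} from flink-connector-filesystem. The object name, method name, and prefix strings are illustrative only. Note that with non-underscore prefixes a batch job may also pick up in-progress or pending files, which can be incomplete.

```scala
import org.apache.flink.streaming.connectors.fs.bucketing.BucketingSink

object SinkPrefixes {
  // Builds a sink whose in-progress and pending files do not start with "_",
  // so the default acceptFile check no longer skips them.
  def buildSink(basePath: String): BucketingSink[String] = {
    val sink = new BucketingSink[String](basePath)
    sink.setInProgressPrefix("inprogress-") // illustrative prefix
    sink.setPendingPrefix("pending-")       // illustrative prefix
    sink
  }
}
```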

> Not reading recursive files in Batch by using readTextFile when file name 
> contains _ in starting.
> -------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-6993
>                 URL: https://issues.apache.org/jira/browse/FLINK-6993
>             Project: Flink
>          Issue Type: Bug
>          Components: Batch Connectors and Input/Output Formats
>    Affects Versions: 1.3.0
>            Reporter: Shashank Agarwal
>            Priority: Critical
>
> When I try to read files from a folder using readTextFile in batch with 
> recursive.file.enumeration enabled, files whose names start with _ are not 
> read. When I remove the leading _, it works fine. 
> Reading also works when I pass the direct path of a single file; it fails 
> only with a directory path. To reproduce the issue:
> {code}
> import org.apache.flink.api.scala.{DataSet, ExecutionEnvironment}
> import org.apache.flink.configuration.Configuration
> object CSVMerge {
>   def main(args: Array[String]): Unit = {
>     val env = ExecutionEnvironment.getExecutionEnvironment
>     // create a configuration object
>     val parameters = new Configuration
>     // set the recursive enumeration parameter
>     parameters.setBoolean("recursive.file.enumeration", true)
>     val stream = env.readTextFile("file:///Users/data")
>       .withParameters(parameters)
>     stream.print()
>   }
> }
> {code}
> When you put 2-3 text files with names like 1.txt, 2.txt etc. in the data 
> folder, it works fine. But when the files are named _1.txt, _2.txt, they are 
> not read. Flink's BucketingSink in streaming puts _ before file names by 
> default, so files written by a DataStream sink cannot be read.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
