[
https://issues.apache.org/jira/browse/FLINK-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aljoscha Krettek closed FLINK-6993.
-----------------------------------
Resolution: Not A Problem
Fix Version/s: (was: 1.3.2)
{{FileInputFormat}} has a method {{acceptFile(FileStatus)}} that is meant to be
overridden for custom file-skipping logic. (By default, it skips files whose
names start with "_" or ".", as Hadoop does.) {{BucketingSink}} can be configured
to change the prefixes (and suffixes) of the files it writes, so a workaround
is to change the file prefix.
Please re-open if you have a concrete plan for changing this; otherwise I would
say it's working as intended.
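For reference, a minimal sketch of the {{acceptFile(FileStatus)}} override, assuming the Flink 1.3 batch Scala API. The names {{FileNameFilter}}, {{UnderscoreTextInputFormat}} and {{ReadUnderscoreFiles}} are hypothetical and only for illustration. The override replaces the default check entirely, so it would also accept any temporary or metadata files whose names start with an underscore.

```scala
import org.apache.flink.api.java.io.TextInputFormat
import org.apache.flink.api.scala._
import org.apache.flink.configuration.Configuration
import org.apache.flink.core.fs.{FileStatus, Path}

// Hypothetical helper: the file-name rule, kept separate so it is easy to check.
object FileNameFilter {
  // Assumption: keep "_"-prefixed names, but still skip hidden "." names.
  def accept(name: String): Boolean = !name.startsWith(".")
}

// Hypothetical subclass of TextInputFormat with custom skipping logic.
class UnderscoreTextInputFormat(path: Path) extends TextInputFormat(path) {
  // Replaces the default behaviour, which skips names starting with "_" or ".".
  override def acceptFile(fileStatus: FileStatus): Boolean =
    FileNameFilter.accept(fileStatus.getPath.getName)
}

object ReadUnderscoreFiles {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Same recursive enumeration parameter as in the issue description.
    val parameters = new Configuration
    parameters.setBoolean("recursive.file.enumeration", true)

    // Path taken from the issue description; adjust as needed.
    val path = "file:///Users/data"
    val lines: DataSet[String] = env
      .readFile(new UnderscoreTextInputFormat(new Path(path)), path)
      .withParameters(parameters)

    lines.print()
  }
}
```

Going through {{readFile}} with a custom format, rather than {{readTextFile}}, is what allows the skipping logic to be swapped out; the rest of the job is unchanged.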
> Not reading recursive files in Batch by using readTextFile when file name
> contains _ in starting.
> -------------------------------------------------------------------------------------------------
>
> Key: FLINK-6993
> URL: https://issues.apache.org/jira/browse/FLINK-6993
> Project: Flink
> Issue Type: Bug
> Components: Batch Connectors and Input/Output Formats
> Affects Versions: 1.3.0
> Reporter: Shashank Agarwal
> Priority: Critical
>
> When I try to read files from a folder using readTextFile in batch with
> recursive.file.enumeration enabled, files whose names start with _ are not
> read. When I remove the leading _ from the name, it works fine.
> It also works fine when given the direct path of a single file, but not with
> a directory path. To replicate the issue:
> {code}
> import org.apache.flink.api.scala.{DataSet, ExecutionEnvironment}
> import org.apache.flink.configuration.Configuration
>
> object CSVMerge {
>   def main(args: Array[String]): Unit = {
>     val env = ExecutionEnvironment.getExecutionEnvironment
>     // create a configuration object
>     val parameters = new Configuration
>     // set the recursive enumeration parameter
>     parameters.setBoolean("recursive.file.enumeration", true)
>     val stream = env.readTextFile("file:///Users/data")
>       .withParameters(parameters)
>     stream.print()
>   }
> }
> {code}
> When you put 2-3 text files with names like 1.txt, 2.txt, etc. in the data
> folder, it works fine. But with files named _1.txt, _2.txt, etc., it does not
> work. By default, the Flink BucketingSink in streaming puts _ before the file
> names, so files written by the sink from a DataStream cannot be read.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)