[ https://issues.apache.org/jira/browse/FLINK-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17308628#comment-17308628 ]
Etienne Chauchot commented on FLINK-6417:
-----------------------------------------
You can work around this by using GlobFilePathFilter:
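{code:java}
import java.util.Collections;
import org.apache.flink.api.common.io.GlobFilePathFilter;
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;

// FileInputFormat is abstract, so use a concrete subclass such as TextInputFormat.
// extractDir(...) is your own helper that returns the parent directory of the glob.
final TextInputFormat inputFormat = new TextInputFormat(new Path(extractDir(filePath)));
// filePath contains the glob; the whole path must be passed to GlobFilePathFilter
inputFormat.setFilesFilter(new GlobFilePathFilter(
        Collections.singletonList(filePath), // include patterns
        Collections.emptyList()));           // exclude patterns
inputFormat.setNestedFileEnumeration(true);
{code}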
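The "whole path" requirement matters because a glob is matched against the complete path string, not just the file name. The plain-Java sketch here assumes (to my understanding of Flink's implementation, not verified in this thread) that GlobFilePathFilter uses java.nio glob syntax via FileSystems#getPathMatcher on the path part of each file. It only exercises java.nio on a Unix-like default file system; GlobMatchDemo and the /data paths are made up for illustration.

{code:java}
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobMatchDemo {
    // Returns true if the glob pattern matches the whole path string.
    static boolean matches(String glob, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        // A bare file-name pattern does not match an absolute path...
        System.out.println(matches("*.csv", "/data/a.csv"));          // false
        // ...the pattern has to cover the whole path.
        System.out.println(matches("/data/*.csv", "/data/a.csv"));    // true
        // '*' stays within one directory level, '**' crosses levels.
        System.out.println(matches("/data/*.csv", "/data/x/a.csv"));  // false
        System.out.println(matches("/data/**.csv", "/data/x/a.csv")); // true
    }
}
{code}

This is also why setNestedFileEnumeration(true) goes together with a "**" pattern when files sit in subdirectories.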
> Wildcard support for read text file
> -----------------------------------
>
> Key: FLINK-6417
> URL: https://issues.apache.org/jira/browse/FLINK-6417
> Project: Flink
> Issue Type: New Feature
> Components: API / DataSet
> Reporter: Artiom Darie
> Priority: Minor
>
> Add wildcard support while reading from s3://, hdfs://, file://, etc.
> h6. Examples:
> # {code} s3://bucket-name/*.gz {code}
> # {code} hdfs://path/*file-name*.csv {code}
> # {code} file:///tmp/**/*.* {code}
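Each of these examples splits into a wildcard-free base directory plus a pattern, which is what the extractDir(...) helper in the workaround comment is expected to return. extractDir is not a Flink API; the sketch here is one hypothetical way to write it, cutting the path at the last '/' before the first glob metacharacter (*, ?, [ or {).

{code:java}
public class GlobBase {
    // Hypothetical helper: returns the directory part of a glob path,
    // i.e. everything before the last '/' that precedes the first wildcard.
    static String extractDir(String globPath) {
        int firstGlob = indexOfGlobChar(globPath);
        if (firstGlob < 0) {
            return globPath; // no wildcard: return the path unchanged
        }
        int lastSlash = globPath.lastIndexOf('/', firstGlob);
        if (lastSlash < 0) {
            return "";       // relative pattern such as "*.gz": no directory part
        }
        if (lastSlash == 0) {
            return "/";      // pattern directly under the root, e.g. "/*.gz"
        }
        return globPath.substring(0, lastSlash);
    }

    // Index of the first glob metacharacter, or -1 if there is none.
    private static int indexOfGlobChar(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '*' || c == '?' || c == '[' || c == '{') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(extractDir("s3://bucket-name/*.gz"));       // s3://bucket-name
        System.out.println(extractDir("hdfs://path/*file-name*.csv")); // hdfs://path
        System.out.println(extractDir("file:///tmp/**/*.*"));          // file:///tmp
    }
}
{code}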
> h6. Proposal
> # Use the existing method: {code}environment.readFile(...){code}
> # List all the files in the directories
> # Read the files using the existing {code}ContinuousFileReaderOperator{code}
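Step 2 of the proposal (list all files in the directories and keep those matching the pattern) can be sketched for the local file system with java.nio's Files.walk and a glob PathMatcher. This is an illustration under assumptions: it does not go through Flink's FileSystem abstraction, so it does not cover s3:// or hdfs://, and GlobLister plus the temporary file names are made up.

{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GlobLister {
    // Walks baseDir recursively and keeps the regular files whose full path matches the glob.
    static List<Path> listMatching(Path baseDir, String glob) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        try (Stream<Path> paths = Files.walk(baseDir)) {
            return paths.filter(Files::isRegularFile)
                    .filter(matcher::matches)
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a small temporary tree and returns the matches relative to its root.
    static List<String> demo() {
        try {
            Path base = Files.createTempDirectory("glob-demo");
            Files.createDirectories(base.resolve("a/b"));
            Files.createFile(base.resolve("top.csv"));
            Files.createFile(base.resolve("a/b/nested.csv"));
            Files.createFile(base.resolve("a/notes.txt"));
            // "**" crosses directory levels, so both CSV files are found.
            return listMatching(base, base + "/**.csv").stream()
                    .map(p -> base.relativize(p).toString())
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
{code}

The resulting list of paths is what step 3 would then hand to the reader.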
> h6. Concerns (Open for discussions)
> # Create a separate DataSource for each file and then combine them into a single DataSource
> # Read all the files into the same DataSource
> # List the files on the driver and load them on each TaskManager
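Option 3 (list once on the driver, then load on the TaskManagers) amounts to partitioning one file list across the parallel readers. A minimal round-robin sketch in plain Java, purely illustrative: FileAssignment is a made-up name, and Flink itself hands file input splits to parallel instances through its own split assignment, which this does not model.

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FileAssignment {
    // Assigns files listed once on the driver to `parallelism` readers in round-robin order.
    static List<List<String>> assignRoundRobin(List<String> files, int parallelism) {
        if (parallelism < 1) {
            throw new IllegalArgumentException("parallelism must be >= 1");
        }
        List<List<String>> perReader = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            perReader.add(new ArrayList<>());
        }
        for (int i = 0; i < files.size(); i++) {
            perReader.get(i % parallelism).add(files.get(i));
        }
        return perReader;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a.gz", "b.gz", "c.gz", "d.gz", "e.gz");
        // Prints [[a.gz, c.gz, e.gz], [b.gz, d.gz]]
        System.out.println(assignRoundRobin(files, 2));
    }
}
{code}

Round-robin ignores file sizes, so a few large files can still leave readers unevenly loaded.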