[ https://issues.apache.org/jira/browse/FLINK-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311691#comment-17311691 ]
Chesnay Schepler commented on FLINK-6417:
-----------------------------------------
But for the DataSet API we don't really need to change anything, do we? It can
be accomplished with the existing APIs after all.
I don't think it is a good idea to retrofit this into Path or FileInputFormat
at this point. It changes behavior in subtle ways that could break existing
setups. For example, with the proposed changes more file metadata is fetched from
the file systems. More paths are passed to the filter, which may break existing
filtering logic. Subclasses may be surprised that the file path differs from the one they set.
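A minimal sketch of the split that handling a wildcard with existing building blocks relies on: the path is divided into a non-wildcard base directory, used as the input path with nested enumeration, and a glob remainder, checked by a path filter. In the DataSet API this would presumably map onto `FileInputFormat#setNestedFileEnumeration(true)` plus `FileInputFormat#setFilesFilter(...)` with a `GlobFilePathFilter`. The Python helper `split_glob` is hypothetical and only demonstrates the path split. It is not Flink code, and the set of wildcard characters is an assumption.

```python
from typing import Tuple

# Assumption: these characters mark a path segment as a glob pattern.
WILDCARD_CHARS = set("*?[{")


def split_glob(path: str) -> Tuple[str, str]:
    """Hypothetical helper: split a wildcard path into (base_dir, glob_pattern).

    base_dir is the longest leading run of segments without wildcards;
    glob_pattern is everything from the first wildcard segment onward
    (an empty string if the path contains no wildcard at all).
    """
    scheme, sep, rest = path.partition("://")
    if not sep:  # no scheme, e.g. a plain local path such as /tmp/*.txt
        scheme, rest = "", path
    segments = rest.split("/")
    cut = len(segments)
    for i, segment in enumerate(segments):
        if any(ch in WILDCARD_CHARS for ch in segment):
            cut = i
            break
    base = "/".join(segments[:cut])
    pattern = "/".join(segments[cut:])
    return scheme + sep + base, pattern


if __name__ == "__main__":
    print(split_glob("s3://bucket-name/*.gz"))   # ('s3://bucket-name', '*.gz')
    print(split_glob("file://tmp/**/*.*"))       # ('file://tmp', '**/*.*')
```

The split also illustrates the last concern above: if it happened inside FileInputFormat, a subclass that set `s3://bucket-name/*.gz` would see `s3://bucket-name` as the effective path.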
> Wildcard support for read text file
> -----------------------------------
>
> Key: FLINK-6417
> URL: https://issues.apache.org/jira/browse/FLINK-6417
> Project: Flink
> Issue Type: New Feature
> Components: API / DataSet
> Reporter: Artiom Darie
> Priority: Minor
> Labels: pull-request-available
>
> Add wildcard support while reading from s3://, hdfs://, file://, etc.
> h6. Examples:
> # {code} s3://bucket-name/*.gz {code}
> # {code} hdfs://path/*file-name*.csv {code}
> # {code} file://tmp/**/*.* {code}
> h6. Proposal
> # Use the existing method: {code}environment.readFile(...){code}
> # List all the files in the directories
> # Read files using existing: {code}ContinuousFileReaderOperator{code}
> h6. Concerns (Open for discussions)
> # Create a separate DataSource for each file and then union them
> into a single DataSource
> # Read all the files into the same DataSource
> # List the files on the driver and load them on each TaskManager
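The listing step of the proposal (list the files once on the driver, then hand them to the readers) can be sketched as follows. The local file system stands in for s3://, hdfs:// or file://. Several assumptions apply. `list_matching_files` and `assign_round_robin` are hypothetical names. Python's `Path.glob` semantics stand in for whatever glob dialect Flink would adopt; here `**` matches zero or more directories. Round-robin assignment is just one simple way to spread files over readers and is not meant to reflect how Flink actually distributes splits.

```python
from pathlib import Path
from typing import List


def list_matching_files(base_dir: str, pattern: str) -> List[str]:
    """Hypothetical helper: list regular files under base_dir matching a glob.

    Returns paths relative to base_dir, '/'-separated and sorted, so the
    result is deterministic. With Path.glob, '**' means "this directory and
    all of its subdirectories, recursively".
    """
    root = Path(base_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.glob(pattern)
        if p.is_file()  # skip directories that happen to match the pattern
    )


def assign_round_robin(files: List[str], parallelism: int) -> List[List[str]]:
    """Hypothetical helper: hand the listed files to `parallelism` readers in turn."""
    if parallelism <= 0:
        raise ValueError("parallelism must be positive")
    buckets: List[List[str]] = [[] for _ in range(parallelism)]
    for i, name in enumerate(files):
        buckets[i % parallelism].append(name)
    return buckets
```

Listing on the driver keeps all directory listing calls in one place. Each reader then only opens the files it was assigned.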
--
This message was sent by Atlassian Jira
(v8.3.4#803005)