[jira] [Commented] (FLINK-6417) Wildcard support for read text file

Chesnay Schepler (Jira) Tue, 30 Mar 2021 10:51:12 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-6417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17311691#comment-17311691
 ]


Chesnay Schepler commented on FLINK-6417:
-----------------------------------------

But for the DataSet API we don't really need to change anything, do we? It can 
be accomplished with the existing APIs after all.
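
As a rough illustration of the glob semantics involved: this sketch assumes "existing APIs" includes setting a glob-based FilePathFilter (such as GlobFilePathFilter) on the input format via setFilesFilter, which is not stated above. It uses only the JDK's PathMatcher on relative Unix-style paths, not Flink itself. The class name GlobSketch and the sample file names are hypothetical.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, JDK only: shows which relative paths a glob would select.
class GlobSketch {

    /** Returns the entries of relativePaths that match the given glob pattern. */
    static List<String> matching(List<String> relativePaths, String glob) {
        // "glob:" selects the JDK's glob syntax; '*' stays within one path
        // segment, while '**' may cross directory boundaries.
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> result = new ArrayList<>();
        for (String p : relativePaths) {
            if (matcher.matches(Paths.get(p))) {
                result.add(p);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample names are made up for illustration.
        List<String> files = Arrays.asList("a.gz", "b.csv", "logs/c.gz", "logs/2021/d.gz");
        System.out.println(matching(files, "*.gz"));     // top-level .gz files only
        System.out.println(matching(files, "**/*.gz"));  // .gz files below at least one directory
    }
}
```

In a DataSet job the same kind of pattern would be handed to the input format's file filter, with nested file enumeration enabled if sub-directories should be traversed. The path and filter behaviour of the format itself stays untouched.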

I don't think it is a good idea to retrofit this into Path or FileInputFormat 
now. It changes behavior in subtle ways that could break existing setups. For 
example, with the proposed changes more file infos are pulled from the 
filesystems. More paths are passed to the filter, which may break existing 
filter logic. Subclasses may be surprised that the file path they see differs 
from the one that was set.

> Wildcard support for read text file
> -----------------------------------
>
>                 Key: FLINK-6417
>                 URL: https://issues.apache.org/jira/browse/FLINK-6417
>             Project: Flink
>          Issue Type: New Feature
>          Components: API / DataSet
>            Reporter: Artiom Darie
>            Priority: Minor
>              Labels: pull-request-available
>
> Add wildcard support while reading from s3://, hdfs://, file://, etc.
> h6. Examples:
> # {code} s3://bucket-name/*.gz {code}
> # {code} hdfs://path/*file-name*.csv {code}
> # {code} file://tmp/**/*.* {code}
> h6. Proposal
> # Use the existing method: {code}environment.readFile(...){code}
> # List all the files in the directories
> # Read files using existing: {code}ContinuousFileReaderOperator{code}
> h6. Concerns (Open for discussions)
> # Have multiple DataSource(s) created, one for each file, and then join 
> them into a single DataSource
> # Have all the files in the same DataSource
> # Have the listing of the files on the driver and load on each task manager



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
