[ 
https://issues.apache.org/jira/browse/FLINK-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645055#comment-14645055
 ] 

Fabian Hueske commented on FLINK-992:
-------------------------------------

Hi [~nrai], 

thanks for picking up this issue. Flink's DataSet API can create a DataSet 
from a regular Java collection. This is done via the ExecutionEnvironment as 
{{ExecutionEnvironment.fromCollection(myCollection)}}. Under the hood, the 
Java collection is submitted to the executing Flink instance (cluster, local, 
YARN, ...), where the collection's data is processed.

This feature will use Flink's collection DataSets to process, on a remote 
cluster, a file that is local to the user's client. Instead of copying the 
small file into a file system or data store that the cluster can access, the 
client will be able to convert the file into a Java collection and use that 
collection as a DataSet in a Flink program. I propose reading the local file 
with Flink's regular InputFormats.
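
To illustrate the client-side step, here is a minimal sketch in plain Java. It 
is not the proposed implementation: it reads the file with 
{{java.nio.file.Files.readAllLines}} instead of a Flink InputFormat so that it 
runs without Flink on the classpath, and the class and method names 
({{LocalFileToCollection}}, {{readLocalFile}}) are hypothetical. The hand-over 
to {{ExecutionEnvironment.fromCollection}} is shown only as a comment.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper for illustration only; not part of Flink.
class LocalFileToCollection {

    // Reads a small client-local text file into memory, one element per line.
    // Assumption: the file is UTF-8 text and small enough to fit on the client.
    static List<String> readLocalFile(Path path) throws IOException {
        return Files.readAllLines(path, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: LocalFileToCollection <local-file>");
            return;
        }
        List<String> lines = readLocalFile(Paths.get(args[0]));

        // With Flink on the classpath, the collection could then be used as a
        // DataSet and shipped together with the program, for example:
        //   ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        //   DataSet<String> data = env.fromCollection(lines);
        System.out.println("Read " + lines.size() + " lines from " + args[0]);
    }
}
```

Because the whole file is held in client memory and shipped with the program, 
this only makes sense for small files, which matches the use case described in 
the issue.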

Please let me know if you have further questions,
Fabian

> Create CollectionDataSets by reading (client) local files.
> ----------------------------------------------------------
>
>                 Key: FLINK-992
>                 URL: https://issues.apache.org/jira/browse/FLINK-992
>             Project: Flink
>          Issue Type: New Feature
>          Components: Java API, Python API, Scala API
>            Reporter: Fabian Hueske
>            Assignee: niraj rai
>            Priority: Minor
>              Labels: starter
>
> {{CollectionDataSets}} are a nice way to feed data into programs.
> We could add support to read a client-local file at program construction time 
> using a FileInputFormat, put its data into a CollectionDataSet, and ship its 
> data together with the program.
> This would remove the need to upload small files into DFS which are used 
> together with some large input (stored in DFS).



