[jira] [Issue Comment Deleted] (FLINK-668) API Proposal - NamedDataSets

Markus Holzemer (JIRA) Wed, 18 Jun 2014 09:45:06 -0700

     [ https://issues.apache.org/jira/browse/FLINK-668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Markus Holzemer updated FLINK-668:
----------------------------------

    Comment: was deleted

(was: The discussion on this topic is continued in a newer issue. (FLINK-947))

> API Proposal - NamedDataSets
> ----------------------------
>
>                 Key: FLINK-668
>                 URL: https://issues.apache.org/jira/browse/FLINK-668
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: GitHub Import
>              Labels: github-import
>             Fix For: pre-apache
>
>
> @StephanEwen, @aljoscha and I were discussing a further stage / alternative 
> version of the new Java API that we called NamedDataSets. Instead of dealing 
> with specific types that are checked at compile time, users should be able to 
> just use field names to operate on the data. The types would be checked not at 
> compile time but at pre-flight time. That would give a feeling more similar 
> to SQL.
> Currently users often have to remember what position in the tuple a specific 
> field has, which can get a little bit annoying when dealing with bigger 
> queries. Using names instead would perhaps make this more manageable.
> I have created a first proposal for the syntax that we can use as a basis for 
> discussion:
> ```
> NamedDataSet nds = get3TupleDataSet(env).named("ID", "Number", "Comment");
>
> NamedDataSet join = get3TupleDataSet(env).named("ID", "Number", "Comment");
>
> NamedDataSet join_result = nds.join(join).where("ID").equalTo("ID");
>
> NamedDataSet group_result = nds.groupBy("ID");
>
> // to apply a UDF
> NamedDataSet reduceDs = nds.get("ID", "Number", "Comment")
>         .types(Integer.class, Long.class, String.class)
>         .groupBy(1)
>         .reduce(new Tuple3Reduce("B-)"))
>         .named("ID", "Number", "Comment");
>
> reduceDs.get("ID", "Number", "Comment")
>         .types(Integer.class, Long.class, String.class)
>         .print();
> env.execute();
> ```
> My current development progress can be viewed here:
> https://github.com/markus-h/stratosphere/compare/named_dataset
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/issues/668
> Created by: [markus-h|https://github.com/markus-h]
> Labels: enhancement, java api, user satisfaction
> Created at: Tue Apr 08 13:31:59 CEST 2014
> State: open
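
The quoted description proposes resolving fields by name and checking types at pre-flight time rather than at compile time. As a rough illustration of that idea only, the sketch below maps field names to tuple positions and rejects unknown names or mismatched types before any job would run. `NamedSchema`, `field`, `indexOf` and `checkType` are hypothetical names invented for this example; they are not part of the proposed NamedDataSet API or of the linked branch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (an assumption for illustration, not Stratosphere/Flink code):
// records which tuple position and type each field name refers to.
final class NamedSchema {
    private final Map<String, Integer> positions = new LinkedHashMap<>();
    private final List<Class<?>> types = new ArrayList<>();

    // Register a field; its tuple position is its order of registration.
    NamedSchema field(String name, Class<?> type) {
        if (positions.containsKey(name)) {
            throw new IllegalArgumentException("Duplicate field name: " + name);
        }
        positions.put(name, types.size());
        types.add(type);
        return this;
    }

    // Resolve a field name to its tuple position; unknown names fail immediately.
    int indexOf(String name) {
        Integer pos = positions.get(name);
        if (pos == null) {
            throw new IllegalArgumentException(
                    "Unknown field '" + name + "', known fields: " + positions.keySet());
        }
        return pos;
    }

    // "Pre-flight" check: verify the declared type of a field before execution.
    void checkType(String name, Class<?> expected) {
        Class<?> actual = types.get(indexOf(name));
        if (!expected.equals(actual)) {
            throw new IllegalArgumentException(
                    "Field '" + name + "' has type " + actual.getSimpleName()
                            + ", expected " + expected.getSimpleName());
        }
    }
}

class NamedSchemaDemo {
    public static void main(String[] args) {
        NamedSchema schema = new NamedSchema()
                .field("ID", Integer.class)
                .field("Number", Long.class)
                .field("Comment", String.class);

        // Name-based lookup replaces remembering the tuple position by hand.
        System.out.println("Position of Number: " + schema.indexOf("Number"));

        // A type mismatch is reported up front instead of at runtime in a UDF.
        try {
            schema.checkType("ID", String.class);
        } catch (IllegalArgumentException e) {
            System.out.println("Pre-flight error: " + e.getMessage());
        }
    }
}
```

A real implementation would presumably take the field types from the underlying tuple DataSet instead of declaring them by hand; the explicit `field(...)` calls here only keep the sketch self-contained.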



--
This message was sent by Atlassian JIRA
(v6.2#6252)
