Here is a doc that details type handling/extraction for both the Java API
and the Scala API, including the type hints.

https://github.com/apache/flink/blob/master/docs/internal_types_serialization.md

Enjoy :-)

On Fri, Jan 9, 2015 at 12:26 PM, Vasiliki Kalavri <vasilikikala...@gmail.com> wrote:

> Hi,
>
> thanks for the nice explanation and the great work!
> This will simplify our Graph API-lives a lot ^^
>
> Cheers,
> V.
>
> On 9 January 2015 at 11:59, Stephan Ewen <se...@apache.org> wrote:
>
> > I am adding a derivative of that text to the docs right now.
> >
> >
> >
> > On Fri, Jan 9, 2015 at 11:54 AM, Robert Metzger <rmetz...@apache.org>
> > wrote:
> >
> > > Thank you!
> > >
> > > > It would be amazing if you or somebody else could copy paste this into
> > > > our documentation.
> > >
> > > > On Fri, Jan 9, 2015 at 11:44 AM, Stephan Ewen <se...@apache.org>
> > > > wrote:
> > >
> > > > Hi everyone!
> > > >
> > > > > We recently introduced type hints for the Java API. Since that is a
> > > > > pretty useful feature, I wanted to quickly explain what it is.
> > > >
> > > > Kudos to Timo Walther, who did a large part of this work.
> > > >
> > > >
> > > > *Background*
> > > >
> > > > > Flink tries to figure out as much information as possible about what
> > > > > types enter and leave user functions.
> > > > >
> > > > >  - For the POJO API (where one refers to field names), we need that
> > > > > information to make checks (for typos and type compatibility) before
> > > > > the job is executed.
> > > > >
> > > > >  - For the upcoming logical programs (see roadmap draft) we need this
> > > > > to know the "schema" of functions.
> > > > >
> > > > >  - The more we know, the better serialization and data layout schemes
> > > > > the compiler/optimizer can develop. That is quite important for the
> > > > > memory usage paradigm in Flink (work on serialized data inside/outside
> > > > > the heap and make serialization very cheap).
> > > > >
> > > > >  - Finally, it also spares users having to worry about serialization
> > > > > frameworks and having to register types with those frameworks.
> > > >
> > > >
> > > > *Problem*
> > > >
> > > > > Scala is an easy case, because it preserves generic type information
> > > > > (ClassTags / Type Manifests), but Java erases generic type info in
> > > > > most cases.
> > > > >
> > > > > We do reflection analysis on the user function classes to get the
> > > > > generic types. This logic also contains some simple type inference in
> > > > > case the functions have type variables (such as a
> > > > > MapFunction<T, Tuple2<T, Long>>).
> > > >
> > > > > In Java, we cannot reliably figure out the data types of functions in
> > > > > all cases. Some issues remain with generic lambdas (we are trying to
> > > > > solve this with the Java community, see below) and with generic type
> > > > > variables that we cannot infer.
> > > >
> > > >
> > > > *Solution: Type Hints*
> > > >
> > > > > To make these cases work easily, a recent addition to the 0.9-SNAPSHOT
> > > > > master introduced type hints. They allow you to tell the system the
> > > > > types that it cannot infer.
> > > >
> > > > You can write code like
> > > >
> > > > > DataSet<SomeType> result = dataSet
> > > > >         .map(new MyGenericNonInferrableFunction<Long, SomeType>())
> > > > >         .returns(SomeType.class);
> > > >
> > > >
> > > > > To make specification of generic types easier, it also comes with a
> > > > > parser for simple string representations of generic types:
> > > >
> > > >   .returns("Tuple2<Integer, my.SomeType>")
> > > >
> > > >
> > > > > We suggest using this instead of the "ResultTypeQueryable" workaround
> > > > > that has been used in some cases.
> > > >
> > > >
> > > > *Improving Type information in Java*
> > > >
> > > > > One Flink committer (Timo Walther) has actually become active in the
> > > > > Eclipse JDT compiler community and in the OpenJDK community to try
> > > > > and improve the way type information is available for lambdas.
> > > >
> > > >
> > > > Greetings,
> > > > Stephan
> > > >
> > >
> >
>
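
For readers who want to see the erasure problem from the quoted message in isolation, here is a minimal sketch in plain Java, without any Flink dependency. Everything in it is an assumption made for illustration: MapFunction and Pair are hypothetical stand-ins (not Flink's own classes), and the two helper methods only inspect directly implemented interfaces, so this is far simpler than the reflection analysis the message describes. It shows one case where reflection recovers the full output type and one where a type variable is left unresolved, which is the kind of situation a type hint is meant to cover.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

public class ErasureSketch {

    // Hypothetical stand-in for a user function interface; not Flink's class.
    public interface MapFunction<I, O> {
        O map(I value);
    }

    // Hypothetical stand-in for a two-field tuple type.
    public static class Pair<A, B> {
        public final A first;
        public final B second;

        public Pair(A first, B second) {
            this.first = first;
            this.second = second;
        }
    }

    // Concrete type arguments: recorded in the class file, visible to reflection.
    public static class StringLength implements MapFunction<String, Integer> {
        @Override
        public Integer map(String value) {
            return value.length();
        }
    }

    // The output type mentions the type variable T, whose binding is erased.
    public static class WithCount<T> implements MapFunction<T, Pair<T, Long>> {
        @Override
        public Pair<T, Long> map(T value) {
            return new Pair<>(value, 1L);
        }
    }

    // Returns the declared output type argument of MapFunction, or null if the
    // class does not directly implement a parameterized MapFunction.
    public static Type outputTypeOf(MapFunction<?, ?> fn) {
        for (Type iface : fn.getClass().getGenericInterfaces()) {
            if (iface instanceof ParameterizedType) {
                ParameterizedType pt = (ParameterizedType) iface;
                if (pt.getRawType() == MapFunction.class) {
                    return pt.getActualTypeArguments()[1];
                }
            }
        }
        return null;
    }

    // True only if the type is built entirely from concrete classes.
    public static boolean isFullyResolved(Type type) {
        if (type instanceof Class) {
            return true;
        }
        if (type instanceof ParameterizedType) {
            for (Type arg : ((ParameterizedType) type).getActualTypeArguments()) {
                if (!isFullyResolved(arg)) {
                    return false;
                }
            }
            return true;
        }
        return false; // null, type variables, wildcards, generic arrays
    }

    public static void main(String[] args) {
        // Output type is java.lang.Integer, fully known.
        System.out.println(isFullyResolved(outputTypeOf(new StringLength())));      // true
        // Output type is Pair<T, Long>; T was bound to String only at the call site.
        System.out.println(isFullyResolved(outputTypeOf(new WithCount<String>()))); // false
    }
}
```

In the second case the information that T is String exists only in the source at the call site, so nothing in the function object carries it. In Flink terms, that is where the quoted .returns(...) hint would supply the missing type.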
