Here is a doc that details type handling/extraction for both the Java API and the Scala API, including the type hints.
https://github.com/apache/flink/blob/master/docs/internal_types_serialization.md

Enjoy :-)

On Fri, Jan 9, 2015 at 12:26 PM, Vasiliki Kalavri <vasilikikala...@gmail.com> wrote:

> Hi,
>
> thanks for the nice explanation and the great work!
> This will simplify our Graph API-lives a lot ^^
>
> Cheers,
> V.
>
> On 9 January 2015 at 11:59, Stephan Ewen <se...@apache.org> wrote:
>
> > I am adding a derivative of that text to the docs right now.
> >
> > On Fri, Jan 9, 2015 at 11:54 AM, Robert Metzger <rmetz...@apache.org> wrote:
> >
> > > Thank you!
> > >
> > > It would be amazing if you or somebody else could copy paste this into our documentation.
> > >
> > > On Fri, Jan 9, 2015 at 11:44 AM, Stephan Ewen <se...@apache.org> wrote:
> > >
> > > > Hi everyone!
> > > >
> > > > We recently introduced type hints for the Java API. Since that is a pretty useful feature, I wanted to quickly explain what it is.
> > > >
> > > > Kudos to Timo Walther, who did a large part of this work.
> > > >
> > > > *Background*
> > > >
> > > > Flink tries to figure out as much information about what types enter and leave user functions as possible.
> > > >
> > > > - For the POJO API (where one refers to field names), we need that information to make checks (for typos and type compatibility) before the job is executed.
> > > >
> > > > - For the upcoming logical programs (see roadmap draft) we need this to know the "schema" of functions.
> > > >
> > > > - The more we know, the better serialization and data layout schemes the compiler/optimizer can develop. That is quite important for the memory usage paradigm in Flink (work on serialized data inside/outside the heap and make serialization very cheap)
> > > >
> > > > - Finally, it also spares users having to worry about serialization frameworks and having to register types at those frameworks.
> > > >
> > > > *Problem*
> > > >
> > > > Scala is an easy case, because it preserves generic type information (ClassTags / Type Manifests), but Java erases generic type info in most cases.
> > > >
> > > > We do reflection analysis on the user function classes to get the generic types. This logic also contains some simple type inference in case the functions have type variables (such as a MapFunction<T, Tuple2<T, Long>>).
> > > >
> > > > We cannot reliably figure out the data types of functions in all cases in Java. Some issues remain with generic lambdas (we are trying to solve this with the Java community, see below) and with generic type variables that we cannot infer.
> > > >
> > > > *Solution: Type Hints*
> > > >
> > > > To make these cases work easily, a recent addition to the 0.9-SNAPSHOT master introduced type hints. They allow you to tell the system types that it cannot infer.
> > > >
> > > > You can write code like
> > > >
> > > > DataSet<SomeType> result =
> > > >     dataSet.map(new MyGenericNonInferrableFunction<Long, SomeType>()).returns(SomeType.class);
> > > >
> > > > To make specification of generic types easier, it also comes with a parser for simple string representations of generic types:
> > > >
> > > >     .returns("Tuple2<Integer, my.SomeType>")
> > > >
> > > > We suggest using this instead of the "ResultTypeQueryable" workaround that has been used in some cases.
> > > >
> > > > *Improving Type information in Java*
> > > >
> > > > One Flink committer (Timo Walther) has actually become active in the Eclipse JDT compiler community and in the OpenJDK community to try and improve the way type information is available for lambdas.
> > > >
> > > > Greetings,
> > > > Stephan
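
For anyone who wants to see the erasure problem from the quoted mail in isolation, here is a minimal plain-Java sketch that does not depend on Flink. It is only an illustration of the idea, not Flink's actual type extraction: `MapFn`, `ParseLength`, `Identity`, `inferOutputType` and `resolveOutputType` are hypothetical names made up for this example. It shows that reflection can read the output type when a function class fixes its type arguments, that it cannot when the output is a type variable, and how an explicit class hint (in the spirit of `.returns(SomeType.class)`) fills the gap.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;

// Hypothetical stand-in for a user function interface; not Flink's MapFunction.
interface MapFn<IN, OUT> {
    OUT map(IN value);
}

// The type arguments are fixed in the class declaration, so they are
// recorded in the class file and remain visible to reflection.
class ParseLength implements MapFn<String, Integer> {
    public Integer map(String value) {
        return value.length();
    }
}

// The output type is a type variable. The Long in `new Identity<Long>()`
// is erased, so reflection only sees the variable T.
class Identity<T> implements MapFn<T, T> {
    public T map(T value) {
        return value;
    }
}

class TypeHintSketch {

    // Returns the output type if reflection can determine it, otherwise null.
    static Class<?> inferOutputType(MapFn<?, ?> fn) {
        for (Type iface : fn.getClass().getGenericInterfaces()) {
            if (iface instanceof ParameterizedType) {
                ParameterizedType p = (ParameterizedType) iface;
                if (p.getRawType() == MapFn.class) {
                    Type out = p.getActualTypeArguments()[1];
                    if (out instanceof Class) {
                        return (Class<?>) out;
                    }
                    // A type variable such as T lands here: nothing to infer.
                }
            }
        }
        return null;
    }

    // Prefers the inferred type and falls back to the user-supplied hint.
    static Class<?> resolveOutputType(MapFn<?, ?> fn, Class<?> hint) {
        Class<?> inferred = inferOutputType(fn);
        if (inferred != null) {
            return inferred;
        }
        if (hint != null) {
            return hint;
        }
        throw new IllegalStateException(
                "Output type cannot be inferred; please provide a type hint.");
    }

    public static void main(String[] args) {
        System.out.println(inferOutputType(new ParseLength()));
        System.out.println(inferOutputType(new Identity<Long>()));
        System.out.println(resolveOutputType(new Identity<Long>(), Long.class));
    }
}
```

The sketch only handles a class that implements the interface directly with plain class arguments. Nested generics such as `Tuple2<T, Long>` or superclass chains would need the kind of type inference the mail describes.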