Thank you! It would be amazing if you or somebody else could copy paste this into our documentation.
On Fri, Jan 9, 2015 at 11:44 AM, Stephan Ewen <se...@apache.org> wrote:

> Hi everyone!
>
> We recently introduced type hints for the Java API. Since that is a pretty
> useful feature, I wanted to quickly explain what it is.
>
> Kudos to Timo Walther, who did a large part of this work.
>
>
> *Background*
>
> Flink tries to figure out as much information about what types enter and
> leave user functions as possible.
>
> - For the POJO API (where one refers to field names), we need that
>   information to make checks (for typos and type compatibility) before the
>   job is executed.
>
> - For the upcoming logical programs (see roadmap draft) we need this to
>   know the "schema" of functions.
>
> - The more we know, the better serialization and data layout schemes the
>   compiler/optimizer can develop. That is quite important for the memory
>   usage paradigm in Flink (work on serialized data inside/outside the heap
>   and make serialization very cheap).
>
> - Finally, it also spares users having to worry about serialization
>   frameworks and having to register types at those frameworks.
>
>
> *Problem*
>
> Scala is an easy case, because it preserves generic type information
> (ClassTags / Type Manifests), but Java erases generic type info in most
> cases.
>
> We do reflection analysis on the user function classes to get the generic
> types. This logic also contains some simple type inference in case the
> functions have type variables (such as a MapFunction<T, Tuple2<T, Long>>).
>
> We cannot reliably figure out the data types of functions in all cases in
> Java. Some issues remain with generic lambdas (we are trying to solve this
> with the Java community, see below) and with generic type variables that we
> cannot infer.
>
>
> *Solution: Type Hints*
>
> To make these cases work easily, a recent addition to the 0.9-SNAPSHOT
> master introduced type hints. They allow you to tell the system types that
> it cannot infer.
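A minimal sketch of the reflection problem described under *Problem* above, using only the JDK. The names `Mapper`, `LengthMapper`, `IdentityMapper` and `ErasureDemo` are hypothetical and made up for this illustration; they are not Flink classes, and this is not Flink's actual type extraction logic. It only shows why a class with concrete type arguments can be analyzed while one with a type variable cannot.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Arrays;

// Hypothetical stand-in for a user-function interface; not Flink's MapFunction.
interface Mapper<IN, OUT> {
    OUT map(IN value);
}

// Concrete type arguments are recorded in the class file and survive erasure.
class LengthMapper implements Mapper<String, Integer> {
    @Override
    public Integer map(String value) {
        return value.length();
    }
}

// The type arguments are type variables here; reflection only sees "T".
class IdentityMapper<T> implements Mapper<T, T> {
    @Override
    public T map(T value) {
        return value;
    }
}

class ErasureDemo {
    // Returns the type arguments of Mapper as declared by the given class.
    static Type[] mapperTypeArguments(Class<?> functionClass) {
        for (Type iface : functionClass.getGenericInterfaces()) {
            if (iface instanceof ParameterizedType) {
                ParameterizedType pt = (ParameterizedType) iface;
                if (pt.getRawType() == Mapper.class) {
                    return pt.getActualTypeArguments();
                }
            }
        }
        throw new IllegalArgumentException(
                functionClass + " does not directly implement Mapper");
    }

    // True only if the output type is a concrete class, not a type variable.
    static boolean outputTypeIsKnown(Class<?> functionClass) {
        return mapperTypeArguments(functionClass)[1] instanceof Class;
    }

    public static void main(String[] args) {
        // Declared with concrete arguments: both types are recoverable.
        System.out.println(Arrays.toString(
                mapperTypeArguments(LengthMapper.class)));
        // Instantiated as IdentityMapper<Long>: the Long is erased at runtime.
        System.out.println(Arrays.toString(
                mapperTypeArguments(new IdentityMapper<Long>().getClass())));
    }
}
```

For `LengthMapper` the reflection call yields `String` and `Integer`. For `IdentityMapper<Long>` it yields only the type variable `T`, because the `Long` supplied at instantiation is not stored anywhere the runtime can read. That is the kind of gap a type hint is meant to fill.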
>
> You can write code like
>
>     DataSet<SomeType> result =
>         dataSet.map(new MyGenericNonInferrableFunction<Long, SomeType>())
>                .returns(SomeType.class);
>
>
> To make specification of generic types easier, it also comes with a parser
> for simple string representations of generic types:
>
>     .returns("Tuple2<Integer, my.SomeType>")
>
>
> We suggest using this instead of the "ResultTypeQueryable" workaround that
> has been used in some cases.
>
>
> *Improving Type information in Java*
>
> One Flink committer (Timo Walther) has actually become active in the
> Eclipse JDT compiler community and in the OpenJDK community to try to
> improve the way type information is made available for lambdas.
>
>
> Greetings,
> Stephan
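To give a feel for what parsing a string such as "Tuple2<Integer, my.SomeType>" involves, here is a toy recursive-descent sketch using only the JDK. `TypeNode` and `TypeStringParser` are hypothetical names invented for this illustration. This is not the parser that ships with Flink, and it assumes a deliberately small grammar (dotted names and nested angle brackets only, no arrays or wildcards).

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a parsed generic type: a name plus zero or more type arguments.
class TypeNode {
    final String name;
    final List<TypeNode> arguments = new ArrayList<>();

    TypeNode(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        if (arguments.isEmpty()) {
            return name;
        }
        StringBuilder sb = new StringBuilder(name).append('<');
        for (int i = 0; i < arguments.size(); i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(arguments.get(i));
        }
        return sb.append('>').toString();
    }
}

// Hypothetical recursive-descent parser; an assumption, not Flink's code.
class TypeStringParser {
    private final String input;
    private int pos = 0;

    private TypeStringParser(String input) {
        this.input = input;
    }

    static TypeNode parse(String text) {
        TypeStringParser p = new TypeStringParser(text);
        TypeNode node = p.parseType();
        p.skipSpaces();
        if (p.pos != p.input.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + p.pos);
        }
        return node;
    }

    // Grammar: type := name [ '<' type { ',' type } '>' ]
    private TypeNode parseType() {
        skipSpaces();
        int start = pos;
        while (pos < input.length()
                && (Character.isJavaIdentifierPart(input.charAt(pos))
                        || input.charAt(pos) == '.')) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a type name at position " + pos);
        }
        TypeNode node = new TypeNode(input.substring(start, pos));
        skipSpaces();
        if (pos < input.length() && input.charAt(pos) == '<') {
            do {
                pos++; // consume '<' or ','
                node.arguments.add(parseType());
                skipSpaces();
            } while (pos < input.length() && input.charAt(pos) == ',');
            if (pos >= input.length() || input.charAt(pos) != '>') {
                throw new IllegalArgumentException("Expected '>' at position " + pos);
            }
            pos++; // consume '>'
        }
        return node;
    }

    private void skipSpaces() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) {
            pos++;
        }
    }

    public static void main(String[] args) {
        TypeNode node = parse("Tuple2<Integer, my.SomeType>");
        System.out.println(node.name + " with " + node.arguments.size() + " type arguments");
    }
}
```

A real implementation would still have to resolve each name to a class and build the corresponding type information; the sketch stops at the syntax tree. Nested types such as "Tuple2<Long, Tuple2<String, Integer>>" go through the same recursive rule.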