Hi all,

I have a pending pull request (#311) to fix and enable semantic information
for functions with nested and POJO types.
Semantic information is used to tell the optimizer about the behavior of
user-defined functions.
The optimizer can use this information to generate more efficient execution
plans.

Assume, for example, a data set which is partitioned on the first field of a
tuple and which is given to a Map function. If the optimizer knows that
the Map function does not modify the first field, it can infer that the
data is still partitioned after the Map function was applied.
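
As a rough illustration of that inference (a sketch only, not the actual
optimizer code; the class and method names are made up for this example),
the check boils down to looking up the partitioning field in the set of
fields the function is known to forward:

```java
import java.util.Map;
import java.util.OptionalInt;

public class PartitioningSketch {

    /**
     * Hypothetical helper: 'forwarded' maps an input field index to the
     * output field index it is copied to unchanged. Returns the field the
     * data is still partitioned on after the function, or empty if the
     * partitioning can no longer be assumed.
     */
    public static OptionalInt partitionFieldAfter(int partitionField,
                                                  Map<Integer, Integer> forwarded) {
        Integer target = forwarded.get(partitionField);
        return target == null ? OptionalInt.empty() : OptionalInt.of(target);
    }

    public static void main(String[] args) {
        // The Map function is known to leave field 0 untouched.
        Map<Integer, Integer> forwarded = Map.of(0, 0);

        // Partitioned on field 0: the partitioning survives the Map.
        System.out.println(partitionFieldAfter(0, forwarded)); // OptionalInt[0]

        // Partitioned on field 1: nothing is known, so it must be dropped.
        System.out.println(partitionFieldAfter(1, forwarded)); // OptionalInt.empty
    }
}
```

Without any semantic information the map would be empty, and the optimizer
would have to assume that the partitioning is lost.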

There are two ways to give semantic information for user-defined functions:
1) Class annotations:
@ConstantFields("0; 1->2")
public class MyMapper extends MapFunction<...> { }

2) Inline data flow:
data.map(new MapFunction<...>() {...}).withConstantSet("0; 1->2");

In both cases the semantic annotation indicates that the first field (0) is
preserved and the second field of the input (1) is forwarded to the third
field of the output (2).
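
To make the reading of such a string concrete, here is a minimal sketch of
how "0; 1->2" could be turned into a source-to-target field mapping. It is
not the parser used in the code base; the class name is hypothetical and it
only handles the flat, index-based syntax shown above (no nested or POJO
fields):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ForwardedFieldsSketch {

    /**
     * Parses entries separated by ';'. An entry "n" means field n stays
     * at position n; an entry "n->m" means input field n is forwarded to
     * output field m.
     */
    public static Map<Integer, Integer> parse(String spec) {
        Map<Integer, Integer> result = new LinkedHashMap<>();
        for (String entry : spec.split(";")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate a trailing ';'
            }
            String[] parts = trimmed.split("->");
            int source = Integer.parseInt(parts[0].trim());
            int target = parts.length == 2
                    ? Integer.parseInt(parts[1].trim())
                    : source;
            result.put(source, target);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("0; 1->2")); // prints {0=0, 1=2}
    }
}
```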

The question is: how should we name this feature?
Right now it is inconsistently called "ConstantField" and "ConstantSet".

I would prefer the name ForwardedFields because this indicates that fields
are "forwarded" through the function and possibly also moved to another
location. It would, however, change the API (although I don't think this
feature is often used because it was not advertised a lot).

Any other suggestions or opinions on this?

Cheers, Fabian
