Hello,

Thanks Fabian for starting this discussion.

I would just like to add a few thougths about why does the
FieldAccessors even exist. One could say that we should instead just
re-use ExpressionKeys for the aggregations, and we are done. As far as
I can see, the way ExpressionKeys is often used is roughly to call
computeLogicalKeyPositions on it, then call
CompositeType.createComparator with these key positions, and then we
can call extractKeys on the comparator to get the values of the fields
of a record.

However, the problem with this is that in the aggregations we also
need to write fields and not just read them. So with the above
procedure, we would need an "intractKey" method on the comparators
that would be the opposite of extractKeys, i.e. which would set the
key fields.

Another difference between the FieldAccessor world and the
ExpressionKeys world is that in the aggregations we need to deal with
only one field. This means that enhancing the comparators with the
exact opposite of extractKeys would be overkill, because we don't need
to set multiple fields at a time.

It certainly seems to be a good idea to unify these at least
externally (from the side of the users), but I'm not sure what would
be a neat way to also internally unify them.

Best,
Gábor


2016-10-26 20:54 GMT+02:00 Fabian Hueske <fhue...@gmail.com>:
> Hi everybody,
>
> while reviewing PR #2094 [1] I noticed that the field reference syntax for
> FieldAccessors is not compatible with the syntax supported for key
> definitions (ExpressionKeys) used in groupBy(), keyBy(),
> join().where().equalTo(), etc.
>
> FieldAccessors are only used for build-in aggregations in the DataStream
> API (sum(), min(), max(), ...).
>
> In particular I identified the following inconsistencies:
>
> - FieldAccessors allow to address array cells. ExpressionKeys treat arrays
> as AtomicTypes (Array TypeInfos do not extend CompositeType). Hence, it is
> not possible to address array cells.
> - ExpressionKeys do only support Integer keys for tuples. An atomic type
> can only be addressed with "*". FieldAccessors allow to address AtomicTypes
> with 0 in addtion to "*".
> - ExpressionKeys support to address fields of Java tuples with "f2" and
> Scala tuple fields with "_3". FieldAccessors do not support the "f" or "_"
> prefix.
>
> I would like to propose to adapt the syntax of both mechanisms (ideally,
> both should use the same code for validation). IMO, the ExpressionKey
> syntax much more widely used and is well designed. Therefore, I would adopt
> it for FieldAccessors as well. However, that would mean to restrict the
> syntax of the FieldAccessors and might break existing code.
>
> What do others think?
>
> [1] https://github.com/apache/flink/pull/2094

Reply via email to