Ok, then we have to find some other solution for interop between comparators. What's the performance increase from that pull request?
On Wed, Aug 27, 2014 at 9:24 PM, Stephan Ewen <[email protected]> wrote: > A lot of comparisons (and that is what makes it fast) can happen on the > binary data, without turning them into a comparable. > > The latest pull request in the String representation for example improved > String-keyed sorting quite a bit: > > https://github.com/apache/incubator-flink/pull/4 > > > > > > On Wed, Aug 27, 2014 at 9:00 PM, Aljoscha Krettek <[email protected]> > wrote: > >> It would work on binary data, for example for tuples it would become: >> >> public Comparable extractKeys(DataInputView in) { >> Object[] fields = ... >> // extract only relevant fields from binary input and save in fields >> return Tuple(fields) // something like that >> } >> >> And for normalized keys something similar can be done. >> >> Aljoscha >> >> On Wed, Aug 27, 2014 at 8:39 PM, Stephan Ewen <[email protected]> wrote: >> > The design of the comparators so far was to make them work on the binary >> > data. That we need to retain, in my opinion, otherwise there is no >> > way to get good performance out of working on serialized data. >> > >> > I personally think that creating a tuple2 (key/value pair) when using >> > selector functions is actually good: >> > The key type (being treated by its dedicated comparator) benefits from >> all >> > the optimizations implemented for that type (bin copying, normalized >> keys, >> > ...) >> > That would be very hard to incorporate into any comparator that just >> > deserializes some comparable. >> > >> > Also, the key extractor can contain sort of heavy magic (such as to block >> > keys), whatever a user put in there. If we put that into the comparator, >> it >> > gets called for >> > every comparison! >> > >> > I do agree, though, that we need to come up with a better interface that >> > seamlessly allows working on binary versions and on objects, without >> > duplicating too much code. >> > >> > From your suggestion, I am not sure I got everything. Could you post a >> > concrete example or code? >> > >> > Stephan >> > >> > >> > >> > >> > On Wed, Aug 27, 2014 at 5:02 PM, Aljoscha Krettek <[email protected]> >> > wrote: >> > >> >> Hi Guys, >> >> while porting the Java API to Scala I'm noticing how complicated >> >> things are because of how our TypeComparators work: 1) There is only >> >> one type of comparator per TypeInformation which is created by the >> >> TypeInformation. Therefore, our KeySelectors are not actually >> >> implemented as comparators but as generated mappers that emit a >> >> Tuple2, because you wouldn't for example be able to generate a >> >> SelectorFunctionComparator for a TupleTypeInfo. (There's also a lot >> >> of magic going on with wrapping and unwrapping those tuples in Reduce, >> >> Join, and CoGroup.) 2) Comparators cannot really interoperate, there >> >> is special case code for the combinations that work. This will only >> >> get worse when we properly introduce POJO types, which should work >> >> well with tuple comparators and the other comparators. >> >> >> >> My proposal is this: No more TypeComparator on a per type basis. Just >> >> a generic comparator and PairComparator that work on Comparable. What >> >> used to be TypeComparators become SelectionExtractors that return a >> >> Comparable. Make Tuple comparable or add new ComparableTuple. The >> >> TupleSelectionExtractor would return a comparable tuple of the >> >> appropriate length (same for POJOs). For Tuple extractors that operate >> >> on only one field they would immediately return that field, without >> >> wrapping it in a tuple. This would directly support our existing >> >> KeySelector functions since the already return Comparable, when >> >> returning a tuple in a key selector function this would be compatible >> >> with a TupleSelectionExtractor (on the other join side, for example). >> >> >> >> That's my idea. What do you think? I think the current state is not >> >> maintainable, so we should do something quickly. :D >> >> >> >> Cheers, >> >> Aljoscha >> >> >>
