Github user ggevay commented on the issue:

    https://github.com/apache/flink/pull/3511
  
    > IMHO, in addition to these changes, there are still some potential 
improvements we can do about the sorter, like deserialization when comparing 
the real records.
    
    Do you mean `NormalizedKeySorter.compareRecords`? This calls 
`comparator.compareSerialized`, which is currently implemented for most types 
by simply deserializing the records and then comparing the deserialized form. 
It would maybe bring some performance improvement if this had a real 
implementation, which would deserialize only the key fields (which are relevant 
to the comparison). The hard part here is how to handle nullable fields, which 
make the offset of individual fields unpredictable.
    
    So I think a simpler and better approach is to just make sure that most 
types have a good implementation of `putNormalizedKey`, and then 
`NormalizedKeySorter.compareRecords` would be called only rarely, so its 
performance wouldn't really matter.
    
    > We are working on a project to fully code generate the algorithm Flink 
runtime used, and borrowed lots of idea of this PR, thanks!
    
    I'm happy to hear that!
    
    Btw. I think a large potential in code-generation is to eliminate the 
overheads of the very many virtual function calls throughout the runtime, by 
making the calls monomorphic [1,2]. For example, there could be a custom 
`MapDriver` generated for each map UDF, and then this custom class would always 
call the same UDF, making the call monomorphic, and thus easily optimizable by 
the JVM [1,2]. (I think @greghogan was also thinking about this.)
    
    For example, the following virtual calls are on the per-record code path:
    - drivers (such as `MapDriver`) calling `MutableObjectIterator.next`
    - drivers calling the UDF
    - drivers calling `output.collect`
    - `ReusingDeserializationDelegate` or `NonReusingDeserializationDelegate` 
calling `serializer.deserialize`
    - `SpillingResettableMutableObjectIterator` calling `serializer.serialize` 
and `serializer.deserialize`
    - `PojoSerializer`, `TupleSerializer`, `RowSerializer`, etc. calling their 
field serializers
    - `CountingCollector` calling `collector.collect`
    - `RecordWriter` calling `channelSelector.selectChannels`
    - `SerializationDelegate` calling `serializer.serialize`
    - `StreamElementSerializer` calling methods on its `typeSerializer`
    - `OutputEmitter` calling `comparator.hash`
    - `ComparableKeySelector` calling `comparator.extractKeys`
    - `assignToKeyGroup` calling `key.hashCode`
    
    I think most of the above calls are megamorphic (especially in larger Flink 
jobs with many operators), which makes them slow [1,2]. They could be made 
monomorphic by code-generating custom versions of these classes, where the 
customization would be to fix the type of the targets of these calls. (I think 
this could probably be done independently of the Table API.)
    
    Another potential for code-generation is customizing `OutputEmitter` for 
the number of channels: there are places where a modulo operation is performed, 
which is much faster if the divisor is a compile-time constant, since then the 
compiler uses such tricks as described in this paper:
    https://gmplib.org/~tege/divcnst-pldi94.pdf
    And the same optimization could be made in `KeyGroupRangeAssignment`, where 
there are divisions by `maxParallelism`.
    
    > we need more type information control and flexible code generating 
supports, so we choose to do it through the Table API & SQL. How do you see 
this approach?
    
    Having more info about the types, UDFs, etc. of your program certainly can 
help. Unfortunately I don't know too much about the Table API & SQL, but I have 
a few random thoughts:
    - Since you are generating the UDFs, it should be possible to make them 
have good object reuse behavior, without the users having to worry about the 
non-trivial object reuse rules.
    - For basic types (int, boolean, etc.) the generated code could use 
`IntValue`, `BooleanValue`, etc., facilitating more object reuse, again without 
making the users write ugly boilerplate (as they have to do in the 
DataSet/DataStream API when they want to use these classes).
    - Same thing with `StringValue`
    
    Btw. have you seen this PR for code generation for POJO serializers and 
comparators? https://github.com/apache/flink/pull/2211
    It has some issues, so it is not as close to merging as this PR, but maybe 
I'll try to tie that up as well some time.
    
    [1] 
http://insightfullogic.com/2014/May/12/fast-and-megamorphic-what-influences-method-invoca/
    [2] https://shipilev.net/blog/2015/black-magic-method-dispatch/


---

Reply via email to