GitHub user KurtYoung commented on the issue:

    https://github.com/apache/flink/pull/3511
  
    > So I think a simpler and better approach is to just make sure that most 
types have a good implementation of putNormalizedKey, and then 
NormalizedKeySorter.compareRecords would be called only rarely, so its 
performance wouldn't really matter.
    
    You are right when the sort keys are simple numeric types, but not when they are strings, which may be the most popular choice in some ETL and data warehouse pipelines. But I agree that code generation can't help with this situation, so we investigated some binary data formats to represent our records and modified the interfaces of TypeSerializer & TypeComparator for ser/de. We don't have to consume the input/output view byte by byte, but have the ability to randomly access the underlying data, aka the MemorySegment. It acts like Spark's UnsafeRow: https://reviewable.io/reviews/apache/spark/5725, so we can eliminate most of the deserialization cost, such as `read byte[]` and then `new String(byte[])`. We combined this approach with some code generation to eliminate the virtual function call of the TypeComparator, and saw a 10x performance improvement when sorting on strings.
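    To illustrate the idea, here is a minimal sketch of such a byte-level comparison (the length-prefixed layout and the helper class are assumptions for illustration, not Flink's actual API): each string field is stored as a 4-byte length followed by its UTF-8 bytes, and two fields are compared directly on the MemorySegments, with no intermediate `byte[]` copy and no `new String(byte[])`:

```java
import org.apache.flink.core.memory.MemorySegment;

// Hypothetical helper: compares two length-prefixed UTF-8 string fields
// lexicographically by their raw bytes, without deserializing to String.
public final class BinaryStringFieldComparator {

    public static int compareStringFields(MemorySegment seg1, int offset1,
                                          MemorySegment seg2, int offset2) {
        int len1 = seg1.getInt(offset1);   // 4-byte length prefix
        int len2 = seg2.getInt(offset2);
        int minLen = Math.min(len1, len2);
        for (int i = 0; i < minLen; i++) {
            // Random access into the segments; no byte[] copy, no allocation.
            int b1 = seg1.get(offset1 + 4 + i) & 0xFF;
            int b2 = seg2.get(offset2 + 4 + i) & 0xFF;
            if (b1 != b2) {
                return b1 - b2;
            }
        }
        return len1 - len2;
    }
}
```

    Since unsigned byte-wise comparison of valid UTF-8 agrees with Unicode code point order, this yields a consistent ordering without ever materializing the Strings.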
    
    > I think a large potential in code-generation is to eliminate the 
overheads of the very many virtual function calls throughout the runtime
    
    Totally agreed. After we finish the code generation work and the ser/de improvements, we will investigate this further. Good to see that you have a list of all the megamorphic calls. BTW, we are actually translating the batch jobs onto the streaming runtime; I think there will be a lot in common.
    
    Having more control over type information and code-generating the whole operator has lots of benefits; it can also help make most of the calls monomorphic, such as (see the sketch after this list):
    - full control of object reuse, yes
    - comparators
    - generating hash codes
    - potential improvements to some algorithms once they find out they only need to deal with fixed-length data
    - directly using primitive variables when dealing with simple types
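    As a hypothetical sketch of the last two points, generated code for a record whose sort key is a fixed-length (int, long) pair could read the key fields as primitives straight out of the segments; because the generated class is a single concrete type, the JIT also sees a monomorphic call site instead of a virtual TypeComparator dispatch (the class and record layout below are made up for illustration):

```java
import org.apache.flink.core.memory.MemorySegment;

// What a generated comparator might look like for records laid out as
// [4-byte int key][8-byte long key][payload...].
public final class GeneratedIntLongKeyComparator {

    public int compare(MemorySegment seg1, int off1, MemorySegment seg2, int off2) {
        // Primitive reads: no boxing, no per-field TypeComparator objects.
        int cmp = Integer.compare(seg1.getInt(off1), seg2.getInt(off2));
        if (cmp != 0) {
            return cmp;
        }
        return Long.compare(seg1.getLong(off1 + 4), seg2.getLong(off2 + 4));
    }
}
```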
    
    And you are right that this is orthogonal to the runtime improvements; we see the boundary as the Operator. The framework should provide the most efficient environment for operators to run in, and we will code-generate the most efficient operators to live in it.
    
    > Btw. have you seen this PR for code generation for POJO serializers and 
comparators? #2211
    
    I haven't seen it yet, but I will find some time to check it out.

