Github user KurtYoung commented on the issue:

    https://github.com/apache/flink/pull/3511

> So I think a simpler and better approach is to just make sure that most types have a good implementation of putNormalizedKey, and then NormalizedKeySorter.compareRecords would be called only rarely, so its performance wouldn't really matter.

You are right when the sort keys are simple numeric types, but not with strings, which may be the most popular choice in some ETL and data warehouse pipelines. But I agree that code generation can't help with this situation, so we investigated some binary data formats to represent our records and modified the interfaces of TypeSerializer & TypeComparator for ser/de. We don't have to consume the input/output view byte by byte, but have the ability to randomly access the underlying data, aka the MemorySegment. It acts like Spark's UnsafeRow: https://reviewable.io/reviews/apache/spark/5725, so we can eliminate most of the deserialization cost, such as `read byte[]` followed by `new String(byte[])`. We combined this approach with some code generation to eliminate the virtual function calls of the TypeComparator, and saw a 10x performance improvement when sorting on strings.

> I think a large potential in code-generation is to eliminate the overheads of the very many virtual function calls throughout the runtime

Totally agreed. After we finish the code generation and the ser/de improvements, we will investigate this further. Good to see that you have a list of all the megamorphic calls. BTW, we are actually translating the batch jobs onto the streaming runtime, so I think there will be a lot in common.
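To illustrate the point about skipping `new String(byte[])`, here is a minimal, hypothetical sketch (not Flink's or Spark's actual API) of comparing two string fields directly on their serialized UTF-8 bytes, the way an UnsafeRow-style binary format allows. For UTF-8, unsigned byte-wise comparison agrees with comparison by Unicode code point, so no String objects need to be materialized on the compare path:

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: class and method names are made up for this sketch.
public class BinaryStringCompare {

    /** Lexicographic comparison of two UTF-8 byte ranges, no deserialization. */
    static int compareUtf8(byte[] a, int aOff, int aLen,
                           byte[] b, int bOff, int bLen) {
        int min = Math.min(aLen, bLen);
        for (int i = 0; i < min; i++) {
            // Compare as unsigned bytes; for UTF-8 this matches
            // comparison by Unicode code point.
            int cmp = (a[aOff + i] & 0xFF) - (b[bOff + i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return aLen - bLen;
    }

    public static void main(String[] args) {
        byte[] x = "apple".getBytes(StandardCharsets.UTF_8);
        byte[] y = "apricot".getBytes(StandardCharsets.UTF_8);
        // No String objects are created while comparing.
        System.out.println(compareUtf8(x, 0, x.length, y, 0, y.length) < 0);
    }
}
```

In a real binary row the offsets and lengths would come from the row's fixed-length slot for that field rather than from separate `byte[]` arrays.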
Having more control over type information and code-generating the whole operator has lots of benefits. It can also help make most of the calls monomorphic, for example:

- full control of object reuse
- comparators
- generating hash codes
- potential improvements to algorithms that find they only need to deal with fixed-length data
- directly using primitive variables when dealing with simple types

And you are right that this is orthogonal to the runtime improvements; we see the boundary as the Operator. The framework should provide the most efficient environment for operators to run in, and we will code-generate the most efficient operators to live in it.

> Btw. have you seen this PR for code generation for POJO serializers and comparators? #2211

I didn't see it yet, will find some time to check it out.
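As a hand-written stand-in for what a generated, schema-specialized comparator might look like (the class, offsets, and layout below are illustrative assumptions, not Flink code): for a known `(int, long)` record layout, the generated class can read primitives at fixed offsets, so the JIT sees a single concrete class and can inline the whole compare path instead of dispatching a virtual TypeComparator call per field:

```java
import java.nio.ByteBuffer;

// Illustrative sketch of a schema-specialized comparator for an (int, long) row.
public class GeneratedIntLongComparator {
    // Offsets known at code-generation time for this fixed-length layout.
    private static final int KEY1_OFFSET = 0; // int field
    private static final int KEY2_OFFSET = 4; // long field

    public static int compare(ByteBuffer first, ByteBuffer second) {
        // Absolute primitive reads at fixed offsets: no boxing,
        // no per-field virtual dispatch.
        int c = Integer.compare(first.getInt(KEY1_OFFSET), second.getInt(KEY1_OFFSET));
        if (c != 0) {
            return c;
        }
        return Long.compare(first.getLong(KEY2_OFFSET), second.getLong(KEY2_OFFSET));
    }

    public static void main(String[] args) {
        ByteBuffer a = ByteBuffer.allocate(12).putInt(1).putLong(5L);
        ByteBuffer b = ByteBuffer.allocate(12).putInt(1).putLong(9L);
        // Same int key, so the long field decides: 5 < 9.
        System.out.println(compare(a, b) < 0);
    }
}
```

In Flink the backing storage would be a MemorySegment rather than a ByteBuffer, but the idea is the same: fixed-length data plus generated offsets turn the comparator into straight-line primitive code.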