I would be more than happy to shepherd and review this PR. I have two discussion points.

First, we need a strategy for developing with templates. IntelliJ has a FreeMarker plugin, but inside a template we lose Java formatting and code completion. To minimize this issue we can keep the untemplated code in an abstract class, which the template then extends with a concrete subclass.
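A rough sketch of that split, to make the idea concrete. All names here (SorterBase, GeneratedIntKeySorter) are hypothetical and not taken from the pull request, and the subclass is written by hand as a stand-in for what a template might emit:

```java
// Hypothetical names; these are not the classes from the pull request.

// Hand-written base class: everything that does not vary per record type
// lives here, so the IDE can format it and offer completion as usual.
abstract class SorterBase {
    protected final int[] keys;

    protected SorterBase(int[] keys) {
        this.keys = keys;
    }

    // Only the type-specific hot methods are left for the template.
    protected abstract int compare(int i, int j);

    protected abstract void swap(int i, int j);

    // Untemplated driver logic (a plain insertion sort, for brevity).
    public final void sort() {
        for (int i = 1; i < keys.length; i++) {
            for (int j = i; j > 0 && compare(j - 1, j) > 0; j--) {
                swap(j - 1, j);
            }
        }
    }
}

// Stand-in for what a template might emit for one key layout.
final class GeneratedIntKeySorter extends SorterBase {
    GeneratedIntKeySorter(int[] keys) {
        super(keys);
    }

    @Override
    protected int compare(int i, int j) {
        return Integer.compare(keys[i], keys[j]);
    }

    @Override
    protected void swap(int i, int j) {
        int tmp = keys[i];
        keys[i] = keys[j];
        keys[j] = tmp;
    }
}
```

With this layout the template body shrinks to the few overridden methods, and the bulk of the class stays ordinary Java.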
Second, additional classes will turn performance-critical callsites megamorphic. Stephan noted this issue in his work on MemorySegment:
http://flink.apache.org/news/2015/09/16/off-heap-memory.html

For example, QuickSort calls IndexedSortable#compare and IndexedSortable#swap. With multiple compiled implementations of the sorter template, these callsites can no longer be inlined (the same is true of NormalizedKeySorter and FixedLengthRecordSorter, if the latter were instrumented). I have not found a way to duplicate a Java class at runtime, but we may be able to use Janino to compile a class which is then uniquely renamed: each IndexedSortable type would map to a different QuickSort type (same bytecode, but uniquely optimized). This should also boost the performance of runtime operators calling user-defined functions.

Given the code already written, I expect we can refactor, review, and benchmark for the 1.3 release.

Greg

> On Mar 21, 2017, at 3:46 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>
> Hi Pat,
>
> thanks a lot for this great proposal! I think it is very well structured and
> has the right level of detail.
> The improvements of your performance benchmarks look very promising and I
> think code-gen'd sorters would be a very nice improvement.
> I like that you plan to add a switch to activate this feature.
>
> In order to move on, we will need a committer who "champions" your FLIP, reviews
> the pull request, and eventually merges it.
>
> @Greg and @Stephan, what do you think about this proposal?
>
> Best, Fabian
>
>
> 2017-03-14 16:10 GMT+01:00 Pattarawat Chormai <pat.chor...@gmail.com>:
> Hi all,
>
> I would like to initiate a discussion of applying code generation to
> NormalizedKeySorter. The goal is to improve sorting performance by generating
> a suitable NormalizedKeySorter for the underlying data. This generated sorter will
> contain only necessary code in important methods, such as swap and compare,
> hence improving sorting performance.
>
> Details of the implementation are illustrated at FLIP-18 : Code Generation for
> improving sorting performance.
> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-18%3A+Code+Generation+for+improving+sorting+performance>
>
> Also, because we’re doing it as a course project at TUB, we have completed
> the implementation and made a pull-request
> <https://github.com/apache/flink/pull/3511> to Flink repo already.
>
> From our evaluation, we have found that the pull-request reduces sorting time
> around 7-10% and together with FLINK-3722
> <https://issues.apache.org/jira/browse/FLINK-3722> the sorting time is
> decreased by 12-20%.
>
> Please take a look at the document and the pull-request and let me know if
> you have any suggestion.
>
> Best,
> Pat
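
P.S. A small sketch of the callsite concern from my second point at the top of this mail. The names are hypothetical (Sortable stands in for IndexedSortable, and the per-type sort classes are duplicated by hand where the proposal would compile and rename a class with Janino). It only shows the shape of the idea, on the assumption that the JIT profiles each method's callsites separately; it is not a benchmark.

```java
// Hypothetical stand-in for IndexedSortable; not Flink's interface.
interface Sortable {
    int size();
    int compare(int i, int j);
    void swap(int i, int j);
}

final class IntArraySortable implements Sortable {
    final int[] a;
    IntArraySortable(int[] a) { this.a = a; }
    public int size() { return a.length; }
    public int compare(int i, int j) { return Integer.compare(a[i], a[j]); }
    public void swap(int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }
}

final class LongArraySortable implements Sortable {
    final long[] a;
    LongArraySortable(long[] a) { this.a = a; }
    public int size() { return a.length; }
    public int compare(int i, int j) { return Long.compare(a[i], a[j]); }
    public void swap(int i, int j) { long t = a[i]; a[i] = a[j]; a[j] = t; }
}

// One shared sorter: its compare/swap callsites see every Sortable
// implementation that is ever passed in, so they can become megamorphic.
final class SharedSort {
    static void sort(Sortable s) {
        for (int i = 1; i < s.size(); i++) {
            for (int j = i; j > 0 && s.compare(j - 1, j) > 0; j--) {
                s.swap(j - 1, j);
            }
        }
    }
}

// Same body under a different class name. If only IntArraySortable is
// ever passed here, these callsites see a single receiver type.
final class SortForIntArrays {
    static void sort(Sortable s) {
        for (int i = 1; i < s.size(); i++) {
            for (int j = i; j > 0 && s.compare(j - 1, j) > 0; j--) {
                s.swap(j - 1, j);
            }
        }
    }
}

// Second copy, reserved for LongArraySortable.
final class SortForLongArrays {
    static void sort(Sortable s) {
        for (int i = 1; i < s.size(); i++) {
            for (int j = i; j > 0 && s.compare(j - 1, j) > 0; j--) {
                s.swap(j - 1, j);
            }
        }
    }
}
```

The duplicated bodies are the point: the sort logic is identical, and only the class identity differs, so each copy can be optimized for the one sortable type it serves.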