Hi, Gábor and I plan to investigate the metamorphic call issue further. We will implement the idea of combining QuickSort and NormalizedKeySorter together in generated code and benchmark the improvement with a Flink’s job that uses 3 different sorters.
Best, Pat > On Mar 23, 2017, at 6:31 PM, Gábor Gévay <gga...@gmail.com> wrote: > > Hello, > >> Second, additional classes will turn performance critical callsites >> megamorphic. > > Yes, this is a completely valid point, thanks for raising this issue > Greg. We were planning to have an offline discussion tomorrow with > Pattarawat about this. We have a few options: > 1. We could fuse the QuickSort and NormalizedKeySorter into a single > class when generating the code. > 2. As you said, duplicating the QuickSort class might also be a nice > solution, if we can find a simple way to pull it off. > 3. We might do some benchmarks and determine that the effect of this > issue is negligible and therefore we can ignore it. But I'm worried > that this is not the case, so we'll probably have to go with one of > the above options. > > Best, > Gábor > > > > On Thu, Mar 23, 2017 at 6:12 PM, Greg Hogan <c...@greghogan.com> wrote: >> I would be more than happy to shepherd and review this PR. >> >> I have two discussion points. First, a strategy for developing with >> templates. IntelliJ has a FreeMarker plugin but we lose formatting and code >> completion. To minimize this issue we can retain the untemplated code in an >> abstract class which is then concretely subclassed by the template. >> >> Second, additional classes will turn performance critical callsites >> megamorphic. Stephan noted this issue in his work on MemorySegment. >> http://flink.apache.org/news/2015/09/16/off-heap-memory.html >> >> For example, QuickSort calls IndexedSortable#compare and >> IndexedSortable#swap. With multiple compiled implementations of the sorter >> template these callsites can no longer be inlined (the same is true with >> NormalizedKeySorter and FixedLengthRecordSorter if the latter was >> instrumented). >> >> I have not found a way to duplicate a Java class at runtime, but we may be >> able to use Janino to compile a class which is then uniquely renamed: each >> IndexSortable type would map to a different QuickSort type (same bytecode, >> but uniquely optimized). This should also boost performance of runtime >> operators calling user defined functions. >> >> Given the code already written, I expect we can refactor, review, and >> benchmark for the 1.3 release. >> >> Greg >> >> >>> On Mar 21, 2017, at 3:46 PM, Fabian Hueske <fhue...@gmail.com> wrote: >>> >>> Hi Pat, >>> >>> thanks a lot for this great proposal! I think it is very well structured >>> and has the right level of detail. >>> The improvements of your performance benchmarks look very promising and I >>> think code-gen'd sorters would be a very nice improvement. >>> I like that you plan to add a switch to activate this feature. >>> >>> In order move on, we will need a committer who "champions" your FLIP, >>> reviews the pull request, and eventually merges it. >>> >>> @Greg and @Stephan, what do you think about this proposal? >>> >>> Best, Fabian >>> >>> >>> 2017-03-14 16:10 GMT+01:00 Pattarawat Chormai <pat.chor...@gmail.com >>> <mailto:pat.chor...@gmail.com>>: >>> Hi all, >>> >>> I would like to initiate a discussion of applying code generation to >>> NormalizedKeySorter. The goal is to improve sorting performance by >>> generating suitable NormalizedKeySorter for underlying data. This generated >>> sorter will contains only necessary code in important methods, such as swap >>> and compare, hence improving sorting performance. >>> >>> Details of the implementation is illustrated at FLIP-18 : Code Generation >>> for improving sorting performance. >>> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-18%3A+Code+Generation+for+improving+sorting+performance> >>> >>> >>> Also, because we’re doing it as a course project at TUB, we have completed >>> the implementation and made a pull-request >>> <https://github.com/apache/flink/pull/3511> to Flink repo already. >>> >>> From our evaluation, we have found that the pull-request reduces sorting >>> time around 7-10% and together with FLINK-3722 >>> <https://issues.apache.org/jira/browse/FLINK-3722> the sorting time is >>> decreased by 12-20%. >>> >>> >>> >>> Please take a look at the document and the pull-request and let me know if >>> you have any suggestion. >>> >>> Best, >>> Pat >>> >>