Re: [DISCUSS] FLIP-18: Code Generation for improving sorting performance

Pattarawat Chormai Fri, 24 Mar 2017 06:32:41 -0700

Hi,

Gábor and I plan to investigate the metamorphic call issue further. We will 
implement the idea of combining QuickSort and NormalizedKeySorter together in 
generated code and benchmark the improvement with a Flink’s job that uses 3 
different sorters.


Best,
Pat


> On Mar 23, 2017, at 6:31 PM, Gábor Gévay <gga...@gmail.com> wrote:
> 
> Hello,
> 
>> Second, additional classes will turn performance critical callsites 
>> megamorphic.
> 
> Yes, this is a completely valid point, thanks for raising this issue
> Greg. We were planning to have an offline discussion tomorrow with
> Pattarawat about this. We have a few options:
> 1. We could fuse the QuickSort and NormalizedKeySorter into a single
> class when generating the code.
> 2. As you said, duplicating the QuickSort class might also be a nice
> solution, if we can find a simple way to pull it off.
> 3. We might do some benchmarks and determine that the effect of this
> issue is negligible and therefore we can ignore it. But I'm worried
> that this is not the case, so we'll probably have to go with one of
> the above options.
> 
> Best,
> Gábor
> 
> 
> 
> On Thu, Mar 23, 2017 at 6:12 PM, Greg Hogan <c...@greghogan.com> wrote:
>> I would be more than happy to shepherd and review this PR.
>> 
>> I have two discussion points. First, a strategy for developing with 
>> templates. IntelliJ has a FreeMarker plugin but we lose formatting and code 
>> completion. To minimize this issue we can retain the untemplated code in an 
>> abstract class which is then concretely subclassed by the template.
>> 
>> Second, additional classes will turn performance critical callsites 
>> megamorphic. Stephan noted this issue in his work on MemorySegment.
>>  http://flink.apache.org/news/2015/09/16/off-heap-memory.html
>> 
>> For example, QuickSort calls IndexedSortable#compare and 
>> IndexedSortable#swap. With multiple compiled implementations of the sorter 
>> template these callsites can no longer be inlined (the same is true with 
>> NormalizedKeySorter and FixedLengthRecordSorter if the latter was 
>> instrumented).
>> 
>> I have not found a way to duplicate a Java class at runtime, but we may be 
>> able to use Janino to compile a class which is then uniquely renamed: each 
>> IndexSortable type would map to a different QuickSort type (same bytecode, 
>> but uniquely optimized). This should also boost performance of runtime 
>> operators calling user defined functions.
>> 
>> Given the code already written, I expect we can refactor, review, and 
>> benchmark for the 1.3 release.
>> 
>> Greg
>> 
>> 
>>> On Mar 21, 2017, at 3:46 PM, Fabian Hueske <fhue...@gmail.com> wrote:
>>> 
>>> Hi Pat,
>>> 
>>> thanks a lot for this great proposal! I think it is very well structured 
>>> and has the right level of detail.
>>> The improvements of your performance benchmarks look very promising and I 
>>> think code-gen'd sorters would be a very nice improvement.
>>> I like that you plan to add a switch to activate this feature.
>>> 
>>> In order move on, we will need a committer who "champions" your FLIP, 
>>> reviews the pull request, and eventually merges it.
>>> 
>>> @Greg and @Stephan, what do you think about this proposal?
>>> 
>>> Best, Fabian
>>> 
>>> 
>>> 2017-03-14 16:10 GMT+01:00 Pattarawat Chormai <pat.chor...@gmail.com 
>>> <mailto:pat.chor...@gmail.com>>:
>>> Hi all,
>>> 
>>> I would like to initiate a discussion of applying code generation to 
>>> NormalizedKeySorter. The goal is to improve sorting performance by 
>>> generating suitable NormalizedKeySorter for underlying data. This generated 
>>> sorter will contains only necessary code in important methods, such as swap 
>>> and compare, hence improving sorting performance.
>>> 
>>> Details of the implementation is illustrated at FLIP-18 : Code Generation 
>>> for improving sorting performance. 
>>> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-18%3A+Code+Generation+for+improving+sorting+performance>
>>> 
>>> 
>>> Also, because we’re doing it as a course project at TUB, we have completed 
>>> the implementation and made a pull-request 
>>> <https://github.com/apache/flink/pull/3511> to Flink repo already.
>>> 
>>> From our evaluation, we have found that the pull-request reduces sorting 
>>> time around 7-10% and together with FLINK-3722 
>>> <https://issues.apache.org/jira/browse/FLINK-3722> the sorting time is 
>>> decreased by 12-20%.
>>> 
>>> 
>>> 
>>> Please take a look at the document and the pull-request and let me know if 
>>> you have any suggestion.
>>> 
>>> Best,
>>> Pat
>>> 
>>

Re: [DISCUSS] FLIP-18: Code Generation for improving sorting performance

Reply via email to