I would be more than happy to shepherd and review this PR.

I have two discussion points. First, a strategy for developing with templates. 
IntelliJ has a FreeMarker plugin but we lose formatting and code completion. To 
minimize this issue we can retain the untemplated code in an abstract class 
which is then concretely subclassed by the template.

Second, additional classes will turn performance critical callsites 
megamorphic. Stephan noted this issue in his work on MemorySegment. 
  http://flink.apache.org/news/2015/09/16/off-heap-memory.html

For example, QuickSort calls IndexedSortable#compare and IndexedSortable#swap. 
With multiple compiled implementations of the sorter template these callsites 
can no longer be inlined (the same is true with NormalizedKeySorter and 
FixedLengthRecordSorter if the latter was instrumented).

I have not found a way to duplicate a Java class at runtime, but we may be able 
to use Janino to compile a class which is then uniquely renamed: each 
IndexSortable type would map to a different QuickSort type (same bytecode, but 
uniquely optimized). This should also boost performance of runtime operators 
calling user defined functions.

Given the code already written, I expect we can refactor, review, and benchmark 
for the 1.3 release.

Greg


> On Mar 21, 2017, at 3:46 PM, Fabian Hueske <fhue...@gmail.com> wrote:
> 
> Hi Pat,
> 
> thanks a lot for this great proposal! I think it is very well structured and 
> has the right level of detail.
> The improvements of your performance benchmarks look very promising and I 
> think code-gen'd sorters would be a very nice improvement.
> I like that you plan to add a switch to activate this feature.
> 
> In order move on, we will need a committer who "champions" your FLIP, reviews 
> the pull request, and eventually merges it.
> 
> @Greg and @Stephan, what do you think about this proposal?
> 
> Best, Fabian
> 
> 
> 2017-03-14 16:10 GMT+01:00 Pattarawat Chormai <pat.chor...@gmail.com 
> <mailto:pat.chor...@gmail.com>>:
> Hi all,
> 
> I would like to initiate a discussion of applying code generation to 
> NormalizedKeySorter. The goal is to improve sorting performance by generating 
> suitable NormalizedKeySorter for underlying data. This generated sorter will 
> contains only necessary code in important methods, such as swap and compare, 
> hence improving sorting performance. 
> 
> Details of the implementation is illustrated at FLIP-18 : Code Generation for 
> improving sorting performance. 
> <https://cwiki.apache.org/confluence/display/FLINK/FLIP-18%3A+Code+Generation+for+improving+sorting+performance>
> 
> 
> Also, because we’re doing it as a course project at TUB, we have completed 
> the implementation and made a pull-request 
> <https://github.com/apache/flink/pull/3511> to Flink repo already.
> 
> From our evaluation, we have found that the pull-request reduces sorting time 
> around 7-10% and together with FLINK-3722 
> <https://issues.apache.org/jira/browse/FLINK-3722> the sorting time is 
> decreased by 12-20%.
> 
> 
> 
> Please take a look at the document and the pull-request and let me know if 
> you have any suggestion.
> 
> Best,
> Pat
> 

Reply via email to