Hi Xuchuanyin,

This feature is great for compaction. Did you observe higher memory usage, since it prefetches data into memory? Do you have any numbers?
Regards,
Jacky

> On Nov 7, 2018, at 11:54 PM, xuchuanyin <[email protected]> wrote:
>
> Hi all:
> I am raising a PR to enhance the performance of compaction. The PR number is
> #2906.
>
> Based on my experiments using about 72GB of LineItem data (in 100GB TPCH data),
> I got the following results.
>
> Code Branch  Prefetch  Batch Size (default 100)  Load1 (s)  Load2 (s)  Load3 (s)  Compact 3 Loads (s)  Time Reduced
> master       NA        100                       447.4      445.9      450.1      661.3                Base Line
> master       NA        32000                     441.5      454.4      456.8      641.2                +3.0%
> PR2906       enable    100                       445.3      450.2      445.3      411.8                +37.7%
> PR2906       enable    32000                     438.7      446.8      441.8      333.1                +49.6%
> PR2906       disable   100                       458.1      459.4      450.9      659.5                +0.3%
> PR2906       disable   32000                     472.0      446.8      457.1      654.5                +1.0%
>
> Note: These tests were run on Spark 2.2.
>
> The results show that compaction performance is almost doubled if configured
> properly.
> They also show that even if this feature is disabled, compaction performance
> does not decrease.
>
> So here:
>
> 1. I do want to make this feature ‘enabled’ by default.
>
> 2. Besides, I’d like others in the community to also test this feature and
> check whether they benefit from it.
>
> Any feedback is welcome.
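
As a side note on the quoted numbers: the "Time Reduced" column appears to be measured against the master baseline of 661.3 s (batch size 100). A minimal sketch in Python, assuming that formula (the dictionary layout and the function name time_reduced_pct are mine, not from the PR), reproduces the quoted percentages:

```python
# Compaction times in seconds, copied from the quoted table.
# Keys are (code branch, prefetch, batch size).
compact_seconds = {
    ("master", "NA", 100): 661.3,
    ("master", "NA", 32000): 641.2,
    ("PR2906", "enable", 100): 411.8,
    ("PR2906", "enable", 32000): 333.1,
    ("PR2906", "disable", 100): 659.5,
    ("PR2906", "disable", 32000): 654.5,
}

# Assumed baseline: master with the default batch size of 100.
baseline = compact_seconds[("master", "NA", 100)]


def time_reduced_pct(seconds, base=baseline):
    """Percentage of compaction time saved relative to the baseline."""
    return round((base - seconds) / base * 100, 1)


for key, seconds in compact_seconds.items():
    print(key, f"{time_reduced_pct(seconds):+.1f}%")

# Speedup factor of the best configuration over the baseline.
best_speedup = baseline / compact_seconds[("PR2906", "enable", 32000)]
print(f"best speedup: {best_speedup:.2f}x")
```

Under this reading, a time reduction close to 50% corresponds to a speedup factor close to 2, which matches the "almost doubled" statement.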
