Re: Enhancement on compaction performance

Jacky Li Thu, 08 Nov 2018 13:15:55 -0800

Hi Xuchuanyin,

This feature is great for compaction. Did you observe higher memory usage, 
since it prefetches data into memory? Do you have any numbers?


Regards,
Jacky

> On Nov 7, 2018, at 11:54 PM, xuchuanyin <[email protected]> wrote:
> 
> Hi all:
> I am raising a PR to enhance the performance of compaction. The PR number is 
> #2906.
> 
> Based on my experiments using about 72GB of LineItem data (from the 100GB 
> TPC-H dataset), I got the following results.
> 
> Code Branch  Prefetch  Batch Size (default 100)  Load1 (s)  Load2 (s)  Load3 (s)  Compact 3 Loads (s)  Time Reduced
> master       NA        100                       447.4      445.9      450.1      661.3                Baseline
> master       NA        32000                     441.5      454.4      456.8      641.2                +3.0%
> PR2906       enable    100                       445.3      450.2      445.3      411.8                +37.7%
> PR2906       enable    32000                     438.7      446.8      441.8      333.1                +49.6%
> PR2906       disable   100                       458.1      459.4      450.9      659.5                +0.3%
> PR2906       disable   32000                     472.0      446.8      457.1      654.5                +1.0%
> Note: These tests were run on Spark 2.2.
> 
> The results show that compaction performance is almost doubled if configured 
> properly (661.3 s down to 333.1 s with prefetch enabled and a batch size of 
> 32000).
> They also show that even when this feature is disabled, compaction 
> performance does not degrade.
> 
> So here:
> 
> 1. I would like to make this feature enabled by default.
> 
> 2. I would also like others in the community to test this feature and 
> check whether they benefit from it.
> 
> Any feedback is welcome.
> 
>
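
For anyone who wants to check the "Time Reduced" column in the table above, here is a minimal sketch of the arithmetic. It assumes the column means (baseline - measured) / baseline, with master at batch size 100 as the baseline. The function and variable names are only illustrative and are not part of the PR.

```python
# Compaction times in seconds, copied from the "Compact 3 Loads (s)" column.
# Assumption: the baseline is master with batch size 100 (661.3 s).
BASELINE_S = 661.3

# Keys are (code branch, prefetch, batch size).
compaction_s = {
    ("master", "NA", 32000): 641.2,
    ("PR2906", "enable", 100): 411.8,
    ("PR2906", "enable", 32000): 333.1,
    ("PR2906", "disable", 100): 659.5,
    ("PR2906", "disable", 32000): 654.5,
}


def time_reduced_pct(baseline_s, measured_s):
    """Share of the baseline time that was saved, in percent."""
    return (baseline_s - measured_s) / baseline_s * 100.0


def speedup(baseline_s, measured_s):
    """How many times faster the measured run is than the baseline."""
    return baseline_s / measured_s


# Round to one decimal place, as in the table.
reductions = {
    key: round(time_reduced_pct(BASELINE_S, seconds), 1)
    for key, seconds in compaction_s.items()
}

best_speedup = speedup(BASELINE_S, compaction_s[("PR2906", "enable", 32000)])

for (branch, prefetch, batch), pct in reductions.items():
    print(f"{branch:7s} prefetch={prefetch:7s} batch={batch:<6d} time reduced: {pct:+.1f}%")
print(f"best speedup vs baseline: {best_speedup:.2f}x")
```

Under that assumption the computed percentages match the table, and the best case (prefetch enabled, batch size 32000) comes out just under a 2x speedup, which is where "almost doubled" comes from.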
