Re: Tests with lzo compression in kylin

hongbin ma Sat, 05 Sep 2015 23:48:07 -0700

Hi Chunen,

Thanks for sharing your experience.


In general, enabling compression for HBase storage and MR jobs is considered
worthwhile because it can save a lot of space. (This is especially desirable
when a cube grows large.)
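
To put a rough number on the space savings, here is a minimal Python sketch
using the cube sizes from the table in the quoted message below. The
dictionary layout and the size_reduction helper are only illustrative, and I
am assuming the "m" suffix in that table means megabytes.

```python
# Cube sizes copied from the table in the quoted message below.
# Assumption: the "m" suffix there means megabytes.
sizes_mb = {
    "Large_Table": {"no_lzo": 776.33, "lzo": 571.49},
    "Small_Table": {"no_lzo": 16.15, "lzo": 7.53},
}


def size_reduction(before_mb, after_mb):
    """Return the fraction of space saved by compression (0.0 to 1.0)."""
    return 1.0 - after_mb / before_mb


# Fraction of space saved per table when LZO is enabled.
reductions = {
    name: size_reduction(s["no_lzo"], s["lzo"]) for name, s in sizes_mb.items()
}

for name, saved in reductions.items():
    print(f"{name}: {saved:.1%} smaller with LZO")
```

On these figures the large cube shrinks by roughly 26% and the small one by
roughly 53%. Two cubes are of course too few to generalize from.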

https://issues.apache.org/jira/browse/KYLIN-956 enabled admins to set
compression on HBase tables and MR jobs. (This patch was contributed by
Liang Meng.) After several rounds of refinement, we finally chose
Snappy as the default compression algorithm for both HBase and MR jobs.
Snappy has a friendlier license, which allows HDP to ship it together
with its distribution, whereas LZO normally requires manual installation.

Snappy and LZO should be comparable in performance (speed and compression
ratio). We also have documentation to help those who prefer to use LZO:
http://kylin.incubator.apache.org/docs/install/advance_settings.html
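
When comparing query timings like the ones quoted below, it may help to look
at the median as well as the mean, since a single slow run can pull the mean
up. The following is only a sketch: the sample lists are copied from the
quoted message, and the summarize helper is a hypothetical name of mine.

```python
from statistics import mean, median

# Query durations in seconds, copied from the quoted message below.
with_lzo = [4.80, 5.74, 5.98, 4.95, 4.86, 7.24, 4.72, 6.80, 6.42, 7.08]
without_lzo = [11.66, 6.31, 7.17, 6.37, 6.78, 6.43, 7.47, 5.62, 7.60, 6.47]


def summarize(samples):
    """Return the mean, median and slowest run for a list of durations."""
    return {
        "mean": mean(samples),
        "median": median(samples),
        "max": max(samples),
    }


lzo_stats = summarize(with_lzo)
plain_stats = summarize(without_lzo)

for label, stats in (("with LZO", lzo_stats), ("without LZO", plain_stats)):
    print(
        f"{label}: mean={stats['mean']:.3f}s "
        f"median={stats['median']:.3f}s max={stats['max']:.2f}s"
    )
```

The means match the reported 5.859s and 7.188s. The medians are 5.860s and
6.625s, so the gap narrows once the 11.66s run (possibly a cold first query,
though that is only a guess) carries less weight. With 10 runs per setup the
difference is suggestive rather than conclusive.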

On Sun, Sep 6, 2015 at 9:56 AM, [email protected] <[email protected]>
wrote:

> I used Snappy instead of LZO and found that the cube build time
> increased: less than 4 hours without Snappy, about 5 hours with Snappy (more
> time was spent converting results to HFiles). With compression, the cube
> size decreased by about 25%-30%. I haven't tested query performance yet. The
> table includes about 6 billion records and 8 dimensions.
>
>
>
> China Mobile Guangdong Co., Ltd., Network Management Center, Liang Meng
> [email protected]
>
> From: nichunen
> Sent: 2015-09-06 09:36
> To: dev
> Subject: Tests with lzo compression in kylin
> Hi,
>
> I ran some tests on our cluster after installing Hadoop LZO and enabling
> LZO in Kylin. Kylin performs better with LZO.
>
> I built cubes from two tables: a small one with 10,000 records (called
> Small_Table) and a large one with 4,000,000 records (called
> Large_Table).
>
> The cube sizes are noticeably reduced.
>
>           Large_Table   Small_Table
> No LZO    776.33m       16.15m
> LZO       571.49m       7.53m
>
> Because query duration is not very stable, I compared a
> time-consuming query on Kylin with and without LZO. The query looks like
> "SELECT A,B from Large_Table where A<'5000000000' and B>'5000000000' group
> by  A,B order by A;"
> On Kylin with LZO, I ran the query 10 times; the durations were:
> 4.80s,5.74s,5.98s,4.95s,4.86s,7.24s,4.72s,6.80s,6.42s,7.08s
> The mean time was 5.859s.
>
> On Kylin without LZO, I ran the query 10 times; the durations were:
> 11.66s,6.31s,7.17s,6.37s,6.78s,6.43s,7.47s,5.62s,7.60s,6.47s
> The mean time was 7.188s.
>
> I didn't see much improvement in cube build time; maybe this
> is because I didn't build many times and don't have an accurate
> comparison.
> Could you please share your experience with Kylin and LZO?
>
> Thanks
>
>
>
> Best Regards,
>
> George/倪春恩
> Software Engineer/软件工程师
> Mobile:+86-13501723787| Fax:+8610-56842040
> 北京明略软件系统有限公司（www.mininglamp.com）
> 北京市昌平区东小口镇中东路398号中煤建设集团大厦1号楼4层
> F4,1#,Zhongmei Construction Group Plaza,398# Zhongdong Road,Changping
> District,Beijing,102218
>
> ----------------------------------------------------------------------------------------------------------------------------
>



-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone
