Hi Chunen, thanks for sharing your experience.

In general, enabling compression for HBase storage and MR jobs is considered worthwhile because it can save a lot of space, which matters especially as cubes grow large. https://issues.apache.org/jira/browse/KYLIN-956 (a patch contributed by Liang Meng) lets admins set compression on HBase tables and MR jobs.

After several rounds of refinement, we chose Snappy as the default compression algorithm for both HBase and MR jobs. Snappy has a friendlier license, which allows HDP to ship it with its distribution, whereas LZO normally requires manual installation. Snappy and LZO should be comparable in performance (speed and compression ratio). We also have documentation for those who prefer to use LZO:
http://kylin.incubator.apache.org/docs/install/advance_settings.html

On Sun, Sep 6, 2015 at 9:56 AM, [email protected] <[email protected]> wrote:

> I use Snappy instead of LZO and found that the cube build time
> increased: less than 4 hours without Snappy, about 5 hours with Snappy
> (the extra time is spent converting results to HFiles). With compression,
> the cube size decreased by about 25%-30%. I haven't tested query
> performance yet. The table contains about 6 billion records and 8
> dimensions.
>
> China Mobile Guangdong Co., Ltd., Network Management Center, Liang Meng (梁猛)
> [email protected]
>
> From: nichunen
> Sent: 2015-09-06 09:36
> To: dev
> Subject: Tests with lzo compression in kylin
>
> Hi,
>
> I ran some tests on our cluster after installing hadoop lzo and enabling
> LZO in Kylin. Kylin performs better with LZO.
>
> I built cubes from two tables: a small one with 10,000 records (called
> Small_Table) and a large one with 4,000,000 records (called Large_Table).
>
> The cube sizes are clearly reduced:
>
>            Large_Table   Small_Table
> No LZO     776.33m       16.15m
> LZO        571.49m       7.53m
>
> Because query duration is not very stable, I compared a time-consuming
> query on Kylin with and without LZO.
> The query looks like:
> "SELECT A,B from Large_Table where A<'5000000000' and B>'5000000000' group by A,B order by A;"
>
> On Kylin with LZO, I ran the query 10 times; the durations were:
> 4.80s, 5.74s, 5.98s, 4.95s, 4.86s, 7.24s, 4.72s, 6.80s, 6.42s, 7.08s
> The mean time was 5.859s.
>
> On Kylin without LZO, I ran the query 10 times; the durations were:
> 11.66s, 6.31s, 7.17s, 6.37s, 6.78s, 6.43s, 7.47s, 5.62s, 7.60s, 6.47s
> The mean time was 7.188s.
>
> As for cube build time, I didn't see much improvement, perhaps because
> I didn't build many times and so don't have an accurate comparison.
> Could you please share your experience with Kylin and LZO?
>
> Thanks
>
> Best Regards,
>
> George/倪春恩
> Software Engineer
> Mobile: +86-13501723787 | Fax: +8610-56842040
> 北京明略软件系统有限公司 (www.mininglamp.com)
> F4,1#,Zhongmei Construction Group Plaza,398# Zhongdong Road,Changping
> District,Beijing,102218

--
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone
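
For anyone who wants to re-check the figures quoted in this thread, here is a minimal Python sketch. It only recomputes the mean query times and the cube size reduction from the numbers nichunen posted. The variable and function names are hypothetical, and the median is an extra statistic that is not in the original message; it is included because the first no-LZO run (11.66s) looks like it could be an outlier.

```python
from statistics import mean, median

# Query durations in seconds, copied from the thread (10 runs each).
with_lzo = [4.80, 5.74, 5.98, 4.95, 4.86, 7.24, 4.72, 6.80, 6.42, 7.08]
without_lzo = [11.66, 6.31, 7.17, 6.37, 6.78, 6.43, 7.47, 5.62, 7.60, 6.47]

# Cube sizes from the thread, in the unit reported there ("m").
cube_sizes = {
    "Large_Table": {"no_lzo": 776.33, "lzo": 571.49},
    "Small_Table": {"no_lzo": 16.15, "lzo": 7.53},
}


def size_reduction_pct(before, after):
    """Return the percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100.0


def summarize(label, durations):
    """Return (label, mean, median) for a list of durations in seconds."""
    return label, mean(durations), median(durations)


if __name__ == "__main__":
    for label, runs in (("with LZO", with_lzo), ("without LZO", without_lzo)):
        name, avg, med = summarize(label, runs)
        print(f"{name}: mean={avg:.3f}s median={med:.3f}s")

    for table, sizes in cube_sizes.items():
        pct = size_reduction_pct(sizes["no_lzo"], sizes["lzo"])
        print(f"{table}: cube size reduced by {pct:.1f}%")
```

The means reproduce the 5.859s and 7.188s reported above. With only 10 runs per configuration, treat the difference as indicative rather than conclusive.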
