Hi Jacky,

The repo is at https://github.com/intel-hadoop/IntelQATCodec, open source under the Apache license. Regarding the hardware dependency and performance: it needs the extra QAT device [1] for hardware acceleration, and falls back to a software-based GZip implementation when the device is absent. Its performance has been certified by Cloudera [2].
[1] https://www.intel.cn/content/www/cn/zh/architecture-and-technology/intel-quick-assist-technology-overview.html
[2] https://www.cloudera.com/partners/partners-listing.html?q=intel

Thanks
Ferdinand Xu

-----Original Message-----
From: Jacky Li [mailto:jacky.li...@qq.com]
Sent: Thursday, November 1, 2018 8:13 PM
To: dev@carbondata.apache.org
Subject: Re: Proposal to integrate QATCodec into Carbondata

Hi,

Good to know about QATCodec. I have a quick question: is QATCodec an independent compression/decompression library, or does it depend on any hardware to achieve the performance improvement you mentioned? Is there a link to the QATCodec project or its source code?

Regards,
Jacky

> On Oct 12, 2018, at 10:40 AM, Xu, Cheng A <cheng.a...@intel.com> wrote:
>
> Hi all,
> I want to make a proposal to support QATCodec [1] in CarbonData. The QAT Codec project provides a compression and decompression library for Apache Hadoop/Spark that makes use of Intel(R) QuickAssist Technology (abbrev. QAT) [2]. The project was open-sourced this year, along with its underlying native dependency, QATZip, which users can install with a Linux package-management utility (e.g. yum on CentOS). The project has two major benefits:
> 1) Wide ecosystem support
> It supports Hadoop and Spark directly by implementing the Hadoop and Spark de/compression APIs, and also provides patches to integrate with Parquet and ORC-Hive.
> 2) High performance and space efficiency
> We measured the performance and compression ratio of QATCodec in different workloads against Snappy.
> For the sort workload on MapReduce (input, intermediate data, and output all compression-enabled; 3 TB data scale; 5 workers; 2 replicas for data), QATCodec brings a 7.29% performance gain and a 7.5% better compression ratio.
> For the sort workload on Spark (input and intermediate data compression-enabled; 3 TB data scale), it brings a 14.3% performance gain and a 7.5% better compression ratio. We also measured Hive on MR with the TPCx-BB workload [3] (3 TB data scale): it brings a 12.98% performance gain and a 13.65% better compression ratio.
> Regarding the hardware requirement, the current implementation falls back to a software implementation in the absence of a QAT device.
> CarbonData currently supports two compression codecs: ZSTD and Snappy. I think an extra compression option with hardware acceleration will bring a benefit to users.
>
> Please feel free to share your comments on this proposal.
>
> [1] https://github.com/intel-hadoop/IntelQATCodec
> [2] https://01.org/zh/intel-quickassist-technology
> [3] http://www.tpc.org/tpcx-bb/default.asp
>
> Best Regards
> Ferdinand Xu
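[Editor's note] The falling-back mechanism mentioned in the thread can be sketched as follows. This is a purely conceptual illustration in Python, not the project's actual implementation (which is a Java Hadoop codec backed by native QATZip); the `qat_available` probe and `hw_compress` hook are hypothetical names, and the software path uses stdlib zlib to stand in for the GZip fallback.

```python
import zlib


def qat_available() -> bool:
    # Hypothetical probe; a real codec would detect the QAT device/driver.
    return False


def hw_compress(data: bytes) -> bytes:
    # Placeholder for the hardware-accelerated path (QATZip in the real project).
    raise RuntimeError("no QAT device present")


def compress(data: bytes) -> bytes:
    """Use hardware acceleration when available; otherwise fall back to the
    software deflate/GZip implementation, keeping the same wire format."""
    if qat_available():
        try:
            return hw_compress(data)
        except RuntimeError:
            pass  # device error at runtime: degrade to the software path
    return zlib.compress(data)


def decompress(data: bytes) -> bytes:
    # Both paths produce deflate-compatible output, so one decompressor suffices.
    return zlib.decompress(data)
```

The key design point is that the fallback is transparent to callers: data written on a machine with a QAT device remains readable on a machine without one, because both paths share the compression format.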