Hi Martin
> I suppose this is only about the Parquet Writer/Reader implementation, not 
> about changes to the Parquet specification.
[Cheng's comments] Yes, we don't need to change the specification (Parquet 
format) unless we want to introduce a new compression codec. More often, 
customers will extend or replace the built-in codecs with their own. So these 
are codec-level changes used by the Parquet reader/writer.
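To make that concrete, here is a rough, purely illustrative sketch of what a 
user-supplied codec could look like, assuming the pluggable API accepts 
Hadoop-style CompressionCodec implementations (the shape the built-in codecs 
take today); the class name and the accelerator hooks are hypothetical.

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionInputStream;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.Compressor;
import org.apache.hadoop.io.compress.Decompressor;

// Hypothetical codec that hands page buffers to an accelerator (QAT/FPGA)
// instead of compressing/decompressing on the CPU. Only the skeleton is shown.
public class AcceleratedCodec implements CompressionCodec {

  @Override
  public CompressionOutputStream createOutputStream(OutputStream out) throws IOException {
    // Wrap 'out' with a stream that submits buffers to the accelerator driver.
    throw new UnsupportedOperationException("accelerator integration goes here");
  }

  @Override
  public CompressionOutputStream createOutputStream(OutputStream out, Compressor c)
      throws IOException {
    return createOutputStream(out);
  }

  @Override
  public CompressionInputStream createInputStream(InputStream in) throws IOException {
    // Wrap 'in' with a stream that lets the accelerator decompress the pages.
    throw new UnsupportedOperationException("accelerator integration goes here");
  }

  @Override
  public CompressionInputStream createInputStream(InputStream in, Decompressor d)
      throws IOException {
    return createInputStream(in);
  }

  @Override
  public Class<? extends Compressor> getCompressorType() { return null; }

  @Override
  public Compressor createCompressor() { return null; }

  @Override
  public Class<? extends Decompressor> getDecompressorType() { return null; }

  @Override
  public Decompressor createDecompressor() { return null; }

  @Override
  public String getDefaultExtension() { return ".accel"; } // hypothetical extension
}

The open part, which the JIRA is about, is how the reader/writer would be told 
to load such a class instead of one of the built-in ones.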

> I would like to know whether offloading the task of compressing/decompressing 
> some data is really beneficial performance-wise.
[Cheng's comments] There are two benefits expected from accelerators: 1) CPU 
offloading and 2) better performance. For the second one, you can check the 
following talk, which mentions detailed performance numbers.
https://databricks.com/session_eu19/accelerating-apache-spark-with-intel-quickassist-technology
 

>- The accelerator might require the compressed data to be copied over to 
>decompress it. This will only make compression/decompression slower since many 
>of the supported codecs actually have quite fast parsers and decompressors. 
>The accelerator would have to copy the result back.
>- Even if it doesn't have to be copied over, I suppose this accelerator is 
>connected over the PCI-E bus, so reading chunks would be expensive. Also, many 
>of those decompressors reference chunks observed previously and perform a 
>memcpy. The accelerator implementation has to be smart about those things.
>- Many of the decompressors do some decoding and essentially perform a memcpy, 
>which makes them quite fast.

[Cheng's comments] Yes, there are several ways to address this, such as 
implementing a DMA engine in the FPGA or using shared memory.
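Just to illustrate the shared-memory direction (a sketch, not part of the 
proposal): Hadoop already has a DirectDecompressor interface that works on 
direct ByteBuffers, so an accelerator-backed codec could decompress between 
off-heap buffers that the driver maps for DMA, avoiding the extra copy through 
on-heap byte[]. The native call below is hypothetical.

import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.hadoop.io.compress.DirectDecompressor;

// Sketch only: decompress straight between direct (off-heap) ByteBuffers so the
// accelerator driver can DMA from/to them without an extra copy on the Java side.
public class AcceleratorDirectDecompressor implements DirectDecompressor {

  @Override
  public void decompress(ByteBuffer src, ByteBuffer dst) throws IOException {
    if (!src.isDirect() || !dst.isDirect()) {
      throw new IOException("accelerator path requires direct buffers");
    }
    // Hypothetical JNI entry point into the QAT/FPGA driver; it is expected to
    // fill 'dst' and advance both buffers' positions.
    nativeAcceleratorDecompress(src, dst);
  }

  private static native void nativeAcceleratorDecompress(ByteBuffer src, ByteBuffer dst);
}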

>- Can the supported codecs like zstd, lz4, etc run on those accelerators?
[Cheng's comments] Yes.

>Have you done some measurements?
[Cheng's comments] Please see the slides above as a reference.

Thanks
Cheng Xu

-----Original Message-----
From: Radev, Martin <[email protected]> 
Sent: Wednesday, March 4, 2020 6:02 PM
To: [email protected]
Subject: Re: Provide pluggable APIs to support user customized compression codec

Hi Xin,


thanks for the interest in extending Parquet. I suppose this is only about the 
Parquet Writer/Reader implementation, not about changes to the Parquet 
specification.

I would like to know whether offloading the task of compressing/decompressing 
some data is really beneficial performance-wise.

I suppose I don't understand how all of this would come together. Here are my 
points:

- The accelerator might require the compressed data to be copied over to 
decompress it. This will only make compression/decompression slower since many 
of the supported codecs actually have quite fast parsers and decompressors. The 
accelerator would have to copy the result back.

- Even if it doesn't have to be copied over, I suppose this accelerator is 
connected over the PCI-E bus, so reading chunks would be expensive. Also, many 
of those decompressors reference chunks observed previously and perform a 
memcpy. The accelerator implementation has to be smart about those things.
- Many of the decompressors do some decoding and essentially perform a memcpy, 
which makes them quite fast.

- Can the supported codecs like zstd, lz4, etc run on those accelerators?

Have you done some measurements?


Kind regards,

Martin



________________________________
From: Dong, Xin <[email protected]>
Sent: Wednesday, March 4, 2020 1:46:29 AM
To: [email protected]
Subject: Provide pluggable APIs to support user customized compression codec

Hi,
In pursuit of better performance, quite a few end users want to leverage 
accelerators (e.g. FPGA, Intel QAT) to offload compression computation. 
However, in the current parquet-mr code, the codec implementation can't be 
customized to leverage accelerators. We would like to propose a pluggable API 
to support customized compression codecs.
I've opened a JIRA, https://issues.apache.org/jira/browse/PARQUET-1804, for 
this issue. What are your thoughts?
Best Regards,
Xin Dong
