[
https://issues.apache.org/jira/browse/PARQUET-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16934945#comment-16934945
]
Gidon Gershinsky commented on PARQUET-1659:
-------------------------------------------
A number of comments:
# The idea behind AES_GCM_CTR is that in Parquet files the metadata size is
negligible compared to the data size. E.g., the default page size is around a
megabyte, and the page header is, say, a hundred bytes, meaning the metadata is
around 0.01% of the data. When running with old Java (with AES-NI acceleration),
GCM is indeed slower than CTR, but applying it to 0.01% of the file shouldn't
be noticeable.
# Basic math shows that getting a 10% speedup means that in your files the
metadata part is around a few percent, instead of 0.01%. It would be good to
analyze the reasons (small pages? small files where the footer size becomes
a considerable part of the file size? anything else?); this is easy to
measure.
# Java 9 and later run AES in hardware, so it's possible to get full data
protection (encryption and integrity guarantees) via GCM without a noticeable
change in throughput.
# Introducing a third algorithm means additions to the spec and to the
Thrift definitions, and could require another round of PMC vote. We might want
to proceed with the parquet-2.7.0 release as is, and in parallel investigate
the reasons you get a 10% speedup (item 2). If we decide to add a pure CTR
algorithm, it can be part of, say, parquet-2.7.1.
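
The arithmetic in items 1 and 2 can be sketched roughly as follows. This is only a back-of-the-envelope model, not a measurement: it assumes encryption time is the only cost and scales linearly with bytes, and the `gcm_slowdown` factor of 4 (GCM taking four times as long per byte as CTR) is a made-up illustrative number, as are the function names.

```python
def metadata_ratio(page_bytes: int, header_bytes: int) -> float:
    """Size of the page header relative to the page data."""
    return header_bytes / page_bytes


def relative_cost(meta_fraction: float, gcm_slowdown: float) -> float:
    """Encryption time of AES_GCM_CTR divided by that of pure CTR.

    meta_fraction: share of the bytes encrypted with GCM (metadata).
    gcm_slowdown:  assumed per-byte time of GCM relative to CTR (> 1).
    """
    return 1.0 + meta_fraction * (gcm_slowdown - 1.0)


def fraction_for_overhead(overhead: float, gcm_slowdown: float) -> float:
    """Metadata share needed for AES_GCM_CTR to cost `overhead` more than CTR."""
    return overhead / (gcm_slowdown - 1.0)


# Item 1: ~1 MB page, ~100 byte header.
ratio = metadata_ratio(1_000_000, 100)
print(f"metadata / data: {ratio:.4%}")

# Hypothetical slowdown factor, for illustration only.
GCM_SLOWDOWN = 4.0
print(f"relative cost at that ratio: {relative_cost(ratio, GCM_SLOWDOWN):.4f}")

# Item 2: metadata share that would explain a 10% difference.
needed = fraction_for_overhead(0.10, GCM_SLOWDOWN)
print(f"metadata share for 10% overhead: {needed:.2%}")
```

Under these assumptions a 10% difference needs a metadata share of a few percent (about 3.3% with the factor of 4), which is hundreds of times the 0.01% expected with default page sizes. Costs that are the same for both algorithms (I/O, compression, encoding) would only push the required share higher.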
> Add AES-CTR to Parquet Encryption
> ----------------------------------
>
> Key: PARQUET-1659
> URL: https://issues.apache.org/jira/browse/PARQUET-1659
> Project: Parquet
> Issue Type: Improvement
> Components: parquet-cpp, parquet-format, parquet-mr
> Affects Versions: format-2.6.0
> Reporter: Xinli Shang
> Priority: Minor
> Labels: pull-request-available
>
> AES-GCM-CTR performs GCM encryption on metadata and CTR encryption on data.
> AES-CTR would perform CTR encryption on both.
> During perf testing, we found AES-CTR can improve read/write performance by
> ~10% compared with AES-GCM-CTR.
>
> I checked with Gidon, and the initial assumption was that AES-GCM-CTR would
> have performance similar to AES-CTR. But with recent performance
> benchmarking, we found it is worthwhile to introduce AES-CTR, since many
> companies strive for Parquet performance improvements.
>