sean created PARQUET-2146:
-----------------------------
Summary: AvroParquetWriter write to s3 bucket throws data integrity exception
Key: PARQUET-2146
URL: https://issues.apache.org/jira/browse/PARQUET-2146
Project: Parquet
Issue Type: Bug
Affects Versions: 1.12.2
Reporter: sean
Hi, we are trying to use org.apache.parquet.avro.AvroParquetWriter
to write a Parquet file to an S3 bucket. The file is written to the S3
bucket successfully, but we get the exception:
com.amazonaws.SdkClientException: Unable to verify integrity of data upload.
Our goal is to resolve this exception when the S3 bucket is encrypted
with SSE-KMS rather than SSE-S3.
The exception appears to be thrown by the code block at the link below:
[https://github.com/aws/aws-sdk-java/blob/fd409dee8ae23fb8953e0bb4dbde65536a7e0514/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/AmazonS3Client.java#L1876]
According to the Amazon docs, the ETag is not the MD5 digest of the object
data when the bucket is encrypted with SSE-KMS:
[https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html]
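To illustrate why the check fails, here is a minimal JDK-only sketch of the kind of comparison the SDK makes: the MD5 of the uploaded bytes against the ETag that S3 returns. The class and method names are hypothetical, and both ETag values are made up. The first is simply the MD5 of "hello", which is what a single-part SSE-S3 upload would return. The second stands in for the opaque ETag of an SSE-KMS object.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

public final class EtagMd5Check {

    // Hex-encodes the MD5 digest of the given bytes (lowercase, 32 characters).
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    // Roughly the client-side check: does the returned ETag equal the MD5 of what was sent?
    // ETags come back wrapped in double quotes, so strip them before comparing.
    static boolean etagMatchesMd5(byte[] uploaded, String etag) {
        String normalized = etag.replace("\"", "").toLowerCase(Locale.ROOT);
        return normalized.equals(md5Hex(uploaded));
    }

    public static void main(String[] args) {
        byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
        // Hypothetical SSE-S3 ETag: the MD5 of the body.
        String sseS3Etag = "\"5d41402abc4b2a76b9719d911017c592\"";
        // Hypothetical SSE-KMS ETag: an opaque value that is not the MD5.
        String sseKmsEtag = "\"0123456789abcdef0123456789abcdef\"";
        System.out.println(etagMatchesMd5(body, sseS3Etag));  // true
        System.out.println(etagMatchesMd5(body, sseKmsEtag)); // false
    }
}
```

With SSE-KMS the second case applies, so a client-side MD5 comparison of this kind fails even though the object was stored correctly.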
*One possible workaround is to pass the MD5 in the request header, or to set a
system property that disables the validation in
SkipMd5CheckStrategy.skipClientSideValidationPerPutResponse, as shown in the
link below:*
[https://github.com/aws/aws-sdk-java/blob/99fe75a823d4b02f4e90fa0dda06a1558d5617a1/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/SkipMd5CheckStrategy.java#L42]
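The system-property route could look like the sketch below. It assumes the upload goes through AWS SDK for Java v1, for example via Hadoop's s3a connector. The property name is the one defined in the linked SkipMd5CheckStrategy source, which appears to check only whether the property is set. Verify that name against the SDK version actually on the classpath. The equivalent JVM flag would be `-Dcom.amazonaws.services.s3.disablePutObjectMD5Validation=true`. The class and method names are hypothetical.

```java
public final class DisablePutMd5Validation {

    // Property name as defined in com.amazonaws.services.s3.internal.SkipMd5CheckStrategy
    // (AWS SDK for Java v1); assumed to match the SDK version in use.
    static final String DISABLE_PUT_OBJECT_MD5_VALIDATION =
            "com.amazonaws.services.s3.disablePutObjectMD5Validation";

    // Call once at startup, before AvroParquetWriter (and thus the S3 client) uploads anything.
    static void disablePutObjectMd5Validation() {
        System.setProperty(DISABLE_PUT_OBJECT_MD5_VALIDATION, "true");
    }

    public static void main(String[] args) {
        disablePutObjectMd5Validation();
        System.out.println(System.getProperty(DISABLE_PUT_OBJECT_MD5_VALIDATION)); // true
        // ... build and use AvroParquetWriter as before ...
    }
}
```

This turns off the put-side integrity check for every S3 upload in the JVM, not only for SSE-KMS buckets, so it trades that safety check for compatibility.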
The problem is that I cannot find a proper way to inject such configuration
into AvroParquetWriter. Is this possible? If so, could you show how to
do it?
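The check appears to read a JVM system property rather than anything in the Hadoop Configuration passed to the writer. If so, the setting would not need to be injected into AvroParquetWriter at all. To limit it to one write, a hypothetical helper like the one below could set the property, run the write, and then restore the previous value. This sketch assumes two things. First, the SDK reads the property at upload time, as the linked source appears to do. Second, the upload finishes inside the scope, so the writer must be closed within it.

```java
import java.util.concurrent.Callable;

public final class Md5ValidationScope {

    // Same SDK v1 property name as in SkipMd5CheckStrategy (assumed; verify for your SDK version).
    static final String DISABLE_PUT_OBJECT_MD5_VALIDATION =
            "com.amazonaws.services.s3.disablePutObjectMD5Validation";

    // Hypothetical helper: runs the action with the property set, then restores the prior state.
    static <T> T withPutMd5ValidationDisabled(Callable<T> action) throws Exception {
        String previous = System.getProperty(DISABLE_PUT_OBJECT_MD5_VALIDATION);
        System.setProperty(DISABLE_PUT_OBJECT_MD5_VALIDATION, "true");
        try {
            return action.call();
        } finally {
            if (previous == null) {
                System.clearProperty(DISABLE_PUT_OBJECT_MD5_VALIDATION);
            } else {
                System.setProperty(DISABLE_PUT_OBJECT_MD5_VALIDATION, previous);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String seen = withPutMd5ValidationDisabled(() ->
                // Build the AvroParquetWriter, write the records and close it here,
                // so the S3 upload completes while the property is set.
                System.getProperty(DISABLE_PUT_OBJECT_MD5_VALIDATION));
        System.out.println(seen); // true
        // Prints null unless the JVM was started with the -D flag.
        System.out.println(System.getProperty(DISABLE_PUT_OBJECT_MD5_VALIDATION));
    }
}
```

System properties are process-wide. Any other thread that uploads to S3 during that window also skips the check, so the scoping narrows the window rather than isolating it.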
Thanks
Sean
--
This message was sent by Atlassian Jira
(v8.20.7#820007)