sean created PARQUET-2146:
-----------------------------

             Summary: AvroParquetWriter write to S3 bucket throws data integrity exception
                 Key: PARQUET-2146
                 URL: https://issues.apache.org/jira/browse/PARQUET-2146
             Project: Parquet
          Issue Type: Bug
    Affects Versions: 1.12.2
            Reporter: sean

Hi, we are trying to use
[org.apache.parquet.avro|https://www.tabnine.com/code/java/packages/org.apache.parquet.avro].AvroParquetWriter
to write a Parquet file to an S3 bucket. The file is written to the bucket
successfully, but we get an exception:

com.amazonaws.SdkClientException: Unable to verify integrity of data upload.

Our goal is to resolve this exception while the S3 bucket is encrypted with
SSE-KMS (not SSE-S3).

The exception appears to be thrown by the AWS SDK code at the link below:

[https://github.com/aws/aws-sdk-java/blob/fd409dee8ae23fb8953e0bb4dbde65536a7e0514/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/AmazonS3Client.java#L1876]

According to the Amazon documentation, the ETag is not an MD5 digest of the
object data when the S3 bucket is encrypted with SSE-KMS:

[https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html]
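To illustrate why this matters, here is a simplified sketch (JDK only) of the kind of comparison that fails: the client computes an MD5 digest of the bytes it uploaded and compares it with the ETag that S3 returns. This is not the SDK's actual code, and the names EtagCheckSketch, md5Hex and etagMatchesMd5 are hypothetical. For a single-part upload to a plaintext or SSE-S3 bucket the ETag is the hex MD5 of the data, so the check passes; with SSE-KMS the ETag is not the MD5, so a check like this fails.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class EtagCheckSketch {

    // Lowercase hex MD5 digest of the uploaded bytes (32 characters).
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5.
            throw new IllegalStateException(e);
        }
    }

    // Simplified client-side check: does the ETag returned by S3 equal the MD5 of what was sent?
    // S3 returns the ETag wrapped in double quotes, so strip them before comparing.
    static boolean etagMatchesMd5(String etag, byte[] uploaded) {
        String unquoted = etag.replace("\"", "");
        return unquoted.equalsIgnoreCase(md5Hex(uploaded));
    }

    public static void main(String[] args) {
        byte[] body = "hello".getBytes(StandardCharsets.UTF_8);

        // Plaintext / SSE-S3 single-part upload: the ETag is the hex MD5 of the object data.
        String sseS3Etag = "\"5d41402abc4b2a76b9719d911017c592\"";
        System.out.println("SSE-S3 ETag matches MD5: " + etagMatchesMd5(sseS3Etag, body));

        // SSE-KMS: the ETag is not an MD5 of the data (this value is a made-up placeholder).
        String sseKmsEtag = "\"0123456789abcdef0123456789abcdef\"";
        System.out.println("SSE-KMS ETag matches MD5: " + etagMatchesMd5(sseKmsEtag, body));
    }
}
```

Running main prints `true` for the SSE-S3 case and `false` for the SSE-KMS placeholder, which mirrors the mismatch that surfaces as "Unable to verify integrity of data upload".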

 

*A possible workaround is to pass the MD5 in a request header, or to set a system property that disables the validation in SkipMd5CheckStrategy.skipClientSideValidationPerPutResponse, as shown in the link below:*

[https://github.com/aws/aws-sdk-java/blob/99fe75a823d4b02f4e90fa0dda06a1558d5617a1/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/SkipMd5CheckStrategy.java#L42]
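Since the switch in SkipMd5CheckStrategy is a JVM system property rather than a writer option, one possible approach (a sketch, not a confirmed fix) is to set it in the same JVM before the AvroParquetWriter is built, or to pass it on the command line as `-Dcom.amazonaws.services.s3.disablePutObjectMD5Validation=true`. The property name is the one we read in SkipMd5CheckStrategy of the AWS SDK for Java v1; it only has an effect if the upload actually goes through that SDK, it turns off a real integrity check, and it should be verified against the SDK version in use. The class and helper names are hypothetical. The contentMd5 helper shows the value the header option would need: a Content-MD5 header carries the Base64 encoding of the raw 16-byte MD5 digest, not the hex string. Setting that header would still require control over the upload request itself.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class Md5WorkaroundSketch {

    // Property name as read from SkipMd5CheckStrategy (AWS SDK for Java v1); verify for your SDK version.
    static final String DISABLE_PUT_MD5_VALIDATION =
            "com.amazonaws.services.s3.disablePutObjectMD5Validation";

    // Option 1: disable client-side MD5 validation of PUT responses for this whole JVM.
    // Must run before the writer uploads anything; it also disables a genuine integrity check.
    static void disablePutObjectMd5Validation() {
        System.setProperty(DISABLE_PUT_MD5_VALIDATION, "true");
    }

    // Option 2: value for a Content-MD5 request header = Base64 of the raw MD5 digest bytes.
    static String contentMd5(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        disablePutObjectMd5Validation();
        System.out.println(DISABLE_PUT_MD5_VALIDATION + "=" + System.getProperty(DISABLE_PUT_MD5_VALIDATION));

        // Hypothetical: this is where the AvroParquetWriter would be built and records written.

        System.out.println("Content-MD5 of \"hello\": " + contentMd5("hello".getBytes(StandardCharsets.UTF_8)));
    }
}
```

The system-property route has the advantage that it needs no access to the writer's internals; the trade-off is that it applies to every PUT made by that SDK in the JVM.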

The problem is that I cannot find a proper way to inject such configuration
into AvroParquetWriter. Is this possible? If so, could you show how to do it?

Thanks

Sean

--
This message was sent by Atlassian Jira
(v8.20.7#820007)