[
https://issues.apache.org/jira/browse/PARQUET-2146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850469#comment-17850469
]
Steve Loughran commented on PARQUET-2146:
-----------------------------------------
This error isn't related to server-side encryption, which, other than the etags,
is generally invisible. And etags are way more complicated than MD5s everywhere.
What I believe it means is that the client application uploaded a block and the
validation logic said "not valid".
* which s3 connector? EMR s3:, hadoop s3a, or other?
* whose s3 store? AWS s3 or something else?
This isn't parquet's problem; it's that of whoever wrote the s3 connector. And
if it is the hadoop one, while you've got the right JIRA server, our response
will be one of "does it still happen on the 3.4.x or 3.3.6 releases?".
Looking at the v1 SDK, there doesn't seem to be any way to disable this
checking programmatically, though you can disable checksum validation on read
and write through system properties.
{code}
com.amazonaws.services.s3.disableGetObjectMD5Validation
com.amazonaws.services.s3.disablePutObjectMD5Validation
{code}
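As a sketch of how those properties could be applied: they are ordinary JVM system properties, so an application can set them before the v1 SDK constructs its first S3 client (or pass them with {{-D}} on the command line). The property names are the ones above, from the SDK's SkipMd5CheckStrategy; the class and method names here are otherwise purely illustrative, and nothing in AvroParquetWriter itself needs to change.

```java
// Illustrative only: disable the AWS v1 SDK's client-side MD5 validation
// via system properties, before any AmazonS3 client exists.
public class DisableS3Md5Validation {
    public static void main(String[] args) {
        // Property names come from the v1 SDK's SkipMd5CheckStrategy.
        System.setProperty(
            "com.amazonaws.services.s3.disableGetObjectMD5Validation", "true");
        System.setProperty(
            "com.amazonaws.services.s3.disablePutObjectMD5Validation", "true");

        // Confirm the properties are visible to the JVM.
        System.out.println(System.getProperty(
            "com.amazonaws.services.s3.disablePutObjectMD5Validation"));
    }
}
```

The command-line equivalent would be {{-Dcom.amazonaws.services.s3.disablePutObjectMD5Validation=true}}, which avoids touching application code at all.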
Please tell us more. If it's through the s3a connector, then move this to
become a HADOOP JIRA.
If it is someone else's, it'll have to be a WONTFIX.
> AvroParquetWriter write to s3 bucket throws data integrity exception
> -----------------------------------------------------------------------
>
> Key: PARQUET-2146
> URL: https://issues.apache.org/jira/browse/PARQUET-2146
> Project: Parquet
> Issue Type: Bug
> Affects Versions: 1.12.2
> Reporter: sean
> Priority: Major
>
>
> Hi, we are trying to use org.apache.parquet.avro.AvroParquetWriter
> to write a parquet file to an s3 bucket. The file is successfully written to
> the s3 bucket, but we get an exception:
> com.amazonaws.SdkClientException: Unable to verify integrity of data upload.
> The purpose is to resolve this exception while the s3 bucket is encrypted
> with SSE-KMS rather than SSE-S3.
>
> It appears that the exception is thrown because of the code blocks in the
> link below:
> [https://github.com/aws/aws-sdk-java/blob/fd409dee8ae23fb8953e0bb4dbde65536a7e0514/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/AmazonS3Client.java#L1876]
> From the amazon docs, the etag is not the same as the MD5 when the s3 bucket
> is encrypted with SSE-KMS:
> [https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonResponseHeaders.html]
>
> *A possible fix is to pass the MD5 in the request header, or to set a system
> property to disable validation in
> SkipMd5CheckStrategy.skipClientSideValidationPerPutResponse, as indicated in
> this link:*
> [https://github.com/aws/aws-sdk-java/blob/99fe75a823d4b02f4e90fa0dda06a1558d5617a1/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/SkipMd5CheckStrategy.java#L42]
> The issue is that I cannot find a proper way to inject such configuration
> into AvroParquetWriter. Is this possible? If so, can you help to show how to
> do it?
>
> Thanks
>
> Sean
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]