ivandika3 opened a new pull request, #6435:
URL: https://github.com/apache/ozone/pull/6435

   ## What changes were proposed in this pull request?
   
   Currently, the MessageDigest instance is a thread-local variable (one per 
S3G Jetty thread). A MessageDigest instance returns to its initial state only 
when either MessageDigest#digest or MessageDigest#reset is called.
   
   In the normal ObjectEndpoint#put flow, MessageDigest#digest is called after 
the data has been written to the datanodes and before the key is committed. 
However, if an IOException happens (e.g. an EOFException because the client 
cancelled during the write), the digest is not reset and remains in an 
inconsistent state. This affects the next request handled by the same thread: 
the ETag generated for it differs from the MD5 hash of the object, which 
causes the AWS S3 SDK to detect a hash mismatch when downloading the object.
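
   The underlying MessageDigest behaviour can be shown in isolation. The 
following is a minimal sketch, not code from the patch: the class and method 
names (`StaleDigestDemo`, `md5AfterAbandonedUpdate`, `freshMd5`) are 
hypothetical, and only standard `java.security.MessageDigest` calls are used.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class StaleDigestDemo {

  /** MD5 of the data computed with a fresh MessageDigest instance. */
  public static byte[] freshMd5(String data) throws NoSuchAlgorithmException {
    return MessageDigest.getInstance("MD5")
        .digest(data.getBytes(StandardCharsets.UTF_8));
  }

  /**
   * Feeds leftover bytes into a digest without finalizing it (as an aborted
   * request would), then hashes the payload with the same instance.
   */
  public static byte[] md5AfterAbandonedUpdate(String leftover, String payload)
      throws NoSuchAlgorithmException {
    MessageDigest md = MessageDigest.getInstance("MD5");
    // Aborted request: update() was called, but neither digest() nor reset().
    md.update(leftover.getBytes(StandardCharsets.UTF_8));
    // Next request reusing the instance: the leftover bytes are still included.
    md.update(payload.getBytes(StandardCharsets.UTF_8));
    return md.digest();
  }

  public static void main(String[] args) throws Exception {
    byte[] polluted = md5AfterAbandonedUpdate("partial-", "payload");
    // The result is not the MD5 of the payload alone...
    System.out.println(Arrays.equals(polluted, freshMd5("payload")));         // false
    // ...but the MD5 of the leftover bytes followed by the payload.
    System.out.println(Arrays.equals(polluted, freshMd5("partial-payload"))); // true
  }
}
```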
   
   The issue can be replicated using an S3G with a single thread by doing 
three put-object operations for the same key and the same payload. You can set 
`hadoop.http.max.threads` in `ozone-site.xml` to a small value (e.g. 4) to 
increase the chance of the same thread handling consecutive requests.
   
   - 1st put-object: cancel the operation before it can finish, and confirm 
that an EOFException is thrown in the S3Gateway logs
   
   - 2nd put-object: let the put-object finish. The resulting ETag will not be 
the same as the md5 digest of the payload (you might need to do this a few 
times, since the S3G thread might not be the same as in the previous call)
   
   - 3rd put-object: also let the put-object finish. Since the previous 
put-object reset the digest, the resulting ETag will be correct. 
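
   The three steps above can be simulated without a cluster. This is only a 
sketch under stated assumptions: `ThreePutSimulation`, `put` and `run` are 
hypothetical names, the thread-local digest is a stand-in for the one in S3G, 
and a single-thread executor plays the role of the reused Jetty thread.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreePutSimulation {

  // Hypothetical stand-in for the gateway's per-thread digest.
  private static final ThreadLocal<MessageDigest> DIGEST =
      ThreadLocal.withInitial(() -> {
        try {
          return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
          throw new IllegalStateException(e);
        }
      });

  public static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      sb.append(String.format("%02x", b));
    }
    return sb.toString();
  }

  /** Simulated put: returns the ETag, or fails midway when cancelled is true. */
  public static String put(byte[] payload, boolean cancelled) throws IOException {
    MessageDigest md = DIGEST.get();
    int half = payload.length / 2;
    md.update(payload, 0, half);
    if (cancelled) {
      // The digest keeps the bytes seen so far: no digest(), no reset().
      throw new EOFException("client cancelled");
    }
    md.update(payload, half, payload.length - half);
    return hex(md.digest()); // digest() also resets the instance
  }

  /** Runs the three puts on one thread; returns {expected, 2nd ETag, 3rd ETag}. */
  public static String[] run() throws Exception {
    byte[] payload = "same payload".getBytes(StandardCharsets.UTF_8);
    String expected = hex(MessageDigest.getInstance("MD5").digest(payload));
    Callable<String> cancelledPut = () -> put(payload, true);
    Callable<String> completePut = () -> put(payload, false);
    ExecutorService oneThread = Executors.newSingleThreadExecutor();
    try {
      try {
        oneThread.submit(cancelledPut).get(); // 1st put: cancelled
      } catch (ExecutionException e) {
        // Expected: wraps the EOFException thrown by the simulated put.
      }
      String second = oneThread.submit(completePut).get(); // 2nd put
      String third = oneThread.submit(completePut).get();  // 3rd put
      return new String[] {expected, second, third};
    } finally {
      oneThread.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    String[] r = run();
    System.out.println("2nd ETag matches md5: " + r[0].equals(r[1])); // false
    System.out.println("3rd ETag matches md5: " + r[0].equals(r[2])); // true
  }
}
```

   With more than one worker thread the second put may land on a thread with 
a clean digest, which is why the 2nd step may need several attempts.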
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-10587
   
   ## How was this patch tested?
   
   Manually tested from an Ozone IntelliJ IDE setup, following the steps in the description.
   
   Ref: 
https://cwiki.apache.org/confluence/display/OZONE/Run+Ozone+cluster+from+IDE
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

