ivandika3 commented on PR #7419:
URL: https://github.com/apache/ozone/pull/7419#issuecomment-2467650696

   @vtutrinov Thanks for the patch. 
   
   While I understand the reasoning behind this change to reduce the number of 
OM calls, I'm not sure about this caching solution. A single S3G will work 
correctly, but the consistency guarantee breaks quickly once there are concurrent 
requests to multiple S3Gs (e.g. a reverse proxy pointing to multiple S3Gs). That's 
why the S3G should be as stateless as possible.
   
   For example, suppose there are two S3Gs (s3g1 and s3g2) and a single object 
"key1". If a request downloads "key1" through s3g1, it will be cached in s3g1. 
Afterwards, another request uploads "key1" with a new body through s3g2. The 
cache in s3g1 will not be invalidated, so a subsequent download request to s3g1 
will still return the stale key content. This means there are two versions of 
the same object until the cache entry expires. Even worse, the data of the 
overwritten key might already be deleted while it is still in the S3G cache 
(e.g. if the cache expiry is configured very large and the underlying blocks 
have been deleted).
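   The stale-read window above can be sketched with a toy model (not Ozone code; 
the class and map names here are made up purely for illustration), where each 
gateway keeps its own read-through cache in front of a shared authoritative 
store:

   ```java
   import java.util.HashMap;
   import java.util.Map;

   // Toy model: a shared authoritative store ("OM") and two gateways,
   // each with its own local read cache, to show the stale-read window.
   public class StaleCacheDemo {
       static Map<String, String> om = new HashMap<>();       // authoritative store

       static class Gateway {
           final Map<String, String> cache = new HashMap<>(); // per-S3G cache

           String get(String key) {
               // read-through cache: serve the cached value if present
               return cache.computeIfAbsent(key, om::get);
           }

           void put(String key, String value) {
               om.put(key, value);     // write goes to the OM...
               cache.put(key, value);  // ...but updates only THIS gateway's cache
           }
       }

       public static void main(String[] args) {
           Gateway s3g1 = new Gateway();
           Gateway s3g2 = new Gateway();

           s3g1.put("key1", "v1");
           System.out.println(s3g1.get("key1")); // v1, now cached in s3g1

           s3g2.put("key1", "v2");               // overwrite via s3g2

           System.out.println(s3g1.get("key1")); // still v1: stale cache in s3g1
           System.out.println(s3g2.get("key1")); // v2
       }
   }
   ```

   Nothing ever invalidates s3g1's entry, so it keeps serving "v1" until the 
entry expires.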
   
   IMO, a correct implementation would require some kind of cache coherence 
mechanism (e.g. a witness) similar to the one described in 
https://www.allthingsdistributed.com/2021/04/s3-strong-consistency.html
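   As a rough illustration of the witness idea (again a toy sketch, not Ozone 
code; all names are invented): each cached entry carries a version, a lightweight 
witness tracks the latest version per key, and a cached value is served only 
when its version matches the witness:

   ```java
   import java.util.HashMap;
   import java.util.Map;

   // Toy sketch of witness-based cache coherence (illustration only).
   public class WitnessCacheDemo {
       static Map<String, String> store = new HashMap<>();  // authoritative data
       static Map<String, Long> witness = new HashMap<>();  // latest version per key

       static class Entry {
           final long version;
           final String value;
           Entry(long version, String value) { this.version = version; this.value = value; }
       }

       static class Gateway {
           final Map<String, Entry> cache = new HashMap<>();

           String get(String key) {
               long latest = witness.getOrDefault(key, 0L);
               Entry e = cache.get(key);
               if (e != null && e.version == latest) {
                   return e.value;                          // coherent: serve cache
               }
               String value = store.get(key);               // stale or missing: refetch
               cache.put(key, new Entry(latest, value));
               return value;
           }

           void put(String key, String value) {
               store.put(key, value);
               witness.merge(key, 1L, Long::sum);           // bump the version
               cache.put(key, new Entry(witness.get(key), value));
           }
       }

       public static void main(String[] args) {
           Gateway s3g1 = new Gateway();
           Gateway s3g2 = new Gateway();
           s3g1.put("key1", "v1");
           System.out.println(s3g1.get("key1"));            // v1
           s3g2.put("key1", "v2");                          // bumps witness version
           System.out.println(s3g1.get("key1"));            // v2: s3g1 detects staleness
       }
   }
   ```

   The point is that the version check against the witness is much cheaper than 
a full OM lookup, while still forcing a refetch whenever another gateway has 
overwritten the key.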
   
   I think we can also revisit https://github.com/apache/ozone/pull/5565 to 
implement this on the OM side, so that it only returns the 
`List<OmKeyInfoLocation>` belonging to the particular part and reduces the 
message size for large MPUs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

