Ivan Andika created HDDS-15228:
----------------------------------

             Summary: KeyDeletingService limit batch deletions based on number of blocks
                 Key: HDDS-15228
                 URL: https://issues.apache.org/jira/browse/HDDS-15228
             Project: Apache Ozone
          Issue Type: Improvement
            Reporter: Ivan Andika


We encountered the following error:

{code:java}
com.google.protobuf.ServiceException: java.lang.NegativeArraySizeException: -1273201896, while invoking $Proxy34.send over nodeId=scm4,nodeAddress=<redacted> after 256 failover attempts. Trying to failover after sleeping for 2000ms.
{code}

Currently, KeyDeletingService limits each deletion batch by the number of keys
(ozone.key.deleting.limit.per.task). However, some keys can have a large number
of blocks, especially EC keys, where one block is assigned per shard (e.g. an
EC 6+3 key has 9 different BlockIDs per KeyLocationInfo, compared to only 1
BlockID for RATIS/THREE).
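To make the difference concrete, here is a small sketch that computes how many BlockIDs a key contributes, taking the figures above at face value (9 BlockIDs per KeyLocationInfo for EC 6+3, 1 for RATIS/THREE). BlockCountExample, Replication and blocksForKey are hypothetical names for illustration only, not Ozone classes.

{code:java}
/**
 * Hypothetical illustration of how replication affects the number of
 * BlockIDs a key contributes to a deleteKeyBlocks request.
 */
public class BlockCountExample {

  /** Simplified stand-in for a replication config (not Ozone's class). */
  record Replication(boolean erasureCoded, int dataUnits, int parityUnits) {
    static Replication ratisThree() {
      return new Replication(false, 0, 0);
    }

    static Replication ec(int data, int parity) {
      return new Replication(true, data, parity);
    }

    /** BlockIDs per KeyLocationInfo: one per EC shard, a single one for RATIS. */
    int blocksPerLocation() {
      return erasureCoded ? dataUnits + parityUnits : 1;
    }
  }

  /** Total BlockIDs for a key with the given number of KeyLocationInfo entries. */
  static long blocksForKey(Replication repl, int locationCount) {
    return (long) repl.blocksPerLocation() * locationCount;
  }

  public static void main(String[] args) {
    // Same number of KeyLocationInfo entries, very different block counts.
    System.out.println(blocksForKey(Replication.ec(6, 3), 10));     // 90
    System.out.println(blocksForKey(Replication.ratisThree(), 10)); // 10
  }
}
{code}

So with the same key limit, a batch of EC 6+3 keys can carry nine times as many BlockIDs as a batch of RATIS/THREE keys with the same number of KeyLocationInfo entries.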

As a result, the SCM deleteKeyBlocks response can grow large enough to cause an
integer overflow, which triggers java.lang.NegativeArraySizeException. Even
after raising ipc.maximum.data.length to 512MB and ipc.maximum.response.length
to 640MB, the issue still seems to occur.
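As a rough illustration of the failure mode (we have not pinned down where in the IPC/protobuf path the length gets narrowed), a size above Integer.MAX_VALUE that is cast to int becomes negative, and allocating a byte array of that length throws NegativeArraySizeException. 3,021,765,400 bytes is just one size that would wrap to the -1273201896 seen in the log; NarrowingOverflowExample and narrow are hypothetical names.

{code:java}
/**
 * Illustration only: narrowing a length above Integer.MAX_VALUE to int
 * wraps it to a negative value, and using that as an array length throws
 * NegativeArraySizeException.
 */
public class NarrowingOverflowExample {

  /** Unchecked narrowing, as a plain (int) cast would do. */
  static int narrow(long length) {
    return (int) length;
  }

  public static void main(String[] args) {
    long responseBytes = 3_021_765_400L; // ~2.8 GiB, above Integer.MAX_VALUE
    int len = narrow(responseBytes);
    System.out.println(len); // -1273201896
    try {
      byte[] buf = new byte[len];
      System.out.println("allocated " + buf.length);
    } catch (NegativeArraySizeException e) {
      System.out.println("caught " + e);
    }
  }
}
{code}

If a length really is held in an int somewhere on that path, raising the IPC limits cannot help once a response exceeds Integer.MAX_VALUE bytes, which would be consistent with what we observed. Math.toIntExact(long) would fail fast with an ArithmeticException instead of producing a negative length.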

To prevent this, we can batch the deletions based on the number of blocks.
However, we need to ensure that at least one key is sent for deletion per batch
(even if that key alone exceeds the block limit) so that OM deletion still
makes progress.
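A minimal sketch of the proposed batching, assuming each pending key's BlockID count is known up front. BlockLimitedBatcher, KeyBlocks, batch, maxKeys and maxBlocks are hypothetical names, not existing Ozone APIs or configuration keys; maxKeys plays the role of ozone.key.deleting.limit.per.task.

{code:java}
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of block-aware batching for key deletion. All names here are
 * hypothetical and only illustrate the idea.
 */
public class BlockLimitedBatcher {

  /** A key scheduled for deletion and the number of BlockIDs it carries. */
  record KeyBlocks(String key, int blockCount) {}

  /**
   * Splits keys into batches bounded by both a key limit and a block limit.
   * A batch always contains at least one key, even if that key alone exceeds
   * maxBlocks, so deletion keeps making progress.
   */
  static List<List<KeyBlocks>> batch(List<KeyBlocks> keys, int maxKeys, long maxBlocks) {
    List<List<KeyBlocks>> batches = new ArrayList<>();
    List<KeyBlocks> current = new ArrayList<>();
    long currentBlocks = 0;
    for (KeyBlocks k : keys) {
      boolean full = current.size() >= maxKeys
          || currentBlocks + k.blockCount() > maxBlocks;
      // Only close the current batch if it is non-empty; an oversized key
      // therefore ends up alone in its own batch instead of being skipped.
      if (full && !current.isEmpty()) {
        batches.add(current);
        current = new ArrayList<>();
        currentBlocks = 0;
      }
      current.add(k);
      currentBlocks += k.blockCount();
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }

  public static void main(String[] args) {
    List<KeyBlocks> keys = List.of(
        new KeyBlocks("a", 9), new KeyBlocks("b", 9),
        new KeyBlocks("c", 50), // exceeds maxBlocks on its own
        new KeyBlocks("d", 1));
    for (List<KeyBlocks> b : batch(keys, 1000, 20)) {
      System.out.println(b.stream().map(KeyBlocks::key).toList());
    }
  }
}
{code}

With maxBlocks = 20 the example prints [a, b], [c] and [d]: key c exceeds the block limit by itself but is still sent in a batch of its own, so deletion is not stuck behind it. Whether later batches go out in the same run or are deferred to the next KeyDeletingService run is left open here.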



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
