UENISHI Kota created HDDS-5461:
----------------------------------

             Summary: Move old objects to delete table on overwrite
                 Key: HDDS-5461
                 URL: https://issues.apache.org/jira/browse/HDDS-5461
             Project: Apache Ozone
          Issue Type: Improvement
          Components: OM
    Affects Versions: 1.1.0
            Reporter: UENISHI Kota
            Assignee: UENISHI Kota
             Fix For: 1.2.0


HDDS-5243 was a patch that omitted key locations from responses to clients on 
reads. However, the same large-response-size warning was observed in our 
cluster when putting data. This is harmful in terms of a retry storm, because 
hadoop-rpc treats the large-response exception as retryable. Thus, 
RetryInvocationHandler retries 15 times, even though the failure cannot be 
recovered by retrying, each time receiving a response message that exceeds the 
default RPC message size limit of 128MB, as follows:

{quote}
2021-06-21 19:23:10,717 [IPC Server handler 65 on default port 9862] WARN 
org.apache.hadoop.ipc.Server: Large response size 134538349 for call 
Call#2037538 Retry#15 
org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest from 
10.192.17.172:34070
2021-06-21 19:23:10,722 [IPC Server handler 65 on default port 9862] WARN 
org.apache.hadoop.ipc.Server: IPC Server handler 65 on default port 9862, call 
Call#2037538 Retry#15 
org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest from 
10.192.17.172:34070: output error
2021-06-21 19:23:10,722 [IPC Server handler 65 on default port 9862] INFO 
org.apache.hadoop.ipc.Server: IPC Server handler 65 on default port 9862 caught 
an exception
java.nio.channels.AsynchronousCloseException
        at 
java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:205)
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:478)
        at org.apache.hadoop.ipc.Server.channelIO(Server.java:3642)
        at org.apache.hadoop.ipc.Server.channelWrite(Server.java:3594)
        at org.apache.hadoop.ipc.Server.access$1700(Server.java:139)
        at 
org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:1657)
        at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1727)
{quote}
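
To illustrate the retry behavior (a toy demonstration of Hadoop's retry proxy, 
not the exact policy Ozone configures; the 15-retry count matches the Retry#15 
seen above):

{code:java}
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.io.retry.RetryPolicies;
import org.apache.hadoop.io.retry.RetryPolicy;
import org.apache.hadoop.io.retry.RetryProxy;

// Hadoop's RetryProxy/RetryInvocationHandler re-invoke a call that
// fails with a plain IOException, which is how the oversized-response
// error surfaces to the client.
public class RetryStormDemo {
  interface Om {
    void submitRequest() throws IOException;
  }

  public static void main(String[] args) throws IOException {
    Om broken = () -> {
      // Stand-in for the client-side rejection of a response larger
      // than the configured maximum data length.
      throw new IOException("RPC response exceeds maximum data length");
    };
    RetryPolicy policy = RetryPolicies.retryUpToMaximumCountWithFixedSleep(
        15, 10, TimeUnit.MILLISECONDS);
    Om proxy = (Om) RetryProxy.create(Om.class, broken, policy);
    proxy.submitRequest(); // retried 15 times, then the exception propagates
  }
}
{code}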

The suggestion in HDDS-5393 was wrong; the proper fix is to move the old 
blocks to the deletion table on overwrite, so that they become eligible for 
the deletion service. If I understand correctly, this is only needed for 
normal object puts, not for MultipartUpload objects.
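
A minimal sketch of the proposed commit path (KeyInfo and Table below are 
illustrative stand-ins for OmKeyInfo and the OM RocksDB tables, not the actual 
OM API):

{code:java}
import java.util.ArrayList;
import java.util.List;

// Minimal model of the proposed commit path: on overwrite, the old
// version's blocks move to the deleted table instead of piling up in
// the key table.
class OverwriteSketch {
  static class KeyInfo {
    List<String> blockIds = new ArrayList<>();
  }

  interface Table {
    KeyInfo get(String key);
    void put(String key, KeyInfo value);
  }

  static void commitKey(Table keyTable, Table deletedTable,
                        String ozoneKey, KeyInfo newVersion) {
    KeyInfo old = keyTable.get(ozoneKey);
    if (old != null) {
      // Append the overwritten blocks to the deleted-table entry so the
      // background deletion service reclaims them.
      KeyInfo pending = deletedTable.get(ozoneKey);
      if (pending == null) {
        pending = new KeyInfo();
      }
      pending.blockIds.addAll(old.blockIds);
      deletedTable.put(ozoneKey, pending);
    }
    // The key table now holds only the latest version's locations, so
    // the value size no longer grows with the number of overwrites.
    keyTable.put(ozoneKey, newVersion);
  }
}
{code}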

Keeping old blocks and key locations after an overwrite might be intended to 
support the [object versioning 
API|https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectVersions.html],
 but IMO the current design will not scale beyond, say, thousands of versions 
of an object. The size of a value in the key table grows as O( n(versions) * 
n(blocks) ), which can easily exceed the current RPC message limit (128MB by 
default) or the intended value size in RocksDB. Although the current 
implementation is effective for concurrency control, object versioning should 
be implemented in some different way.
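
As a rough back-of-the-envelope check (all constants below are my own 
illustrative guesses, not measured values):

{code:java}
// Estimate of a single key-table value under the current design.
public class ValueSizeEstimate {
  public static void main(String[] args) {
    long bytesPerBlockLocation = 100; // assumed serialized size per block location
    long blocksPerVersion = 1_000;    // e.g. a 256GB object in 256MB blocks
    long versions = 2_000;            // overwrites accumulated on one key

    long valueBytes = bytesPerBlockLocation * blocksPerVersion * versions;
    System.out.printf("~%d MB per key-table value%n", valueBytes >> 20);
    // ~190 MB: already beyond the 128MB default RPC limit, and far
    // larger than a value RocksDB is meant to store efficiently.
  }
}
{code}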


