UENISHI Kota created HDDS-5461:
----------------------------------
Summary: Move old objects to delete table on overwrite
Key: HDDS-5461
URL: https://issues.apache.org/jira/browse/HDDS-5461
Project: Apache Ozone
Issue Type: Improvement
Components: OM
Affects Versions: 1.1.0
Reporter: UENISHI Kota
Assignee: UENISHI Kota
Fix For: 1.2.0
HDDS-5243 omitted key locations from read responses sent to clients, but we
observed the same large-response-size warning in our cluster when putting
data. This is harmful because it triggers a retry storm: hadoop-rpc treats
the large-response failure as a retriable exception, so RetryInvocationHandler
retries up to 15 times even though no retry can succeed, each time receiving a
response that exceeds the default RPC message size limit of 128MB, as follows.
{quote}
2021-06-21 19:23:10,717 [IPC Server handler 65 on default port 9862] WARN
org.apache.hadoop.ipc.Server: Large response size 134538349 for call
Call#2037538 Retry#15
org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest from
10.192.17.172:34070
2021-06-21 19:23:10,722 [IPC Server handler 65 on default port 9862] WARN
org.apache.hadoop.ipc.Server: IPC Server handler 65 on default port 9862, call
Call#2037538 Retry#15
org.apache.hadoop.ozone.om.protocol.OzoneManagerProtocol.submitRequest from
10.192.17.172:34070: output error
2021-06-21 19:23:10,722 [IPC Server handler 65 on default port 9862] INFO
org.apache.hadoop.ipc.Server: IPC Server handler 65 on default port 9862 caught
an exception
java.nio.channels.AsynchronousCloseException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:205)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:478)
at org.apache.hadoop.ipc.Server.channelIO(Server.java:3642)
at org.apache.hadoop.ipc.Server.channelWrite(Server.java:3594)
at org.apache.hadoop.ipc.Server.access$1700(Server.java:139)
at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:1657)
at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1727)
{quote}
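To illustrate the retry storm, here is a minimal sketch (not Hadoop's actual RetryInvocationHandler; the class and method names below are hypothetical) of why classifying the oversized-response error as retriable is wasteful: every attempt streams the same oversized response and fails again, so all 15 retries are guaranteed to fail.

```java
// Hypothetical sketch of a retry loop like the one RetryInvocationHandler
// performs. When the failure is inherently unrecoverable (the response is
// always too large), every retry repeats the same expensive failure.
public class RetryStormSketch {

    @FunctionalInterface
    interface Call {
        void invoke() throws Exception;
    }

    // Returns the number of attempts made before giving up.
    static int invokeWithRetry(Call call, int maxRetries) {
        int attempts = 0;
        for (int retry = 0; retry <= maxRetries; retry++) {
            attempts++;
            try {
                call.invoke();
                return attempts;
            } catch (Exception e) {
                // The error is classified as retriable, so we loop again,
                // even though the next attempt is doomed to fail identically.
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        int attempts = invokeWithRetry(() -> {
            throw new Exception("response exceeds RPC message size limit");
        }, 15);
        System.out.println("attempts = " + attempts); // 16 = 1 initial + 15 retries
    }
}
```

With the default of 15 retries, the server serializes and ships the same >128MB response 16 times before the client finally gives up, which matches the Retry#15 seen in the log above.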
The suggestion in HDDS-5393 was wrong; instead, this should be fixed by moving
the old blocks to the deletion table on overwrite, making them eligible for
the deletion service. If I understand correctly, this is only needed for
normal object puts, not for MultipartUpload objects.
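The proposed overwrite path could look like the following minimal sketch. The table and class names here are hypothetical stand-ins, not the actual Ozone OM API: the point is only that the previous version's locations are moved to the deleted-key table instead of accumulating in the key-table value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of "move old objects to delete table on overwrite".
// KeyVersion, keyTable, and deletedTable are illustrative stand-ins for the
// real OM structures.
class OverwriteSketch {

    static class KeyVersion {
        final List<String> blocks; // block location identifiers
        KeyVersion(List<String> blocks) { this.blocks = blocks; }
    }

    final Map<String, KeyVersion> keyTable = new HashMap<>();
    final Map<String, List<KeyVersion>> deletedTable = new HashMap<>();

    void commitKey(String key, KeyVersion newVersion) {
        // Overwrite: the key-table value holds only the newest version.
        KeyVersion old = keyTable.put(key, newVersion);
        if (old != null) {
            // Move, don't accumulate: the old version's blocks become
            // eligible for the background deletion service.
            deletedTable.computeIfAbsent(key, k -> new ArrayList<>()).add(old);
        }
    }
}
```

This keeps the key-table value bounded by the size of a single version regardless of how many times the key is overwritten.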
Keeping old blocks and key locations after an overwrite might be intended to
support the [object versioning
API|https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectVersions.html],
but IMO the current design will not scale beyond, say, thousands of versions.
The size of a value in the key table grows as O( n(versions) * n(blocks) ),
which can easily exceed the current RPC message limit (128MB by default) or
the intended value size in RocksDB. Although the current implementation is
effective for concurrency control, object versioning should be implemented in
some different way.
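A back-of-envelope estimate shows how quickly the O( n(versions) * n(blocks) ) growth hits the limit. The per-location byte count and version/block counts below are illustrative assumptions, not measured Ozone figures:

```java
// Rough size estimate for a key-table value that accumulates every
// overwritten version's block locations. All numbers are assumptions
// for illustration, not measurements.
public class KeyValueSizeEstimate {

    static long estimateBytes(long versions, long blocksPerVersion,
                              long bytesPerLocation) {
        return versions * blocksPerVersion * bytesPerLocation;
    }

    public static void main(String[] args) {
        // Assume 10,000 overwrites, 100 blocks per version,
        // ~150 bytes per serialized block location.
        long bytes = estimateBytes(10_000, 100, 150);
        long rpcLimit = 128L << 20; // 128 MiB default RPC message limit
        System.out.println(bytes + " bytes, limit " + rpcLimit);
        // 150,000,000 bytes already exceeds the 134,217,728-byte limit.
    }
}
```

Under these assumptions a frequently overwritten key blows past the RPC limit after roughly ten thousand overwrites, and far fewer if versions have more blocks.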
--
This message was sent by Atlassian Jira
(v8.3.4#803005)