[
https://issues.apache.org/jira/browse/HDDS-2356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972874#comment-16972874
]
Bharat Viswanadham edited comment on HDDS-2356 at 11/12/19 11:21 PM:
---------------------------------------------------------------------
Hi [~timmylicheng]
Thanks for sharing the logs.
I see an abort multipart upload request for the key plc_1570863541668_9278 after
the complete multipart upload failed.
{code:java}
2019-11-08 20:08:24,830 | ERROR | OMAudit | user=root | ip=9.134.50.210 |
op=COMPLETE_MULTIPART_UPLOAD
{volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test,
key=plc_1570863541668_9278, dataSize=0, replicationType=RATIS,
replicationFactor=ONE, keyLocationIn fo=[], multipartList=[partNumber: 1
5626 partName:
"/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374356085"
5627 partName:
"/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102209374487158"
. . 5911 partName:
"/s325d55ad283aa400af464c76d713c07ad/ozone-test/plc_1570863541668_9278103102211984655258"
5912 ]} | ret=FAILURE | INVALID_PART
org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload
Failed: volume: s325d55ad283aa400af464c76d713c07adbucket: ozone-testkey:
plc_1570863541668_9278
2019-11-08 20:08:24,963 | INFO | OMAudit | user=root | ip=9.134.50.210 |
op=ABORT_MULTIPART_UPLOAD
{volume=s325d55ad283aa400af464c76d713c07ad, bucket=ozone-test,
key=plc_1570863541668_9278, dataSize=0, replicationType=RATIS,
replicationFactor=ONE, keyLocationInfo= []}
{code}
And after that, allocateBlock still continues for the key, because the entry in
the openKeyTable is not removed by the abortMultipartUpload request. (Abort
removes only the entry that was created during the initiateMPU request.) That is
why, after some time, you see the NO_SUCH_MULTIPART_UPLOAD error during
commitMultipartUploadKey: we have already removed the entry from the
MultipartInfo table. (But the strange thing I have observed is that the clientID
does not match any of the names in the part list, even though the last part of a
partName is the clientID.)
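To make the clientID observation above concrete: the partName stored by OM is the /volume/bucket/key prefix with the clientID appended, so the trailing digits of each partName should equal a clientID from an earlier ALLOCATE_BLOCK audit entry. A minimal sketch of that check, using values copied from the audit log later in this comment (the helper name is mine, not Ozone code):

```python
def client_id_of(part_name: str, volume: str, bucket: str, key: str) -> str:
    """Strip the /volume/bucket/key prefix from an OM partName; what
    remains should be the clientID that wrote the part."""
    prefix = "/{}/{}/{}".format(volume, bucket, key)
    if not part_name.startswith(prefix):
        raise ValueError("partName does not belong to this key: " + part_name)
    return part_name[len(prefix):]

# Values taken from the audit log excerpts in this comment:
part = "/s3dfb57b2e5f36c1f893dbc12dd66897d4/b1234/key123103127415125868581"
cid = client_id_of(part, "s3dfb57b2e5f36c1f893dbc12dd66897d4", "b1234", "key123")
print(cid)  # → 103127415125868581, matching the ALLOCATE_BLOCK clientID
```

If none of the partName suffixes match any logged clientID, that would confirm the mismatch described here.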
Also, from the OM audit log I see partNumber 1 and then a list of multipart
names; I am not sure whether some of the log is truncated here, as it should
show a partName and partNumber for every part.
# Can you confirm what parts OM has for this key? You can get this from
listParts (but it must be done before the abort request).
# Check in the OM audit log what part list we receive for this key; I am not
sure whether it is truncated in the uploaded log.
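For the second step, the part list can be pulled out of an audit line with a little string handling. A rough sketch (assuming the `multipartList={1=..., 2=...}` map form that appears in the logs below, not the truncated `partNumber:`/`partName:` form; the function name is mine):

```python
import re

def parse_multipart_list(audit_line: str) -> dict:
    """Extract {partNumber: partName} from an OM audit log line that
    contains a multipartList={1=/vol/bucket/name, 2=...} map."""
    m = re.search(r"multipartList=\{([^}]*)\}", audit_line)
    if not m:
        return {}
    parts = {}
    for entry in m.group(1).split(","):
        num, _, name = entry.strip().partition("=")
        if name:
            parts[int(num)] = name
    return parts

line = ("op=COMPLETE_MULTIPART_UPLOAD {volume=v, bucket=b, key=key123, "
        "multipartList={1=/v/b/key123111, 2=/v/b/key123222}} | ret=FAILURE")
print(parse_multipart_list(line))  # → {1: '/v/b/key123111', 2: '/v/b/key123222'}
```

Running this over the uploaded audit log for plc_1570863541668_9278 would show whether the part list there is complete or cut off.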
On my cluster the audit logs look like the below, where for
completeMultipartUpload I can see both partNumber and partName. (In the
uploaded log I don't see output like this.)
{code:java}
2019-11-12 14:57:18,580 | INFO | OMAudit | user=root | ip=10.65.53.160 |
op=INITIATE_MULTIPART_UPLOAD {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4,
bucket=b1234, key=key123, dataSize=0, replicationType=RATIS,
replicationFactor=THREE, keyLocationInfo=[]} | ret=SUCCESS |
2019-11-12 14:57:53,967 | INFO | OMAudit | user=root | ip=10.65.53.160 |
op=ALLOCATE_KEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234,
key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE,
keyLocationInfo=[]} | ret=SUCCESS |
2019-11-12 14:57:53,974 | INFO | OMAudit | user=root | ip=10.65.53.160 |
op=ALLOCATE_BLOCK {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234,
key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=THREE,
keyLocationInfo=[], clientID=103127415125868581} | ret=SUCCESS |
2019-11-12 14:57:54,154 | INFO | OMAudit | user=root | ip=10.65.53.160 |
op=COMMIT_MULTIPART_UPLOAD_PARTKEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4,
bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS,
replicationFactor=ONE, keyLocationInfo=[blockID {
containerBlockID {
containerID: 6
localID: 103127415126327331
}
blockCommitSequenceId: 18
}
offset: 0
length: 5242880
createVersion: 0
pipeline {
leaderID: ""
members {
uuid: "a5fba53c-3aa9-48a7-8272-34c606f93bc6"
ipAddress: "10.65.49.251"
hostName: "bh-ozone-3.vpc.cloudera.com"
ports {
name: "RATIS"
value: 9858
}
ports {
name: "STANDALONE"
value: 9859
}
networkName: "a5fba53c-3aa9-48a7-8272-34c606f93bc6"
networkLocation: "/default-rack"
}
members {
uuid: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d"
ipAddress: "10.65.51.23"
hostName: "bh-ozone-4.vpc.cloudera.com"
ports {
name: "RATIS"
value: 9858
}
ports {
name: "STANDALONE"
value: 9859
}
networkName: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d"
networkLocation: "/default-rack"
}
members {
uuid: "cf8aace1-92b8-496e-aed9-f2771c83a56b"
ipAddress: "10.65.53.160"
hostName: "bh-ozone-2.vpc.cloudera.com"
ports {
name: "RATIS"
value: 9858
}
ports {
name: "STANDALONE"
value: 9859
}
networkName: "cf8aace1-92b8-496e-aed9-f2771c83a56b"
networkLocation: "/default-rack"
}
state: PIPELINE_OPEN
type: RATIS
factor: THREE
id {
id: "99954bc5-a77a-4546-87b4-a45b89d6ecbf"
}
}
]} | ret=SUCCESS |
2019-11-12 14:57:59,811 | INFO | OMAudit | user=root | ip=10.65.53.160 |
op=ALLOCATE_KEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234,
key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=ONE,
keyLocationInfo=[]} | ret=SUCCESS |
2019-11-12 14:57:59,819 | INFO | OMAudit | user=root | ip=10.65.53.160 |
op=ALLOCATE_BLOCK {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4, bucket=b1234,
key=key123, dataSize=5242880, replicationType=RATIS, replicationFactor=THREE,
keyLocationInfo=[], clientID=103127415508860966} | ret=SUCCESS |
2019-11-12 14:58:00,016 | INFO | OMAudit | user=root | ip=10.65.53.160 |
op=COMMIT_MULTIPART_UPLOAD_PARTKEY {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4,
bucket=b1234, key=key123, dataSize=5242880, replicationType=RATIS,
replicationFactor=ONE, keyLocationInfo=[blockID {
containerBlockID {
containerID: 4
localID: 103127415509385252
}
blockCommitSequenceId: 22
}
offset: 0
length: 5242880
createVersion: 0
pipeline {
leaderID: ""
members {
uuid: "a5fba53c-3aa9-48a7-8272-34c606f93bc6"
ipAddress: "10.65.49.251"
hostName: "bh-ozone-3.vpc.cloudera.com"
ports {
name: "RATIS"
value: 9858
}
ports {
name: "STANDALONE"
value: 9859
}
networkName: "a5fba53c-3aa9-48a7-8272-34c606f93bc6"
networkLocation: "/default-rack"
}
members {
uuid: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d"
ipAddress: "10.65.51.23"
hostName: "bh-ozone-4.vpc.cloudera.com"
ports {
name: "RATIS"
value: 9858
}
ports {
name: "STANDALONE"
value: 9859
}
networkName: "5e2625aa-637e-4e5a-a0a1-6683bd108b0d"
networkLocation: "/default-rack"
}
members {
uuid: "cf8aace1-92b8-496e-aed9-f2771c83a56b"
ipAddress: "10.65.53.160"
hostName: "bh-ozone-2.vpc.cloudera.com"
ports {
name: "RATIS"
value: 9858
}
ports {
name: "STANDALONE"
value: 9859
}
networkName: "cf8aace1-92b8-496e-aed9-f2771c83a56b"
networkLocation: "/default-rack"
}
state: PIPELINE_OPEN
type: RATIS
factor: THREE
id {
id: "99954bc5-a77a-4546-87b4-a45b89d6ecbf"
}
}
]} | ret=SUCCESS |
2019-11-12 14:58:39,710 | ERROR | OMAudit | user=root | ip=10.65.53.160 |
op=COMPLETE_MULTIPART_UPLOAD {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4,
bucket=b1234, key=key12, dataSize=0, replicationType=RATIS,
replicationFactor=ONE, keyLocationInfo=[],
multipartList={1=/s3dfb57b2e5f36c1f893dbc12dd66897d4/b1234/key123103127415125868581,
2=/s3dfb57b2e5f36c1f893dbc12dd66897d4/b1234/key123103127415508860966}} |
ret=FAILURE | MISMATCH_MULTIPART_LIST
org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload
Failed: volume: s3dfb57b2e5f36c1f893dbc12dd66897d4bucket: b1234key: key12
at
org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCompleteRequest.validateAndUpdateCache(S3MultipartUploadCompleteRequest.java:195)
at
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:217)
2019-11-12 14:58:49,503 | ERROR | OMAudit | user=root | ip=10.65.53.160 |
op=COMPLETE_MULTIPART_UPLOAD {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4,
bucket=b1234, key=key123, dataSize=0, replicationType=RATIS,
replicationFactor=ONE, keyLocationInfo=[],
multipartList={1=/s3dfb57b2e5f36c1f893dbc12dd66897d4/b1234/key123103127415125868581,
2=/s3dfb57b2e5f36c1f893dbc12dd66897d4/b1234/key123103127415508860966}} |
ret=FAILURE | NO_SUCH_MULTIPART_UPLOAD_ERROR
org.apache.hadoop.ozone.om.exceptions.OMException: Complete Multipart Upload
Failed: volume: s3dfb57b2e5f36c1f893dbc12dd66897d4bucket: b1234key: key123
at
org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCompleteRequest.validateAndUpdateCache(S3MultipartUploadCompleteRequest.java:142)
at
org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:217)
2019-11-12 14:59:12,951 | INFO | OMAudit | user=root | ip=10.65.53.160 |
op=COMPLETE_MULTIPART_UPLOAD {volume=s3dfb57b2e5f36c1f893dbc12dd66897d4,
bucket=b1234, key=key123, dataSize=0, replicationType=RATIS,
replicationFactor=ONE, keyLocationInfo=[],
multipartList={1=/s3dfb57b2e5f36c1f893dbc12dd66897d4/b1234/key123103127415125868581,
2=/s3dfb57b2e5f36c1f893dbc12dd66897d4/b1234/key123103127415508860966}} |
ret=SUCCESS |
{code}
I have tried setting up goofys with no luck; I get an "Unable to mount file
system" error that points to syslog. (I am still not able to find the root
cause.)
{code:java}
[root@bh-ozone-2 ozone-0.5.0-SNAPSHOT]# ./goofys --endpoint
http://localhost:9878 b12345 /root/s3/
2019/11/12 15:20:26.428553 main.FATAL Unable to mount file system, see syslog
for details
{code}
Any help in resolving this would be appreciated.
I am coming up with a freon test for S3MPU to run the tests.
From the log I suspect that the complete multipart upload request carries wrong
information, which causes this error; after it fails there is a call to abort
the MPU, and finally, when you commit a part, it reports
NO_SUCH_MULTIPART_UPLOAD_ERROR.
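The suspected sequence can be modeled with two toy dicts standing in for the multipartInfoTable and openKeyTable (purely illustrative, not the real OM data structures or API): abort deletes the multipartInfo entry but leaves the in-flight open-key entries behind, so a later part commit cannot find the upload.

```python
# Toy stand-ins for the OM tables (illustrative only, not real OM code).
multipart_info = {"key123": "upload-1"}        # created by initiateMPU
open_keys = {"key123/client-42": "in-flight"}  # created when a part is opened

def abort_mpu(key):
    # Abort removes only the multipartInfo entry created at initiateMPU;
    # open-key entries for in-flight parts are left behind.
    multipart_info.pop(key, None)

def commit_part(key):
    # A part commit first looks up the upload in the multipartInfo table.
    if key not in multipart_info:
        return "NO_SUCH_MULTIPART_UPLOAD_ERROR"
    return "OK"

abort_mpu("key123")
print(commit_part("key123"))            # → NO_SUCH_MULTIPART_UPLOAD_ERROR
print("key123/client-42" in open_keys)  # → True: stale open-key entry remains
```

This matches the ordering in the audit log above: complete fails, abort runs, and the subsequent part commit fails with NO_SUCH_MULTIPART_UPLOAD_ERROR.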
> Multipart upload report errors while writing to ozone Ratis pipeline
> --------------------------------------------------------------------
>
> Key: HDDS-2356
> URL: https://issues.apache.org/jira/browse/HDDS-2356
> Project: Hadoop Distributed Data Store
> Issue Type: Bug
> Components: Ozone Manager
> Affects Versions: 0.4.1
> Environment: Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM
> on a separate VM
> Reporter: Li Cheng
> Assignee: Bharat Viswanadham
> Priority: Blocker
> Fix For: 0.5.0
>
> Attachments: 2019-11-06_18_13_57_422_ERROR, hs_err_pid9340.log,
> image-2019-10-31-18-56-56-177.png, om_audit_log_plc_1570863541668_9278.txt
>
>
> Env: 4 VMs in total: 3 Datanodes on 3 VMs, 1 OM & 1 SCM on a separate VM, say
> it's VM0.
> I use goofys as a fuse and enable ozone S3 gateway to mount ozone to a path
> on VM0, while reading data from VM0 local disk and write to mount path. The
> dataset has various sizes of files from 0 byte to GB-level and it has a
> number of ~50,000 files.
> The writing is slow (1GB for ~10 mins) and it stops after around 4GB. As I
> look at hadoop-root-om-VM_50_210_centos.out log, I see OM throwing errors
> related with Multipart upload. This error eventually causes the writing to
> terminate and OM to be closed.
>
> Updated on 11/06/2019:
> See new multipart upload error NO_SUCH_MULTIPART_UPLOAD_ERROR and full logs
> are in the attachment.
> 2019-11-05 18:12:37,766 ERROR
> org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest:
> MultipartUpload Commit is failed for Key:./20191012/plc_1570863541668_9278 in Volume/Bucket
> s325d55ad283aa400af464c76d713c07ad/ozone-test
> NO_SUCH_MULTIPART_UPLOAD_ERROR
> org.apache.hadoop.ozone.om.exceptions.OMException: No such Multipart upload
> is with specified uploadId fcda8608-b431-48b7-8386-0a332f1a709a-103084683261641950
> at
> org.apache.hadoop.ozone.om.request.s3.multipart.S3MultipartUploadCommitPartRequest.validateAndUpdateCache(S3MultipartUploadCommitPartRequest.java:156)
> at
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequestDirectlyToOM(OzoneManagerProtocolServerSideTranslatorPB.java:217)
> at
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.processRequest(OzoneManagerProtocolServerSideTranslatorPB.java:132)
> at
> org.apache.hadoop.hdds.server.OzoneProtocolMessageDispatcher.processRequest(OzoneProtocolMessageDispatcher.java:72)
> at
> org.apache.hadoop.ozone.protocolPB.OzoneManagerProtocolServerSideTranslatorPB.submitRequest(OzoneManagerProtocolServerSideTranslatorPB.java:100)
> at
> org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$OzoneManagerService$2.callBlockingMethod(OzoneManagerProtocolProtos.java)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:524)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1025)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:822)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2682)
>
> Updated on 10/28/2019:
> See MISMATCH_MULTIPART_LIST error.
>
> 2019-10-28 11:44:34,079 [qtp1383524016-70] ERROR - Error in Complete
> Multipart Upload Request for bucket: ozone-test, key:
> 20191012/plc_1570863541668_9278
> MISMATCH_MULTIPART_LIST org.apache.hadoop.ozone.om.exceptions.OMException:
> Complete Multipart Upload Failed: volume:
> s3c89e813c80ffcea9543004d57b2a1239bucket:
> ozone-testkey: 20191012/plc_1570863541668_9278
> at
> org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.handleError(OzoneManagerProtocolClientSideTranslatorPB.java:732)
> at
> org.apache.hadoop.ozone.om.protocolPB.OzoneManagerProtocolClientSideTranslatorPB.completeMultipartUpload(OzoneManagerProtocolClientSideTranslatorPB.java:1104)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at
> org.apache.hadoop.hdds.tracing.TraceAllMethod.invoke(TraceAllMethod.java:66)
> at com.sun.proxy.$Proxy82.completeMultipartUpload(Unknown Source)
> at
> org.apache.hadoop.ozone.client.rpc.RpcClient.completeMultipartUpload(RpcClient.java:883)
> at
> org.apache.hadoop.ozone.client.OzoneBucket.completeMultipartUpload(OzoneBucket.java:445)
> at
> org.apache.hadoop.ozone.s3.endpoint.ObjectEndpoint.completeMultipartUpload(ObjectEndpoint.java:498)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at
> org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76)
> at
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148)
> at
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191)
> at
> org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:200)
> at
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103)
> at
> org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493)
>
> The following errors has been resolved in
> https://issues.apache.org/jira/browse/HDDS-2322.
> 2019-10-24 16:01:59,527 [OMDoubleBufferFlushThread] ERROR - Terminating with
> exit status 2: OMDoubleBuffer flush thread OMDoubleBufferFlushThread
> encountered Throwable error
> java.util.ConcurrentModificationException
> at java.util.TreeMap.forEach(TreeMap.java:1004)
> at
> org.apache.hadoop.ozone.om.helpers.OmMultipartKeyInfo.getProto(OmMultipartKeyInfo.java:111)
> at
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:38)
> at
> org.apache.hadoop.ozone.om.codec.OmMultipartKeyInfoCodec.toPersistedFormat(OmMultipartKeyInfoCodec.java:31)
> at
> org.apache.hadoop.hdds.utils.db.CodecRegistry.asRawData(CodecRegistry.java:68)
> at
> org.apache.hadoop.hdds.utils.db.TypedTable.putWithBatch(TypedTable.java:125)
> at
> org.apache.hadoop.ozone.om.response.s3.multipart.S3MultipartUploadCommitPartResponse.addToDBBatch(S3MultipartUploadCommitPartResponse.java:112)
> at
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.lambda$flushTransactions$0(OzoneManagerDoubleBuffer.java:137)
> at java.util.Iterator.forEachRemaining(Iterator.java:116)
> at
> org.apache.hadoop.ozone.om.ratis.OzoneManagerDoubleBuffer.flushTransactions(OzoneManagerDoubleBuffer.java:135)
> at java.lang.Thread.run(Thread.java:745)
> 2019-10-24 16:01:59,629 [shutdown-hook-0] INFO - SHUTDOWN_MSG:
--
This message was sent by Atlassian Jira
(v8.3.4#803005)