Wei-Chiu Chuang created HDDS-10853:
--------------------------------------

             Summary: Snapshot incompatible protobuf changes
                 Key: HDDS-10853
                 URL: https://issues.apache.org/jira/browse/HDDS-10853
             Project: Apache Ozone
          Issue Type: Bug
    Affects Versions: 1.4.0
            Reporter: Wei-Chiu Chuang


I noticed two incompatible changes in protobuf fields introduced by HDDS-7509 
and HDDS-7952.

HDDS-7509 changed the SnapshotInfo fields snapshotID, pathPreviousSnapshotID 
and globalPreviousSnapshotID from string to UUID.
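
To illustrate why a string-to-message change of this kind breaks already-persisted data, here is a minimal sketch in plain Python (no protobuf library, not Ozone code). A string field and an embedded message field both use wire type 2 (length-delimited), so a reader built from the new schema accepts the old field's tag and then tries to interpret the UUID's ASCII characters as nested field tags. The field number 1, the sample UUID text and the helper names are assumptions made for illustration. Unlike protobuf-java, which recurses into group wire types, this sketch simply rejects them.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("input ended in the middle of a varint")
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_string_field(field_number, text):
    """Encode text as a length-delimited field (wire type 2), as an old writer would."""
    payload = text.encode("utf-8")
    if field_number >= 16 or len(payload) >= 128:
        raise ValueError("sketch only supports one-byte tags and lengths")
    return bytes([(field_number << 3) | 2, len(payload)]) + payload


def walk_fields(buf):
    """Walk buf as a protobuf message; return [(field_number, wire_type), ...].

    Raises ValueError on truncated input or on wire types this sketch
    does not handle (groups and reserved values).
    """
    fixed_sizes = {1: 8, 5: 4}  # fixed64 and fixed32
    pos = 0
    fields = []
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 7
        if wire_type == 0:
            _, pos = read_varint(buf, pos)
        elif wire_type in fixed_sizes:
            pos += fixed_sizes[wire_type]
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type} in tag before offset {pos}")
        if pos > len(buf):
            raise ValueError("input ended in the middle of a field")
        fields.append((field_number, wire_type))
    return fields


if __name__ == "__main__":
    # Hypothetical old record: field 1 written as a string holding a UUID.
    old_bytes = encode_string_field(1, "128966e2-ebe8-4ff1-88c2-a3b637da626c")
    # The outer message still parses: one length-delimited field.
    print(walk_fields(old_bytes))
    # A new reader treats the 36 payload bytes as an embedded message.
    try:
        walk_fields(old_bytes[2:])
    except ValueError as err:
        print("nested parse failed:", err)
```

The outer walk succeeds because the tag byte is identical in both schemas; only the nested interpretation of the payload fails. That is consistent with the failure below, which surfaces inside the UUID parser rather than at the SnapshotInfo level.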

HDDS-7952 overhauled the snapshot diff job DB, so job entries persisted before 
that change can no longer be deserialized.

Sharing the error stack traces for posterity:

HDDS-7509
{noformat}
2024-05-10 21:52:48,805 ERROR [main]-org.apache.hadoop.ozone.om.OzoneManagerStarter: OM start failed with exception
java.lang.IllegalStateException: Failed next()
        at org.apache.hadoop.hdds.utils.db.TypedTable$RawIterator.next(TypedTable.java:670)
        at org.apache.hadoop.hdds.utils.db.TypedTable$RawIterator.next(TypedTable.java:619)
        at org.apache.hadoop.ozone.om.SnapshotChainManager.loadFromSnapshotInfoTable(SnapshotChainManager.java:295)
        at org.apache.hadoop.ozone.om.SnapshotChainManager.<init>(SnapshotChainManager.java:66)
        at org.apache.hadoop.ozone.om.OmMetadataManagerImpl.start(OmMetadataManagerImpl.java:558)
        at org.apache.hadoop.ozone.om.OmMetadataManagerImpl.<init>(OmMetadataManagerImpl.java:335)
        at org.apache.hadoop.ozone.om.OzoneManager.instantiateServices(OzoneManager.java:794)
        at org.apache.hadoop.ozone.om.OzoneManager.<init>(OzoneManager.java:674)
        at org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:759)
        at org.apache.hadoop.ozone.om.OzoneManagerStarter$OMStarterHelper.start(OzoneManagerStarter.java:189)
        at org.apache.hadoop.ozone.om.OzoneManagerStarter.startOm(OzoneManagerStarter.java:86)
        at org.apache.hadoop.ozone.om.OzoneManagerStarter.call(OzoneManagerStarter.java:74)
        at org.apache.hadoop.hdds.cli.GenericCli.call(GenericCli.java:38)
        at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
        at picocli.CommandLine.access$1300(CommandLine.java:145)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
        at picocli.CommandLine.execute(CommandLine.java:2078)
        at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:100)
        at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:91)
        at org.apache.hadoop.ozone.om.OzoneManagerStarter.main(OzoneManagerStarter.java:58)
Caused by: com.google.protobuf.InvalidProtocolBufferException: While parsing a protocol message, the input ended unexpectedly in the middle of a field.  This could mean either than the input has been truncated or that an embedded message misreported its own length.
        at com.google.protobuf.InvalidProtocolBufferException.truncatedMessage(InvalidProtocolBufferException.java:70)
        at com.google.protobuf.CodedInputStream.readRawBytes(CodedInputStream.java:789)
        at com.google.protobuf.CodedInputStream.readBytes(CodedInputStream.java:329)
        at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:484)
        at com.google.protobuf.UnknownFieldSet$Builder.mergeFrom(UnknownFieldSet.java:461)
        at com.google.protobuf.UnknownFieldSet$Builder.mergeFrom(UnknownFieldSet.java:579)
        at com.google.protobuf.UnknownFieldSet$Builder.mergeFrom(UnknownFieldSet.java:280)
        at com.google.protobuf.CodedInputStream.readGroup(CodedInputStream.java:240)
        at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:488)
        at com.google.protobuf.UnknownFieldSet$Builder.mergeFrom(UnknownFieldSet.java:461)
        at com.google.protobuf.UnknownFieldSet$Builder.mergeFrom(UnknownFieldSet.java:579)
        at com.google.protobuf.UnknownFieldSet$Builder.mergeFrom(UnknownFieldSet.java:280)
        at com.google.protobuf.CodedInputStream.readGroup(CodedInputStream.java:240)
        at com.google.protobuf.UnknownFieldSet$Builder.mergeFieldFrom(UnknownFieldSet.java:488)
        at com.google.protobuf.GeneratedMessage.parseUnknownField(GeneratedMessage.java:193)
        at org.apache.hadoop.hdds.protocol.proto.HddsProtos$UUID.<init>(HddsProtos.java:1253)
        at org.apache.hadoop.hdds.protocol.proto.HddsProtos$UUID.<init>(HddsProtos.java:1211)
        at org.apache.hadoop.hdds.protocol.proto.HddsProtos$UUID$1.parsePartialFrom(HddsProtos.java:1299)
        at org.apache.hadoop.hdds.protocol.proto.HddsProtos$UUID$1.parsePartialFrom(HddsProtos.java:1294)
        at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
        at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$SnapshotInfo.<init>(OzoneManagerProtocolProtos.java)
        at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$SnapshotInfo.<init>(OzoneManagerProtocolProtos.java)
        at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$SnapshotInfo$1.parsePartialFrom(OzoneManagerProtocolProtos.java)
        at org.apache.hadoop.ozone.protocol.proto.OzoneManagerProtocolProtos$SnapshotInfo$1.parsePartialFrom(OzoneManagerProtocolProtos.java)
        at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:200)
        at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:217)
        at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:223)
        at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
        at org.apache.hadoop.hdds.utils.db.Proto2Codec.fromCodecBuffer(Proto2Codec.java:89)
        at org.apache.hadoop.hdds.utils.db.Proto2Codec.fromCodecBuffer(Proto2Codec.java:35)
        at org.apache.hadoop.hdds.utils.db.DelegatedCodec.fromCodecBuffer(DelegatedCodec.java:91)
        at org.apache.hadoop.hdds.utils.db.TypedTable$1.convert(TypedTable.java:587)
        at org.apache.hadoop.hdds.utils.db.TypedTable$RawIterator.next(TypedTable.java:668)
        ... 22 more
 {noformat}
 

HDDS-7952
{noformat}
OM start failed with exception
java.lang.RuntimeException: com.fasterxml.jackson.core.JsonParseException: Unexpected character ('-' (code 45)): Expected space separating root-level values
 at [Source: (byte[])"128966e2-ebe8-4ff1-88c2-a3b637da626c"; line: 1, column: 10]
        at org.apache.hadoop.ozone.om.snapshot.RocksDbPersistentMap$1.next(RocksDbPersistentMap.java:146)
        at org.apache.hadoop.ozone.om.snapshot.RocksDbPersistentMap$1.next(RocksDbPersistentMap.java:1)
        at org.apache.hadoop.ozone.om.snapshot.SnapshotDiffManager.loadJobsOnStartUp(SnapshotDiffManager.java:1627)
        at org.apache.hadoop.ozone.om.snapshot.SnapshotDiffManager.<init>(SnapshotDiffManager.java:281)
        at org.apache.hadoop.ozone.om.OmSnapshotManager.<init>(OmSnapshotManager.java:278)
        at org.apache.hadoop.ozone.om.OzoneManager.instantiateServices(OzoneManager.java:849)
        at org.apache.hadoop.ozone.om.OzoneManager.<init>(OzoneManager.java:676)
        at org.apache.hadoop.ozone.om.OzoneManager.createOm(OzoneManager.java:761)
        at org.apache.hadoop.ozone.om.OzoneManagerStarter$OMStarterHelper.start(OzoneManagerStarter.java:189)
        at org.apache.hadoop.ozone.om.OzoneManagerStarter.startOm(OzoneManagerStarter.java:86)
        at org.apache.hadoop.ozone.om.OzoneManagerStarter.call(OzoneManagerStarter.java:74)
        at org.apache.hadoop.hdds.cli.GenericCli.call(GenericCli.java:38)
        at picocli.CommandLine.executeUserObject(CommandLine.java:1953)
        at picocli.CommandLine.access$1300(CommandLine.java:145)
        at picocli.CommandLine$RunLast.executeUserObjectOfLastSubcommandWithSameParent(CommandLine.java:2352)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2346)
        at picocli.CommandLine$RunLast.handle(CommandLine.java:2311)
        at picocli.CommandLine$AbstractParseResultHandler.execute(CommandLine.java:2179)
        at picocli.CommandLine.execute(CommandLine.java:2078)
        at org.apache.hadoop.hdds.cli.GenericCli.execute(GenericCli.java:100)
        at org.apache.hadoop.hdds.cli.GenericCli.run(GenericCli.java:91)
        at org.apache.hadoop.ozone.om.OzoneManagerStarter.main(OzoneManagerStarter.java:58)
Caused by: com.fasterxml.jackson.core.JsonParseException: Unexpected character ('-' (code 45)): Expected space separating root-level values
 at [Source: (byte[])"128966e2-ebe8-4ff1-88c2-a3b637da626c"; line: 1, column: 10]
        at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:2391)
        at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:735)
        at com.fasterxml.jackson.core.base.ParserMinimalBase._reportUnexpectedChar(ParserMinimalBase.java:659)
        at com.fasterxml.jackson.core.base.ParserMinimalBase._reportMissingRootWS(ParserMinimalBase.java:707)
        at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._verifyRootSpace(UTF8StreamJsonParser.java:1734)
        at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._parseFloat(UTF8StreamJsonParser.java:1696)
        at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._parsePosNumber(UTF8StreamJsonParser.java:1467)
        at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._nextTokenNotInObject(UTF8StreamJsonParser.java:900)
        at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.nextToken(UTF8StreamJsonParser.java:794)
        at com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:4761)
        at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4667)
        at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3690)
        at org.apache.hadoop.ozone.om.helpers.SnapshotDiffJob$SnapshotDiffJobCodec.fromPersistedFormat(SnapshotDiffJob.java:273)
        at org.apache.hadoop.ozone.om.helpers.SnapshotDiffJob$SnapshotDiffJobCodec.fromPersistedFormat(SnapshotDiffJob.java:257)
        at org.apache.hadoop.hdds.utils.db.CodecRegistry.asObject(CodecRegistry.java:101)
        at org.apache.hadoop.ozone.om.snapshot.RocksDbPersistentMap$1.next(RocksDbPersistentMap.java:143)
        ... 21 more
{noformat}
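
The parse position in the trace above can be reproduced with any JSON parser, which may help when reading similar failures. The stored value is a bare UUID string, but the codec decodes it as JSON. The leading `128966e2` happens to be a valid JSON number in exponent notation (the `_parseFloat` frame in the trace points the same way), so parsing only fails at the first `-`. The sketch below uses Python's standard `json` module as a stand-in for Jackson; the helper name is hypothetical.

```python
import json


def try_decode_as_json(raw):
    """Try to decode raw bytes as one JSON document.

    Returns (True, value) on success, or (False, offset) where offset is
    the zero-based position at which the decoder gave up.
    """
    try:
        return True, json.loads(raw.decode("utf-8"))
    except json.JSONDecodeError as err:
        return False, err.pos


# The value shown in the stack trace: a bare UUID string, not a JSON document.
stored = b"128966e2-ebe8-4ff1-88c2-a3b637da626c"

ok, offset = try_decode_as_json(stored)
if not ok:
    # Everything before the offset was consumed as a JSON number.
    print("decoded prefix:", stored[:offset], "stopped at:", stored[offset:offset + 1])
```

The decoder stops at the `-` that follows `128966e2`, the same character the Jackson message reports. A value whose first UUID group does not form a valid JSON number would fail at a different position.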

The Ozone snapshot feature was first released in Apache Ozone 1.4.0, and both 
changes were made within 1.4.0 only. However, community members relying on the 
Ozone master branch should watch out (I am aware of a few companies rebasing 
on the master branch).

I made an offline conversion tool to repair the broken OM DB. I will polish it 
a bit and post a PR. Maybe we can have a separate repo for repair tools.


