Zhangyx39 opened a new pull request #1426: URL: https://github.com/apache/samza/pull/1426
Symptom: The user-provided serde failed to deserialize a message. IntermediateMessageSerde then tried to deserialize the message a second time, which caused an OOM, and the container died.

Direct cause: The user-provided serde constructs an array whose size is read from the encoded payload. Because the second attempt includes the leading type byte, the size is read from the wrong offset. Given the wrong size, the serde constructed a huge array and caused the OOM.

Root cause: In Samza 0.13.1, we added a byte to the head of the payload. The byte represents the message type (event|watermark|EOS). During deserialization, IntermediateMessageSerde reads the first byte, then deserializes the rest of the message according to that type. For compatibility, if it fails to read the message type, it tries to deserialize again with all bytes (including the first byte). More details in this PR: https://github.com/apache/samza/pull/207

Changes: Remove the second try. This will cause upgrades from 0.13 to master to fail. The workaround is to upgrade to 1.4/1.5 instead, or to reset the checkpoint of the intermediate topic to newest.

Tests: Added unit test TestIntermediateMessageSerde.testUserMessageSerdeException().
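
For illustration, the behavior after this change can be sketched roughly as below. This is a simplified stand-in, not the actual IntermediateMessageSerde code: the class name, the numeric type codes, and the length-prefixed string "user serde" are all assumptions made for the example. The point it shows is that a failure in the user serde propagates to the caller instead of triggering a retry over all bytes. The length check in the stand-in serde is not part of this PR; it only keeps the sketch safe to run.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative sketch only: names and type codes are hypothetical, not Samza's real ones.
final class TypePrefixedSerdeSketch {
  // Hypothetical codes for the leading message-type byte.
  static final byte USER_MESSAGE = 0;
  static final byte WATERMARK = 1;
  static final byte END_OF_STREAM = 2;

  // Stand-in for a user-provided serde: a 4-byte length followed by UTF-8 bytes.
  static String userDeserialize(byte[] bytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    int length = buf.getInt();
    // Without this check, a bogus length would go straight into `new byte[length]`.
    if (length < 0 || length > buf.remaining()) {
      throw new IllegalArgumentException("Invalid encoded length: " + length);
    }
    byte[] data = new byte[length];
    buf.get(data);
    return new String(data, StandardCharsets.UTF_8);
  }

  // Writes the type byte, then the length-prefixed payload.
  static byte[] serializeUserMessage(String value) {
    byte[] data = value.getBytes(StandardCharsets.UTF_8);
    ByteBuffer buf = ByteBuffer.allocate(1 + 4 + data.length);
    buf.put(USER_MESSAGE).putInt(data.length).put(data);
    return buf.array();
  }

  // Reads the type byte and dispatches on it. There is no second attempt
  // over the full byte array: a user serde failure reaches the caller.
  static Object deserialize(byte[] bytes) {
    if (bytes.length == 0) {
      throw new IllegalArgumentException("Empty message");
    }
    byte type = bytes[0];
    byte[] payload = Arrays.copyOfRange(bytes, 1, bytes.length);
    switch (type) {
      case USER_MESSAGE:
        return userDeserialize(payload);
      case WATERMARK:
        return "WATERMARK";
      case END_OF_STREAM:
        return "END_OF_STREAM";
      default:
        throw new IllegalArgumentException("Unknown message type: " + type);
    }
  }

  public static void main(String[] args) {
    System.out.println(deserialize(serializeUserMessage("hello")));
  }
}
```

With the fallback removed, a message written without the type byte (the pre-0.13.1 format) would be rejected or misread under this scheme, which is why the upgrade caveat above applies.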