shunping commented on issue #36387: URL: https://github.com/apache/beam/issues/36387#issuecomment-3373054961
After digging in the code, it seems that when we convert the TestStream's element events from/to pipeline proto, we call `CoderUtils.encodeToByteArray` and `CoderUtils.decodeFromByteArray`. https://github.com/apache/beam/blob/09aa10c52f1d24846ab30e791c3e5bd544e9321f/sdks/java/core/src/main/java/org/apache/beam/sdk/util/construction/TestStreamTranslation.java#L124-L126 https://github.com/apache/beam/blob/09aa10c52f1d24846ab30e791c3e5bd544e9321f/sdks/java/core/src/main/java/org/apache/beam/sdk/util/construction/TestStreamTranslation.java#L150-L153. If called without a context, `CoderUtils.encodeToByteArray`/`CoderUtils.decodeFromByteArray` would use context outer. https://github.com/apache/beam/blob/09aa10c52f1d24846ab30e791c3e5bd544e9321f/sdks/java/core/src/main/java/org/apache/beam/sdk/util/CoderUtils.java#L96 https://github.com/apache/beam/blob/09aa10c52f1d24846ab30e791c3e5bd544e9321f/sdks/java/core/src/main/java/org/apache/beam/sdk/util/CoderUtils.java#L116 In such context, `StringUtf8Coder` will not have length prefix when encoding and decoding the data. https://github.com/apache/beam/blob/09aa10c52f1d24846ab30e791c3e5bd544e9321f/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/StringUtf8Coder.java#L76-L82 https://github.com/apache/beam/blob/09aa10c52f1d24846ab30e791c3e5bd544e9321f/sdks/java/core/src/main/java/org/apache/beam/sdk/coders/StringUtf8Coder.java#L95-L98 This works if there is not fusion break between TestStream and its following stages. However, for portable runner like Prism, there could be a fusion break after TestStream. If so, the TestStream values encoded in the pipeline proto is generated with outer context, while the downstream step use nested context to decode the buffer. https://github.com/apache/beam/blob/09aa10c52f1d24846ab30e791c3e5bd544e9321f/sdks/java/core/src/main/java/org/apache/beam/sdk/values/WindowedValues.java#L838 This leads to the stacktrace we see in the description. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
