[
https://issues.apache.org/jira/browse/FLINK-22729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stephan Ewen updated FLINK-22729:
---------------------------------
Comment: was deleted
(was: I am the [Flink Jira Bot|https://github.com/apache/flink-jira-bot/] and I
help the community manage its development. I see this issue has been marked as
Major but is unassigned and neither itself nor its Sub-Tasks have been updated
for 30 days. I have gone ahead and added a "stale-major" label to the issue. If this
ticket is a Major, please either assign yourself or give an update. Afterwards,
please remove the label or in 7 days the issue will be deprioritized.
)
> Truncated Messages in Python workers
> ------------------------------------
>
> Key: FLINK-22729
> URL: https://issues.apache.org/jira/browse/FLINK-22729
> Project: Flink
> Issue Type: Bug
> Components: Stateful Functions
> Affects Versions: statefun-2.2.2
> Environment: The Stateful Function version is 2.2.2, java8. The Java App as well as
> the external Python workers are deployed in the same kubernetes cluster.
> Reporter: Stephan Ewen
> Priority: Critical
> Fix For: statefun-3.1.0
>
>
> Recently we started seeing the following faulty behavior in the Flink
> Stateful Functions HTTP communication towards external Python workers.
> This is only occurring when the system is under heavy load.
> The Java Application will send HTTP Messages to an external Python
> Function but the external Function fails to parse the message with a
> "Truncated Message Error". Printouts show that the truncated message
> looks as follows:
> {code}
> <Start of Message>
> my.protobuf.MyClass: <Protobuf Content>
> my.protobuf.MyClass: <Protobuf Content>
> my.protobuf.MyClass: <Protobuf Content>
> my.protobuf.MyClass: <Protob
> {code}
> This leads to the following error in the Python worker:
> {code}
> Error Parsing Message: Truncated Message
> {code}
> Either the sender or the receiver (or something in between) seems to be
> truncating some (not all) messages at some random point in the payload.
> The source code in both Flink SDKs looks correct. As a temporary
> workaround, we set the "maxNumBatchRequests" parameter in the
> external function definition to a very low value. This is not an ideal
> solution, though, as we believe it adds considerable communication
> overhead between the Java and the Python functions.
> The Stateful Function version is 2.2.2, java8. The Java App as well as
> the external Python workers are deployed in the same kubernetes cluster.
> ----
> This was reported on the Mailing List in
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Statefun-Truncated-Messages-in-Python-workers-td43831.html
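
One way to narrow down where the truncation described above happens is to compare the number of bytes the Python worker actually received with the length the sender declared, before the payload is handed to the Protobuf parser. The following is a minimal sketch under stated assumptions, not part of either SDK: the names `check_body_length` and `TruncatedBodyError` are hypothetical, and it assumes the web framework hosting the worker exposes the request headers as a mapping and the body as bytes.

```python
from typing import Mapping, Optional


class TruncatedBodyError(Exception):
    """Hypothetical error: fewer bytes arrived than Content-Length declared."""


def check_body_length(headers: Mapping[str, str], body: bytes) -> Optional[int]:
    """Return the declared body length, or None if no Content-Length was sent.

    Raises TruncatedBodyError when the body is shorter than declared.
    """
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    declared = lowered.get("content-length")
    if declared is None:
        # E.g. chunked transfer encoding: there is no declared length to compare.
        return None
    expected = int(declared)
    if len(body) < expected:
        raise TruncatedBodyError(f"received {len(body)} of {expected} bytes")
    return expected
```

If this check fails, the bytes were lost before Protobuf parsing (in transit or while the server read the request). If it passes and parsing still reports a truncated message, the sender may have declared a length that matches an already truncated payload.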
--
This message was sent by Atlassian Jira
(v8.3.4#803005)