Hello OpenWhisk developers,

When a blocking action is invoked, the controller waits for that action's
response from the invoker and also polls the artifact store for the same
response. Usually blocking invocation responses are obtained from the
invoker. However, there are instances when the invocation response is
retrieved from the artifact store instead. From observation, the most
likely scenario for a blocking activation to be retrieved from the artifact
store is when an action generates a response that exceeds the maximum
allowed Kafka message size for the "completed" topic. However, this
situation should not occur as large action responses are meant to be
truncated by the invoker to the allowed maximum Kafka message size for the
corresponding topic.

Currently, artifact store polling for activation records is masking a bug
involving large action responses. OpenWhisk provides a configuration value,
whisk.activation.payload.max, that one would assume adjusts the maximum
activation record size. However, this value only applies to the Kafka topic
that is used to schedule actions for invocation. The Kafka topic used to
communicate the completion of an action always uses the default value for
KAFKA_MESSAGE_MAX_BYTES, which is ~1MB. Additionally, the invoker truncates
action responses to the whisk.activation.payload.max value even though
whisk.activation.payload.max is not being applied properly to the
"completed" Kafka topic. Moreover, this truncation does not account for
data added to the action response by the Kafka producer during
serialization, so an action response may fail to be sent to the "completed"
topic even if its actual action response size adheres to the topic's size
limitations. As a result, any action response whose size, including the
data added by the Kafka producer during serialization, exceeds ~1MB will be
retrieved via artifact store polling.
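
To illustrate the truncation problem, here is a minimal Python sketch. It
is not OpenWhisk code: the envelope layout, the function names, and the
exact 1,000,000-byte limit are assumptions made for the example. It only
shows why truncating the raw response to the topic limit is not enough once
the serialized message carries extra bytes.

```python
import json

# Hypothetical limit for the example; the real default is only "~1MB".
TOPIC_MAX_BYTES = 1_000_000


def serialize(response: str) -> bytes:
    """Wrap the response in a made-up completion envelope and encode it."""
    envelope = {"activationId": "0" * 32, "response": response}
    return json.dumps(envelope).encode("utf-8")


def naive_truncate(response: str, limit: int = TOPIC_MAX_BYTES) -> str:
    """Truncate the raw response only, ignoring serialization overhead."""
    return response.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


def truncate_with_headroom(response: str, limit: int = TOPIC_MAX_BYTES) -> str:
    """Truncate so that the serialized message, not just the response, fits."""
    overhead = len(serialize(""))  # bytes added by the envelope alone
    budget = max(limit - overhead, 0)
    truncated = response.encode("utf-8")[:budget].decode("utf-8", errors="ignore")
    # Escaping can expand some characters, so shrink until the message fits.
    while truncated and len(serialize(truncated)) > limit:
        excess = len(serialize(truncated)) - limit
        truncated = truncated[:-excess] if excess < len(truncated) else ""
    return truncated


payload = "x" * 2_000_000
naive_size = len(serialize(naive_truncate(payload)))
safe_size = len(serialize(truncate_with_headroom(payload)))
print(naive_size > TOPIC_MAX_BYTES)   # naive truncation still overflows
print(safe_size <= TOPIC_MAX_BYTES)   # headroom-aware truncation fits
```

In this sketch the naive version produces a message that is larger than
the limit by exactly the envelope overhead, which is the kind of message
that would fail to reach the "completed" topic.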

Performance degradation appears to occur when an activation record is
retrieved via artifact store polling. Artifact store polling occurs every
15 seconds for a blocking invocation. Since the response of an action that
generates a payload greater than ~1MB cannot be sent through the
"completed" Kafka topic, that action's activation record must be retrieved
via polling. Even though such an action may complete in milliseconds, the
end user will not get back the activation response for at least 15 seconds
due to the polling logic in the controller.
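
The latency effect can be sketched with a small Python model. This is a
simplification and not the controller's actual logic: the function name is
hypothetical, and it assumes the first poll fires one full interval after
the invocation starts and that a response delivered through Kafka arrives
as soon as the action finishes.

```python
import math

# Polling interval stated above for blocking invocations.
POLL_INTERVAL_S = 15.0


def response_latency(action_duration_s: float,
                     kafka_delivery_ok: bool,
                     poll_interval_s: float = POLL_INTERVAL_S) -> float:
    """Seconds until the caller of a blocking invocation sees the result."""
    if kafka_delivery_ok:
        # Response arrives through the "completed" topic right away.
        return action_duration_s
    # Otherwise the result is only seen on the first poll at or after completion.
    polls_needed = max(1, math.ceil(action_duration_s / poll_interval_s))
    return polls_needed * poll_interval_s


print(response_latency(0.05, kafka_delivery_ok=True))
print(response_latency(0.05, kafka_delivery_ok=False))
```

Under these assumptions, a 50 ms action whose response is too large for the
topic is reported to the caller only at the 15 second mark.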

I have submitted a pull request to remove the polling mechanism and also
fix the large action response bug. The pull request can be found here:
https://github.com/apache/incubator-openwhisk/pull/4033.

Regards,
James Dubee
