Jim Rhyness created TOREE-391:
---------------------------------
Summary: Messages to Jupyter kernel gateway are dropped in jeromq
Key: TOREE-391
URL: https://issues.apache.org/jira/browse/TOREE-391
Project: TOREE
Issue Type: Bug
Affects Versions: 0.1.0
Environment: Linux ( RHEL 7.3 )
Reporter: Jim Rhyness
Restarting a kernel through the Jupyter kernel gateway fails with a timeout.
The kernel itself is restarted, but the kernel gateway times out waiting for
the kernel_info_reply message that it expects in response to the
kernel_info_request it sends after initiating the restart.
The problem is reproducible most of the time with something like this:
curl -v -X POST --data '{ "name":"apache_toree_scala" }' http://127.0.0.1:8888/api/kernels
curl -v -X POST --data '{}' http://127.0.0.1:8888/api/kernels/<kernelid-from-above>/restart
From the IPython message protocol doc, this is the message format:
[
b'u-u-i-d', # zmq identity(ies)
b'<IDS|MSG>', # delimiter
b'baddad42', # HMAC signature
b'{header}', # serialized header dict
b'{parent_header}', # serialized parent header dict
b'{metadata}', # serialized metadata dict
b'{content}', # serialized content dict
b'blob', # extra raw data buffer(s)
...
]
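As a rough illustration of the layout above, the following Python sketch splits a multipart message at the delimiter and keeps the identity frames as raw bytes. The function name split_frames and the sample frames are hypothetical; they are not taken from Toree or Jupyter code.

```python
DELIMITER = b"<IDS|MSG>"


def split_frames(frames):
    """Split a multipart message into identities, signature, dict frames, buffers.

    Hypothetical helper for illustration only. All frames stay as bytes.
    """
    idx = frames.index(DELIMITER)   # raises ValueError if the delimiter is missing
    identities = frames[:idx]       # zmq identity frame(s), left untouched
    signature = frames[idx + 1]     # HMAC signature
    # The four serialized dicts: header, parent_header, metadata, content.
    header, parent_header, metadata, content = frames[idx + 2: idx + 6]
    buffers = frames[idx + 6:]      # optional extra raw data buffers
    return identities, signature, (header, parent_header, metadata, content), buffers


# Made-up sample message with a five-byte identity that is not valid UTF-8.
sample = [
    b"\x00\x8f\x2a\xc3\x11",
    DELIMITER,
    b"baddad42",
    b"{}", b"{}", b"{}", b"{}",
]
identities, signature, parts, buffers = split_frames(sample)
```

Only the frames after the delimiter are text (a signature and JSON); the frames before it are opaque routing data.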
The first frame of the message contains the zmq identities. For a Router-type
socket these are, in some cases, generated by jeromq itself; such an identity
consists of five bytes: a 0 byte followed by a random int.
In Toree, all frames are treated as Strings. Decoding the identity bytes as
UTF-8 corrupts the zmq id: every byte sequence that is not valid UTF-8 is
replaced by the replacement character U+FFFD (bytes 0xEF 0xBF 0xBD in UTF-8).
When the corrupted id is then used to address a message sent to the Router
socket, the peer to send the message to is not found and the message is dropped.
This affects other messages as well, not just kernel_info_reply.
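
The corruption can be sketched outside Toree with a few lines of Python. This is only an approximation: Python's errors="replace" decoding stands in for the String conversion described above, the identity bytes are made up, and the peers dict is a hypothetical stand-in for the Router socket's routing table.

```python
def utf8_round_trip(frame: bytes) -> bytes:
    """Decode a frame as UTF-8 with replacement, then encode it again."""
    return frame.decode("utf-8", errors="replace").encode("utf-8")


# Made-up five-byte identity: a 0 byte followed by four arbitrary bytes.
identity = bytes([0x00, 0x8F, 0x2A, 0xC3, 0x11])
corrupted = utf8_round_trip(identity)

# Hypothetical routing table keyed by the original identity bytes.
peers = {identity: "gateway connection"}

print(identity.hex(), "->", corrupted.hex())
print("peer found:", peers.get(corrupted) is not None)
```

The round-tripped identity no longer matches the key the peer was registered under, so the lookup fails. Carrying identity frames as raw bytes rather than Strings would avoid the lossy conversion.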