[
https://issues.apache.org/jira/browse/MESOS-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14967962#comment-14967962
]
Steven Schlansker commented on MESOS-3771:
------------------------------------------
Okay, I have distilled down the reproduction case.
Using the Python test-framework with the following diff applied:
{code}
diff --git a/src/examples/python/test_framework.py b/src/examples/python/test_framework.py
index 6af6d22..95abb97 100755
--- a/src/examples/python/test_framework.py
+++ b/src/examples/python/test_framework.py
@@ -150,6 +150,7 @@ class TestScheduler(mesos.interface.Scheduler):
                 print "but received", self.messagesReceived
                 sys.exit(1)
             print "All tasks done, and all messages received, exiting"
+            time.sleep(30)
             driver.stop()
 
 if __name__ == "__main__":
@@ -158,6 +159,7 @@ if __name__ == "__main__":
         sys.exit(1)
 
     executor = mesos_pb2.ExecutorInfo()
+    executor.data = b'\xAC\xED'
     executor.executor_id.value = "default"
     executor.command.value = os.path.abspath("./test-executor")
     executor.name = "Test Executor (Python)"
{code}
If you run the test framework and, during the 30-second sleep after it
finishes, fetch the {{/master/state.json}} endpoint, you will get a response
containing invalid UTF-8:
{code}
Caused by: com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 start byte 0xac
 at [Source: org.jboss.netty.buffer.ChannelBufferInputStream@54c8158d; line: 1, column: 6432]
{code}
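You can also see the failure without a Jackson consumer; here is a minimal check (assuming the master is listening on localhost:5050, and Python 2 to match the test framework):
{code}
import urllib2

# Poll the master while the patched framework is sleeping. Assumes the
# master is listening on localhost:5050.
body = urllib2.urlopen('http://localhost:5050/master/state.json').read()

try:
    body.decode('utf-8')
    print "state.json is valid UTF-8"
except UnicodeDecodeError as e:
    # With the patched framework registered, this prints something like:
    # 'utf8' codec can't decode byte 0xac in position ...
    print "state.json is NOT valid UTF-8:", e
{code}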
I tested against both 0.24.1 and current master; both exhibit the bad behavior.
> Mesos JSON API creates invalid JSON due to lack of binary data / non-ASCII handling
> -----------------------------------------------------------------------------------
>
> Key: MESOS-3771
> URL: https://issues.apache.org/jira/browse/MESOS-3771
> Project: Mesos
> Issue Type: Bug
> Components: HTTP API
> Affects Versions: 0.24.1, 0.26.0
> Reporter: Steven Schlansker
> Priority: Critical
>
> Spark encodes some binary data into the ExecutorInfo.data field. This field
> is sent as a "bytes" Protobuf value, which can contain arbitrary non-UTF-8
> data. If you have such a field, it seems it is splatted out into JSON without
> any regard to proper character encoding:
> {code}
> 0006b0b0 2e 73 70 61 72 6b 2e 65 78 65 63 75 74 6f 72 2e |.spark.executor.|
> 0006b0c0 4d 65 73 6f 73 45 78 65 63 75 74 6f 72 42 61 63 |MesosExecutorBac|
> 0006b0d0 6b 65 6e 64 22 7d 2c 22 64 61 74 61 22 3a 22 ac |kend"},"data":".|
> 0006b0e0 ed 5c 75 30 30 30 30 5c 75 30 30 30 35 75 72 5c |.\u0000\u0005ur\|
> 0006b0f0 75 30 30 30 30 5c 75 30 30 30 66 5b 4c 73 63 61 |u0000\u000f[Lsca|
> 0006b100 6c 61 2e 54 75 70 6c 65 32 3b 2e cc 5c 75 30 30 |la.Tuple2;..\u00|
> {code}
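> (For illustration, the first two bytes of the data field above are already
> enough to break any strict consumer; a quick check in Python 2:)
> {code}
> # 0xAC (10101100) is a UTF-8 continuation byte; it can never start a
> # character, so a document containing it raw is not valid UTF-8.
> import json
>
> json.loads('{"data": "\xac\xed"}')
> # UnicodeDecodeError: 'utf8' codec can't decode byte 0xac in position 10
> {code}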
> I suspect this is because the HTTP api emits the executorInfo.data directly:
> {code}
> JSON::Object model(const ExecutorInfo& executorInfo)
> {
>   JSON::Object object;
>   object.values["executor_id"] = executorInfo.executor_id().value();
>   object.values["name"] = executorInfo.name();
>   object.values["data"] = executorInfo.data();
>   object.values["framework_id"] = executorInfo.framework_id().value();
>   object.values["command"] = model(executorInfo.command());
>   object.values["resources"] = model(executorInfo.resources());
>   return object;
> }
> {code}
> I think this may be because the custom JSON processing library in stout has
> no notion of a byte array. I'm guessing that some implicit conversion makes
> the field get written as a String instead, but:
> {code}
> inline std::ostream& operator<<(std::ostream& out, const String& string)
> {
>   // TODO(benh): This escaping DOES NOT handle unicode, it encodes as ASCII.
>   // See RFC4627 for the JSON string specification.
>   return out << picojson::value(string.value).serialize();
> }
> {code}
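> One common convention (sketched here in Python; not necessarily the right
> fix for stout) is to base64-encode bytes fields before they go into JSON:
> {code}
> import base64, json
>
> data = b'\xac\xed\x00\x05'  # Java serialization stream header
>
> # Base64 output is pure ASCII, hence always valid UTF-8 inside JSON.
> doc = json.dumps({'data': base64.b64encode(data)})
> print doc  # {"data": "rO0ABQ=="}
>
> # Consumers recover the original bytes losslessly.
> assert base64.b64decode(json.loads(doc)['data']) == data
> {code}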
> Thank you for any assistance here. Our cluster is currently entirely down:
> the frameworks cannot parse the invalid JSON produced (it is not even valid
> UTF-8).