[ https://issues.apache.org/jira/browse/MESOS-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157470#comment-15157470 ]
Chris Pennello commented on MESOS-4642:
---------------------------------------
bq. In my opinion, encoding everything as UTF-8 would be an incorrect approach.
Doing this introduces ambiguity with regards to the original data.
Indeed, without an extra encoding scheme, this seems like an unavoidable
consequence of using JSON.
bq. For us to actually output valid JSON, we need to encode the output as
unicode.
I think it's a little worse than that; per the above, there simply exists data
that is unrepresentable.
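To make that concrete, here is a minimal Python sketch that locates the first byte preventing a buffer from being decoded as UTF-8. The helper name is made up for illustration; the sample bytes are copied from the hexdump in the issue description. {{0xac}} is a continuation byte with no lead byte in front of it, which matches the "Invalid UTF-8 start byte 0xac" error in the Jackson stack trace.

```python
from typing import Optional


def first_invalid_utf8_offset(raw: bytes) -> Optional[int]:
    """Return the offset of the first byte that breaks UTF-8 decoding, or None."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first offending byte.
        return err.start
    return None


# Bytes following the newline at 0x3d220 in the hexdump from the issue.
sample = b"\x01\x00\x00\x00\x00\x00\x00\xacWed, 10"
offset = first_invalid_utf8_offset(sample)
print(offset, hex(sample[offset]))
```

The control bytes {{0x01}} and {{0x00}} are valid UTF-8 and can be escaped as {{\u0001}} and {{\u0000}}; only the {{0xac}} byte has no representation as a JSON string character.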
Clients still do have {{files/read}} to use if they want the raw data, right?
One idea is to add extra, Unicode-friendly encoding to {{files/read.json}} for
raw data. For example, it could be Base64-encoded and _then_ dumped to JSON.
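A rough sketch of the Base64 idea in Python, purely as an illustration and not how Mesos implements anything. The {{data}} and {{encoding}} field names and both function names are assumptions; only {{offset}} appears in the response shown in the issue.

```python
import base64
import json


def encode_read_response(raw: bytes, offset: int) -> str:
    """Base64-encode raw file bytes, then dump them into a JSON document."""
    payload = {
        # Base64 output is pure ASCII, so it is always a valid JSON string.
        "data": base64.b64encode(raw).decode("ascii"),
        "encoding": "base64",  # hypothetical marker so clients know to decode
        "offset": offset,
    }
    return json.dumps(payload)


def decode_read_response(body: str) -> bytes:
    """Client side: recover the exact original bytes."""
    return base64.b64decode(json.loads(body)["data"])


raw = b"295707\n\x01\x00\x00\x00\x00\x00\x00\xacWed, 10"
body = encode_read_response(raw, 220443)
assert decode_read_response(body) == raw  # lossless round trip
```

The trade-off is that the payload grows by roughly a third and is no longer human-readable without decoding.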
As a more client-friendly idea, perhaps we could augment
{{files/read.json}} so that byte sequences that can't be decoded as UTF-8 are
replaced by a {{?}} character? [(This is akin to Python's
{{unicode(..., errors='replace')}}.)|https://docs.python.org/2/howto/unicode.html#the-unicode-type]
That way, we'd be able to get valid JSON out of {{files/read.json}} (a
plus!), and have "reasonable" behavior for unrepresentable data.
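A minimal Python 3 sketch of the replacement idea, under some assumptions: the function name and the {{data}} field are hypothetical. Python's {{errors="replace"}} substitutes U+FFFD rather than {{?}} when decoding, so the sketch swaps that in a second step.

```python
import json


def lossy_read_response(raw: bytes, offset: int, placeholder: str = "?") -> str:
    """Decode raw bytes as UTF-8, replacing undecodable sequences, then dump JSON."""
    # errors="replace" turns each undecodable sequence into U+FFFD.
    text = raw.decode("utf-8", errors="replace")
    # Caveat: this also rewrites any U+FFFD that was legitimately in the input.
    text = text.replace("\ufffd", placeholder)
    return json.dumps({"data": text, "offset": offset})


raw = b"295707\n\x01\x00\xacWed, 10"
body = lossy_read_response(raw, 220443)
print(body)
```

The result always parses as JSON, but the original bytes cannot be recovered from it, so clients that need exact data would still have to use {{files/read}}.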
As a wilder idea, if we still wanted endpoints that could represent
arbitrary but _structured_ data, we might consider adding an additional
serialization format, such as [MessagePack|http://msgpack.org/].
> Mesos Agent Json API can dump binary data from log files out as invalid JSON
> ----------------------------------------------------------------------------
>
> Key: MESOS-4642
> URL: https://issues.apache.org/jira/browse/MESOS-4642
> Project: Mesos
> Issue Type: Bug
> Components: json api, slave
> Affects Versions: 0.27.0
> Reporter: Steven Schlansker
> Priority: Critical
>
> One of our tasks accidentally started logging binary data to stderr. This
> was not intentional and generally should not happen -- however, it causes
> severe problems with the Mesos Agent "files/read.json" API, since it gladly
> dumps this binary data out as invalid JSON.
> {code}
> # hexdump -C /path/to/task/stderr | tail
> 0003d1f0 6f 6e 6e 65 63 74 69 6f 6e 0a 4e 45 54 3a 20 31 |onnection.NET: 1|
> 0003d200 20 6f 6e 72 65 61 64 20 45 4e 4f 45 4e 54 20 32 | onread ENOENT 2|
> 0003d210 39 35 34 35 36 20 32 35 31 20 32 39 35 37 30 37 |95456 251 295707|
> 0003d220 0a 01 00 00 00 00 00 00 ac 57 65 64 2c 20 31 30 |.........Wed, 10|
> 0003d230 20 55 6e 72 65 63 6f 67 6e 69 7a 65 64 20 69 6e | Unrecognized in|
> 0003d240 70 75 74 20 68 65 61 64 65 72 0a |put header.|
> {code}
> {code}
> # curl 'http://agent-host:5051/files/read.json?path=/path/to/task/stderr&offset=220443&length=90000&grep=' | hexdump -C
> 00007970 6e 65 63 74 69 6f 6e 5c 6e 4e 45 54 3a 20 31 20 |nection\nNET: 1 |
> 00007980 6f 6e 72 65 61 64 20 45 4e 4f 45 4e 54 20 32 39 |onread ENOENT 29|
> 00007990 35 34 35 36 20 32 35 31 20 32 39 35 37 30 37 5c |5456 251 295707\|
> 000079a0 6e 5c 75 30 30 30 31 5c 75 30 30 30 30 5c 75 30 |n\u0001\u0000\u0|
> 000079b0 30 30 30 5c 75 30 30 30 30 5c 75 30 30 30 30 5c |000\u0000\u0000\|
> 000079c0 75 30 30 30 30 5c 75 30 30 30 30 ac 57 65 64 2c |u0000\u0000.Wed,|
> 000079d0 20 31 30 20 55 6e 72 65 63 6f 67 6e 69 7a 65 64 | 10 Unrecognized|
> 000079e0 20 69 6e 70 75 74 20 68 65 61 64 65 72 5c 6e 22 | input header\n"|
> 000079f0 2c 22 6f 66 66 73 65 74 22 3a 32 32 30 34 34 33 |,"offset":220443|
> 00007a00 7d |}|
> {code}
> This causes downstream sadness:
> {code}
> ERROR [2016-02-10 18:55:12,303] io.dropwizard.jersey.errors.LoggingExceptionMapper: Error handling a request: 0ee749630f8b26f1
> ! com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 start byte 0xac
> ! at [Source: org.jboss.netty.buffer.ChannelBufferInputStream@6d69ee8; line: 1, column: 31181]
> ! at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1487) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:518) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidInitial(UTF8StreamJsonParser.java:3339) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidChar(UTF8StreamJsonParser.java:3333) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString2(UTF8StreamJsonParser.java:2360) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString(UTF8StreamJsonParser.java:2287) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:286) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:29) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:12) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:523) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:381) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1073) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.module.afterburner.deser.SuperSonicBeanDeserializer.deserializeFromObject(SuperSonicBeanDeserializer.java:196) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:142) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.module.afterburner.deser.SuperSonicBeanDeserializer.deserialize(SuperSonicBeanDeserializer.java:117) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3562) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2648) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.hubspot.singularity.data.SandboxManager.read(SandboxManager.java:97) ~[singularity-0.4.9.jar:0.4.9]
> {code}
> This is extremely similar to https://issues.apache.org/jira/browse/MESOS-3771
> Since this is now the second major issue caused by this, is there any
> possibility of using a JSON processing library that actually guarantees
> spec-compliant output? I know we can fix the point problem again here, but
> it is frustrating that this keeps happening, and I'm sure it will happen
> again in the future.
> Failing that, maybe we should audit all JSON objects produced to ensure they
> cannot contain binary data.