[ 
https://issues.apache.org/jira/browse/MESOS-4642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15157470#comment-15157470
 ] 

Chris Pennello commented on MESOS-4642:
---------------------------------------

bq. In my opinion, encoding everything as UTF-8 would be an incorrect approach. 
Doing this introduces ambiguity with regards to the original data.

Indeed, without an extra encoding scheme, this seems like an unavoidable 
consequence of using JSON.

bq. For us to actually output valid JSON, we need to encode the output as 
unicode.

I think it's a little worse than that; per the above, there simply exists data 
that is unrepresentable.
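
To make "unrepresentable" concrete, here is a minimal Python 3 sketch (not Mesos code) that checks whether a byte string is valid UTF-8. The helper name {{is_valid_utf8}} is made up for illustration; the sample bytes come from the hexdump in the issue description.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8, False otherwise."""
    try:
        raw.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


# 0xac is a UTF-8 continuation byte, so it can never start a character.
print(is_valid_utf8(b"\xacWed, 10"))  # bytes taken from the report's hexdump
print(is_valid_utf8(b"Wed, 10"))
```

A JSON document is itself Unicode text, so a byte like {{0xac}} has no direct spelling inside a JSON string; it has to be transformed somehow first.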

Clients still do have {{files/read}} to use if they want the raw data, right?

One idea is to add extra, Unicode-friendly encoding to {{files/read.json}} for 
raw data.  For example, it could be Base64-encoded and _then_ dumped to JSON.
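
A rough Python 3 sketch of the Base64 idea, purely as illustration and not a proposal for the actual implementation. The function names are hypothetical, and the {{data}} field name is an assumption ({{offset}} is visible in the response dump quoted below).

```python
import base64
import json


def encode_read_response(raw: bytes, offset: int) -> str:
    """Base64-encode the raw bytes, then dump the result as JSON."""
    payload = {
        "data": base64.b64encode(raw).decode("ascii"),  # pure ASCII text
        "offset": offset,
    }
    return json.dumps(payload)


def decode_read_response(body: str) -> bytes:
    """Client side: parse the JSON, then undo the Base64 step."""
    return base64.b64decode(json.loads(body)["data"])


# Bytes from the hexdump in the report, including the offending 0xac.
raw = b"\n\x01\x00\x00\x00\x00\x00\x00\xacWed, 10"
body = encode_read_response(raw, 220443)
assert decode_read_response(body) == raw  # lossless round trip
```

The cost is that the payload grows by roughly a third and is no longer readable by eye, which is what the next idea tries to avoid.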

As a more client-friendly idea, perhaps we could augment 
{{files/read.json}} so that byte sequences that can't be interpreted as 
UTF-8-encoded Unicode are replaced by a {{?}} character.  [(This is kind of 
akin to Python's {{unicode(..., errors='replace')}}.)|https://docs.python.org/2/howto/unicode.html#the-unicode-type]
  That way, we'd get valid JSON out of {{files/read.json}} (a plus!) and have 
"reasonable" behavior for unrepresentable data.
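
Roughly what I have in mind, sketched in Python 3 rather than anything Mesos-specific. The function name and the {{data}} field are assumptions for illustration. Note that Python's {{errors="replace"}} substitutes U+FFFD (the Unicode replacement character) rather than a literal {{?}}.

```python
import json


def lossy_read_response(raw: bytes, offset: int) -> str:
    """Decode as UTF-8, replacing undecodable bytes, then dump as JSON."""
    text = raw.decode("utf-8", errors="replace")  # bad bytes become U+FFFD
    return json.dumps({"data": text, "offset": offset})


# Bytes from the hexdump in the report, including the offending 0xac.
raw = b"295707\n\x01\x00\xacWed, 10"
body = lossy_read_response(raw, 220443)

# The output is plain ASCII JSON that any compliant parser accepts.
parsed = json.loads(body)
assert parsed["data"] == "295707\n\x01\x00\ufffdWed, 10"
```

This is lossy by design: a client cannot tell which bytes were replaced, so anyone who needs the exact bytes would still want the raw endpoint.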

As a wild idea, perhaps if we still wanted endpoints that could represent 
arbitrary, but _structured_ data, we might consider adding an additional 
serialization format, such as [MessagePack|http://msgpack.org/].

> Mesos Agent Json API can dump binary data from log files out as invalid JSON
> ----------------------------------------------------------------------------
>
>                 Key: MESOS-4642
>                 URL: https://issues.apache.org/jira/browse/MESOS-4642
>             Project: Mesos
>          Issue Type: Bug
>          Components: json api, slave
>    Affects Versions: 0.27.0
>            Reporter: Steven Schlansker
>            Priority: Critical
>
> One of our tasks accidentally started logging binary data to stderr.  This 
> was not intentional and generally should not happen -- however, it causes 
> severe problems with the Mesos Agent "files/read.json" API, since it gladly 
> dumps this binary data out as invalid JSON.
> {code}
> # hexdump -C /path/to/task/stderr | tail
> 0003d1f0  6f 6e 6e 65 63 74 69 6f  6e 0a 4e 45 54 3a 20 31  |onnection.NET: 1|
> 0003d200  20 6f 6e 72 65 61 64 20  45 4e 4f 45 4e 54 20 32  | onread ENOENT 2|
> 0003d210  39 35 34 35 36 20 32 35  31 20 32 39 35 37 30 37  |95456 251 295707|
> 0003d220  0a 01 00 00 00 00 00 00  ac 57 65 64 2c 20 31 30  |.........Wed, 10|
> 0003d230  20 55 6e 72 65 63 6f 67  6e 69 7a 65 64 20 69 6e  | Unrecognized in|
> 0003d240  70 75 74 20 68 65 61 64  65 72 0a                 |put header.|
> {code}
> {code}
> # curl 'http://agent-host:5051/files/read.json?path=/path/to/task/stderr&offset=220443&length=90000&grep=' | hexdump -C
> 00007970  6e 65 63 74 69 6f 6e 5c  6e 4e 45 54 3a 20 31 20  |nection\nNET: 1 |
> 00007980  6f 6e 72 65 61 64 20 45  4e 4f 45 4e 54 20 32 39  |onread ENOENT 29|
> 00007990  35 34 35 36 20 32 35 31  20 32 39 35 37 30 37 5c  |5456 251 295707\|
> 000079a0  6e 5c 75 30 30 30 31 5c  75 30 30 30 30 5c 75 30  |n\u0001\u0000\u0|
> 000079b0  30 30 30 5c 75 30 30 30  30 5c 75 30 30 30 30 5c  |000\u0000\u0000\|
> 000079c0  75 30 30 30 30 5c 75 30  30 30 30 ac 57 65 64 2c  |u0000\u0000.Wed,|
> 000079d0  20 31 30 20 55 6e 72 65  63 6f 67 6e 69 7a 65 64  | 10 Unrecognized|
> 000079e0  20 69 6e 70 75 74 20 68  65 61 64 65 72 5c 6e 22  | input header\n"|
> 000079f0  2c 22 6f 66 66 73 65 74  22 3a 32 32 30 34 34 33  |,"offset":220443|
> 00007a00  7d                                                |}|
> {code}
> This causes downstream sadness:
> {code}
> ERROR [2016-02-10 18:55:12,303] io.dropwizard.jersey.errors.LoggingExceptionMapper: Error handling a request: 0ee749630f8b26f1
> ! com.fasterxml.jackson.core.JsonParseException: Invalid UTF-8 start byte 0xac
> !  at [Source: org.jboss.netty.buffer.ChannelBufferInputStream@6d69ee8; line: 1, column: 31181]
> ! at com.fasterxml.jackson.core.JsonParser._constructError(JsonParser.java:1487) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.core.base.ParserMinimalBase._reportError(ParserMinimalBase.java:518) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidInitial(UTF8StreamJsonParser.java:3339) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._reportInvalidChar(UTF8StreamJsonParser.java:3333) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString2(UTF8StreamJsonParser.java:2360) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.core.json.UTF8StreamJsonParser._finishString(UTF8StreamJsonParser.java:2287) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText(UTF8StreamJsonParser.java:286) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:29) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.databind.deser.std.StringDeserializer.deserialize(StringDeserializer.java:12) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.databind.deser.SettableBeanProperty.deserialize(SettableBeanProperty.java:523) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:381) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1073) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.module.afterburner.deser.SuperSonicBeanDeserializer.deserializeFromObject(SuperSonicBeanDeserializer.java:196) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.databind.deser.BeanDeserializer.deserialize(BeanDeserializer.java:142) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.module.afterburner.deser.SuperSonicBeanDeserializer.deserialize(SuperSonicBeanDeserializer.java:117) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3562) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2648) ~[singularity-0.4.9.jar:0.4.9]
> ! at com.hubspot.singularity.data.SandboxManager.read(SandboxManager.java:97) ~[singularity-0.4.9.jar:0.4.9]
> {code}
> This is extremely similar to https://issues.apache.org/jira/browse/MESOS-3771
> Since this is now the second major issue caused by this, is there any 
> possibility of using a JSON processing library that actually guarantees 
> spec-compliant output?  I know we can fix the point problem again here, but 
> it is frustrating that this keeps happening, and I'm sure it will happen 
> again in the future.
> Failing that, maybe we should audit all JSON objects produced to ensure they 
> cannot contain binary data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
