Yesterday I added implementations of JWS, JSON-B and JSON-C to my existing JSON encoding suite (PROTOGEN).
For the sake of fair comparison, I have not attempted any further optimization or made any changes to the encoding described here:

  http://tools.ietf.org/html/draft-hallambaker-jsonbcd-02

I could easily shave a few more bytes off the total with additional techniques, but I considered and rejected them as not being worth the space/complexity tradeoff.

Implementation took approximately 2 hours for the encoding scheme and 4 hours for JWS, much of the latter spent writing test code to make sure that the test vectors in draft-41 work (they do). While implementing additional encodings is extra work, the binary encodings are actually much easier to implement than the text encoding: there is no need to perform Base64 encoding, estimating space requirements is far simpler, and so on. If I were implementing a single encoder for a constrained device, I would much prefer to implement JSON-C than JSON.

On the decoder side, JSON-B is a strict superset of JSON, which means that a decoder must support both encodings unless a 'binary only' subset is defined. But it also means that a JSON-C decoder can decode JSON-B or traditional JSON: one decoder fits all.

For a test case, I used:

  http://tools.ietf.org/html/draft-ietf-jose-json-web-signature-41

Encoding the HMAC signature example results in the following encoding sizes:

  In JSON:   244 bytes
  In JSON-B: 165 bytes
  In JSON-C: 129 bytes

In each case the payload data is the string given in the example; even though it is itself JSON, it is left as-is to provide a fair basis for comparison. The comparison slightly overstates the advantage of JSON-B, as my JSON encoding uses indentation. The main saving comes from avoiding the need to Base64-armor the binary blobs.

Looking at the internals:

  In JSON:   244 bytes  (Protected 35 / Payload 70 / Signature 32)
  In JSON-B: 165 bytes  (Protected 24 / Payload 70 / Signature 32)
  In JSON-C: 129 bytes  (Protected 13 / Payload 70 / Signature 32)

Since the payload and signature are the same in every case, 102 bytes of the message are irreducible. The move from text to binary blobs saves 79 bytes, 30 of which come from eliminating Base64. Tag and string compression saves another 34 bytes, but only in the case where we can pre-exchange the tag dictionary. JSON-C also supports an on-the-fly string compression technique, but that only provides savings for longer messages with large areas of repeated text.

Conclusions:

1) We do not need a new working group to specify a binary encoding of JOSE. The fact that CBOR requires hand tweaking to apply it to a JSON data structure is the reason that I and others objected to the approach from the start. Note that I wrote JSON-BCD in response to the statement made by the CBOR cabal that they were a private group that was not required to be open, to consider alternative approaches, or to respect IETF consensus. In particular, the statement was made repeatedly that 'CBOR is not intended to be a binary encoding of JSON'.

2) A binary encoding of JSON should not require additional IETF time, effort or review. The implementation of JSON-B is entirely mechanical and required no additional input whatsoever. The only additional input required for use of JSON-C is the compilation of a tag dictionary. The one I used has 88 defined code points, which I compiled by looking through the IANA considerations section of the draft; this could easily be produced automatically by a tool. Since JSON-B uses byte-aligned tags and there is only a need for 88 of them, the choice of tag values has absolutely no impact on the compression efficiency. (A minimal sketch of the tag-dictionary mechanism appears at the end of this message.)
3) A binary encoding should not require ongoing maintenance. What worries me most about the CBOR fiasco is that we risk a MIB-type situation in which every new IETF JSON protocol requires a parallel 'CBOR' encoding and this becomes an ongoing maintenance requirement. JSON-B is designed as a strict superset of JSON so that upwards compatibility is guaranteed. This allows use of a new version of the specification, or support for a privately defined tag that is not in the dictionary, without waiting for a new dictionary to be issued or a new 'binary' version of the specification to be defined.

4) The IETF needs a binary encoding of JSON that encodes precisely the JSON data model with (almost) nothing added or taken away. A binary encoding of JSON does need to add a binary data type, which is an extension of the JSON model. A case could also be made for a DateTime intrinsic type, which would be rendered as an RFC 3339 format string in JSON, but I have resisted this so far. One of the main reasons for rejecting many of the existing binary JSON formats is that their designers have found the temptation to add code points for their favorite random data types irresistible.

5) While it is possible to improve on JSON-B compression efficiency, the savings are unlikely to be very interesting. The JWS example is instructive because the only way to improve significantly on JSON-C would be to compress the payload. Out of the 129 bytes used in the JSON-C version, 104 are data elements and 25 are framing for two nested structures with a total of six structure elements. That is an average overhead of about 4 bytes per element, including the tag and length data. (A quick arithmetic check of these figures is sketched below.)
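For anyone who has not read the draft, here is a minimal sketch of how a pre-exchanged tag dictionary produces the JSON-C saving described under point 2. The marker bytes, framing and dictionary entries below are invented purely for illustration; the actual code points and wire format are the ones defined in draft-hallambaker-jsonbcd.

# Illustration only: the marker bytes and code point assignments below are
# NOT the ones defined in draft-hallambaker-jsonbcd; they merely show the
# mechanism of tag compression against a pre-exchanged dictionary.

TAG_STRING = 0xA0   # hypothetical marker: literal tag, length + UTF-8 bytes follow
TAG_CODE   = 0xA1   # hypothetical marker: one-byte dictionary code point follows

# A pre-exchanged dictionary, e.g. compiled from the IANA considerations
# section of the JWS draft (the one described above has 88 code points).
TAG_DICTIONARY = {"protected": 1, "payload": 2, "signature": 3}

def encode_tag(tag: str) -> bytes:
    """Emit a dictionary code point if the tag is known, else the literal string."""
    code = TAG_DICTIONARY.get(tag)
    if code is not None:
        return bytes([TAG_CODE, code])              # 2 bytes regardless of tag length
    data = tag.encode("utf-8")
    return bytes([TAG_STRING, len(data)]) + data    # 2 bytes of framing + the string

if __name__ == "__main__":
    print(len(encode_tag("signature")))            # 2  -- known tag, compressed
    print(len(encode_tag("example-private-tag")))  # 21 -- unknown tag, sent literally

The point is simply that a known tag costs a constant couple of bytes regardless of its length, while anything outside the dictionary falls back to the ordinary string form, which is why a privately defined tag (point 3) needs no new dictionary to be issued.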
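And a quick back-of-the-envelope check of the byte counts quoted above, using the figures as reported (a sketch only; the totals are measured, not derived here):

import math

# Figures reported above for the JWS HMAC example (bytes).
PAYLOAD, SIGNATURE = 70, 32
JSON_TOTAL, JSONB_TOTAL, JSONC_TOTAL = 244, 165, 129
JSONC_DATA, JSONC_FRAMING, JSONC_ELEMENTS = 104, 25, 6

def base64_len(n: int) -> int:
    """Characters needed to Base64-encode n bytes (with padding)."""
    return 4 * math.ceil(n / 3)

# Payload and signature are carried verbatim in every encoding: the irreducible part.
print(PAYLOAD + SIGNATURE)                    # 102

# Base64 armor expands binary data by roughly a third; the 32-byte signature
# alone costs an extra dozen characters when it has to be text-armored.
print(base64_len(SIGNATURE) - SIGNATURE)      # 12

# Moving from text JSON to the JSON-B binary encoding:
print(JSON_TOTAL - JSONB_TOTAL)               # 79 bytes saved

# The JSON-C total splits into data and framing as described in point 5,
# giving the per-element overhead quoted there.
print(JSONC_DATA + JSONC_FRAMING == JSONC_TOTAL)   # True
print(JSONC_FRAMING / JSONC_ELEMENTS)              # ~4.2 bytes per element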
