The Problem:

I've been looking at the encoding exception which is thrown when you click on the "Services" menu item in our current implementation. By default we use JSON as our RPC mechanism, and the exception is thrown when the JSON encoder hits a certificate. Recall that we store certificates in LDAP as binary data, and that in our implementation we distinguish binary data from text by Python object type: text is *always* a unicode object and binary data is *always* a str object. However, in Python 2.x str objects are assumed to be text and are subject to encoding/decoding in many parts of the Python world.

Unlike XML-RPC, JSON does *not* have a binary type. In JSON there are *only* unicode strings. So what is happening is that when the JSON encoder sees our certificate data in a str object it says "str objects are text, so I must decode this str object as UTF-8 to get unicode". There's the problem! It's completely nonsensical to try to interpret binary data as UTF-8 text.
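The failure is easy to reproduce. A minimal sketch (Python 3 syntax, with fabricated bytes standing in for real DER certificate data): in Python 3 the same mismatch surfaces as a TypeError, where Python 2's implicit UTF-8 decode raised a UnicodeDecodeError.

```python
import json

# Fabricated stand-in for DER-encoded certificate data;
# the 0x80/0xff bytes guarantee it is not valid UTF-8 text.
der_bytes = b"\x30\x82\x01\x0a\x80\xff"

try:
    json.dumps({"certificate": der_bytes})
    raised = False
except TypeError:
    # Python 3's json refuses bytes outright; Python 2's json
    # instead blew up trying to decode the str as UTF-8.
    raised = True
```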

The right way to handle this is to encode the binary data to base64 ASCII text and then hand it to JSON. FWIW our XML-RPC handler does this already because XML-RPC knows about binary data and elects to encode/decode it to base64 as it's marshaled and unmarshaled. But JSON can't do this during marshaling and unmarshaling because the JSON protocol has no concept of binary data.
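Done by hand, the round trip looks like this (Python 3 syntax; the field name is just an illustration):

```python
import base64
import json

der_bytes = b"\x30\x82\x01\x0a\x80\xff"  # stand-in for binary DER data

# Sender: turn the binary into ASCII text before JSON ever sees it.
wire = json.dumps({"certificate": base64.b64encode(der_bytes).decode("ascii")})

# Receiver: must already *know* this field is base64-wrapped binary.
recovered = base64.b64decode(json.loads(wire)["certificate"])
```

Note the receiver side only works because we knew, out of band, which field to decode.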

The Python JSON encoder class does give us the option to hook into the encoder, check whether the object is a str object, and base64 encode it. But that doesn't help us at the opposite end. How would we know when unmarshaling that a given string is supposed to be base64 decoded back into binary data? We could prepend a special string and hope that string never occurs in normal text (yuck). Keeping a list of what needs base64 decoding is not an option within JSON because at the time of decoding we have no information available about the context of the JSON objects.
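The hook in question is the `default` method of `json.JSONEncoder`. A sketch (Python 3, so the hook fires on bytes rather than str) shows why it only solves half the problem:

```python
import base64
import json

class BinaryEncoder(json.JSONEncoder):
    def default(self, obj):
        # Called for any object json can't serialize natively.
        if isinstance(obj, bytes):
            return base64.b64encode(obj).decode("ascii")
        return super().default(obj)

wire = json.dumps({"cert": b"\x00\x01\x02"}, cls=BinaryEncoder)

# Decoding side: nothing marks the string as binary, so it stays text.
decoded = json.loads(wire)["cert"]
```

The encode side works, but `decoded` comes back as a plain string; the decoder has no way to know it should be turned back into binary.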

That means if we want to use JSON we really should push the base64 encode/decode into the parts of the code which have a priori knowledge about the objects they're pushing through the command interface. This would mean any command which passes a certificate should base64 encode it prior to sending it and base64 decode it after it comes back in a command result. Actually it would be preferable to use PEM encoding, and by the way, the whole reason PEM encoding for certificates was developed was exactly this scenario: transporting a certificate through a text-based interchange mechanism!
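PEM is itself just base64 with a header/footer armor, so the conversion is cheap. A minimal sketch (the helper names are mine, and real code should lean on a crypto library rather than hand-rolling this):

```python
import base64
import textwrap

def der_to_pem(der_bytes):
    # PEM armor: base64 body wrapped at 64 columns between BEGIN/END lines.
    b64 = base64.b64encode(der_bytes).decode("ascii")
    body = "\n".join(textwrap.wrap(b64, 64))
    return ("-----BEGIN CERTIFICATE-----\n" + body +
            "\n-----END CERTIFICATE-----\n")

def pem_to_der(pem_text):
    # Strip the armor lines and decode the base64 body back to binary.
    body = "".join(line for line in pem_text.splitlines()
                   if not line.startswith("-----"))
    return base64.b64decode(body)
```

Because the result is pure ASCII, it passes through JSON (or any other text channel) untouched.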

Possible Solutions:

As I see it we have these options in front of us for how to deal with this problem:

* Drop support for JSON, only use XML-RPC

* Once we read a certificate from LDAP immediately convert it to PEM format. Adopt the convention that anytime we exchange certificates it will be in PEM format. Only convert from PEM format when the target demands binary (e.g. storing it in LDAP, passing it to a library expecting DER encoded data, etc.).

* Come up with some hacky protocol on top of JSON which signals "this string is really binary" and check for it on every JSON encode/decode and cross our fingers no one tries to send a legitimate string which would trigger the encode/decode.
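For completeness, the sentinel hack in the last bullet would look roughly like this (the marker string is invented, and its collision risk is exactly the objection):

```python
import base64

# Invented marker: any legitimate text that happens to start with
# this prefix would be silently corrupted on decode.
SENTINEL = "__binary_b64__:"

def wrap_binary(value):
    if isinstance(value, bytes):
        return SENTINEL + base64.b64encode(value).decode("ascii")
    return value

def unwrap_binary(value):
    if isinstance(value, str) and value.startswith(SENTINEL):
        return base64.b64decode(value[len(SENTINEL):])
    return value
```

Every value crossing the JSON boundary would have to pass through these wrappers, in both directions, everywhere.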

Question: Are certificates the one and only example of binary data we exchange?


My personal recommendation is that we adopt the convention that certificates are always PEM encoded. We've already run into many problems trying to deduce what format a certificate is in (e.g. binary, base64, PEM). I think it would be good if we just put a stake in the ground, said "certificates are always PEM encoded", and were done with all these problems we keep having with the data type of certificates.

As an aside, I'm also skeptical of the robustness of allowing binary data at all in our implementation. Trying to support binary data has been nothing but a headache and a source of many, many bugs. Do we really need it?

John Dennis <>

Freeipa-devel mailing list
