You can use the sun.misc code, but it's not guaranteed to be present in all JVMs (don't know if it's in IBM ones, for instance). The base64 code's not too complex in any case, so I'll plug in my own versions for the next release.
And yes, base64 does add some size and overhead. The alternative of embedding binary is likely to cause you major problems sooner or later, though. I said how you can disable the checking for output, but input is not nearly as easy. You may well end up needing to go into the code for any parser that's going to read your data and modifying that so it doesn't error-out when it comes to invalid characters. I haven't checked the default xpp3 parser supplied with JiBX to see how well it checks input (all parsers are supposed to check the character codes, according to the XML recommendation, though many don't do this completely), but almost any parser will at least check for a 0 byte and die if they see one. I honestly don't think
There may be alternatives in the future. There's something called MTOM (http://www.w3.org/TR/2004/WD-soap12-mtom-20040209/) that's designed for SOAP, but if it really does get approved it is probably going to have a pervasive effect on all XML processing. It basically allows blocks of data to be logically embedded in the XML message, yet serialized separately so that they can be sent as pure binary. I think it's a truly ugly approach from many standpoints, but it would handle your type of situation. :-)
In the meantime, the overhead of base64 is pretty reasonable (it basically expands the data size by 33% for your binary components, and at least the JiBX implementation will have good performance).
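The 33% figure follows from base64 mapping every 3 input bytes to 4 output characters. Here is a minimal sketch of that arithmetic, assuming Java 8 or later, where java.util.Base64 is part of the standard library (it postdates this thread and is not the JiBX implementation). The class name Base64Overhead is made up for illustration.

```java
import java.util.Base64;

final class Base64Overhead {
    // Length of padded base64 output for n input bytes: 4 chars per 3-byte group.
    static int encodedLength(int n) {
        return 4 * ((n + 2) / 3);
    }

    public static void main(String[] args) {
        byte[] data = new byte[3000];           // arbitrary binary payload
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;                 // includes 0x00 and 0x03
        }
        String text = Base64.getEncoder().encodeToString(data);
        // prints "3000 bytes -> 4000 chars"
        System.out.println(data.length + " bytes -> " + text.length() + " chars");
    }
}
```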
- Dennis
Gareth wrote:
Interesting. You can get Base64 encoding and decoding of byte[] values by using Sun's classes (sun.misc.BASE64Decoder and sun.misc.BASE64Encoder). It would save a number of lines to have a Marshaller/Unmarshaller for that.
That being said, Base64 bloats the size of the encoded data. For small data sets (e.g. keys and hashes) this isn't a problem. On large data sets it adds up quickly, and for network transmission you are burning an important resource. I would like to be able to send raw binary in an element to save space. Obviously doing so would mean taking full responsibility for whatever parser attempts to decode the documents, i.e. warning other potential users that these are not XML documents.
The encoding I came up with looks something like this:
<element>length:RAW_BYTES</element>
'length' should be a decimal integer only and could be decoded easily using Integer.parseInt(). A ':' denotes the end of the length field. This is how BitTorrent b-encoding frames its strings (a decimal length, a ':', then the raw bytes). You would have an element that looks like this:
<element>40:This -> is a 40 character binary message</element>
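To make the framing concrete, here is a rough sketch of encoding and decoding the length:RAW_BYTES content on its own, outside of any XML or JiBX handling. The class and method names are hypothetical. It only frames the bytes and does nothing to make them legal XML content.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class LengthPrefixed {
    // Frame a payload as "<decimal length>:<raw bytes>".
    static byte[] encode(byte[] payload) {
        byte[] prefix = (payload.length + ":").getBytes(StandardCharsets.US_ASCII);
        ByteArrayOutputStream out = new ByteArrayOutputStream(prefix.length + payload.length);
        out.write(prefix, 0, prefix.length);
        out.write(payload, 0, payload.length);
        return out.toByteArray();
    }

    // Recover the payload; rejects a missing ':' or a length that disagrees with the data.
    static byte[] decode(byte[] framed) {
        int colon = 0;
        while (colon < framed.length && framed[colon] != ':') {
            colon++;
        }
        if (colon == framed.length) {
            throw new IllegalArgumentException("missing ':' after length");
        }
        // NumberFormatException (an IllegalArgumentException) if the field is not an integer.
        int length = Integer.parseInt(new String(framed, 0, colon, StandardCharsets.US_ASCII));
        int start = colon + 1;
        if (length < 0 || length != framed.length - start) {
            throw new IllegalArgumentException("length field does not match payload");
        }
        return Arrays.copyOfRange(framed, start, start + length);
    }
}
```

The decoder stops at the first ':' it sees, so a ':' inside the payload itself is harmless.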
So the question is, can I coerce JiBX into round tripping that document without modifications to the Parser and/or JiBX itself? I assume that an ICharacterEscaper and a custom Marshaller/Unmarshaller are involved but what other subtle traps are there?
This isn't something you want in the distribution but perhaps as a separate download with lots of warnings. (DANGER: Here be Dragons!)
--
Gareth Farrington
Dennis Sosnoski wrote:
And another:
The character restriction is actually part of the XML standard, which is why the code is checking this. I don't want people to accidentally produce documents that will be rejected by a parser.
If you want to override this for your documents and knowingly produce something that will be rejected by most parsers, you can do so by creating your own ICharacterEscaper implementation and using the MarshallingContext.setOutput() call that takes an ICharacterEscaper. For this purpose you could copy the code in UTF8Escaper and just take out the checks for chr < 0x20 and not a tab, line feed, or carriage return. This is dangerous, though; even if the supplied xpp3 parser doesn't check this on input, most parsers will (including parsers such as StAX that may become the default for JiBX in the future).
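For reference, the rule being enforced is the Char production from the XML 1.0 recommendation. The sketch below is a standalone version of that test, not the actual UTF8Escaper code, and the class and method names are made up for illustration.

```java
final class XmlChars {
    // XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isLegal(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Index of the first illegal character in text, or -1 if every character is legal.
    static int firstIllegal(String text) {
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isLegal(cp)) {
                return i;
            }
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

Running firstIllegal() over a value before marshalling is one way to find out whether it would trip the check.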
If you know this data contains binary values, you really should use base64 encoding on it. I've been meaning to add a standard serializer/deserializer for base64 to the Utilities class and make this the default for byte[]. Would that suit your needs?
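As a rough idea of what such a serializer/deserializer pair could look like, here is a sketch assuming Java 8 or later and its java.util.Base64 class. The class and method names are hypothetical and are not the JiBX Utilities API; wiring them into a binding is not shown.

```java
import java.util.Base64;

final class Base64Conversions {
    // byte[] -> base64 text that is safe to place in element content; null passes through.
    static String serializeBase64(byte[] bytes) {
        return bytes == null ? null : Base64.getEncoder().encodeToString(bytes);
    }

    // base64 text -> byte[]; surrounding whitespace is trimmed before decoding.
    static byte[] deserializeBase64(String text) {
        return text == null ? null : Base64.getDecoder().decode(text.trim());
    }
}
```

With this, the bytes 0x00 0x03 0xFF serialize to the text AAP/, which contains nothing a parser would reject.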
- Dennis
Stefano Fornari wrote:
Hi Dennis,
the 0x03 is probably the offending character...
By the way, IMHO the point is not what is formally in the standard, but what makes sense. Is there any advantage in throwing an exception if I try to write a binary code? If yes, it is ok. If it is just because 0x3 is not listed in the specs, I would have some things to say.
I really love jibx, but being too strictly rigorous diminishes its applications. Unfortunately, the external world is not so conformant and I think jibx should be flexible in how strict it has to be. It is the same problem with namespaces. There is no way to say: "listen, forget about namespaces, just parse the tags!".
I know I have no right to demand anything; take this feedback as user comments. If you will consider it for a fix, I would really appreciate it. If you won't fix it, I will do it just for my project.
Thanks for the good work,
Stefano
Dennis Sosnoski wrote:
What are the actual binary values of those characters in your string? The error message is saying that one of them is a 0x03, which is completely illegal in XML 1.0.
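One quick way to answer that is to dump the string's code units as hex before marshalling. This is only a sketch, with a made-up helper name:

```java
final class CharDump {
    // Render each UTF-16 code unit of the string as 0xNN, separated by spaces.
    static String hex(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("0x%02X", (int) text.charAt(i)));
        }
        return sb.toString();
    }
}
```

For example, CharDump.hex("N\u0003") returns "0x4E 0x03".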
- Dennis
Stefano Fornari wrote:
Hi All,
We are using JiBX in a production system; unfortunately, we are facing some problems that I hope somebody can help with.
This is the latest:
org.jibx.runtime.JiBXException: Error writing marshalled document
Root cause: java.io.IOException: Illegal character code 0x3 in content text
at org.jibx.runtime.impl.MarshallingContext.element(MarshallingContext.java:642)
Caused by:
java.io.IOException: Illegal character code 0x3 in content text
at org.jibx.runtime.impl.UTF8StreamWriter.writeTextContent(UTF8StreamWriter.java:254)
at org.jibx.runtime.impl.MarshallingContext.element(MarshallingContext.java:638)
This happens marshalling the following XML snippet:
<Cred> <Meta><Type>syncml:auth-md5</Type></Meta> <Data>N?ýU#>G?a¦???ª??</Data> </Cred>
With the following mapping:
<mapping name="Cred" class="sync4j.framework.core.Cred">
  <structure field="authentication" class="sync4j.framework.core.Authentication">
    <structure name="Meta" class="sync4j.framework.core.Meta">
      <value name="Format" field="format" usage="optional"/>
      <value name="Type" field="type"/>
    </structure>
    <structure name="Data" usage="optional">
      <value field="data" style="text" class="sync4j.framework.core.Data" usage="optional"/>
    </structure>
  </structure>
</mapping>
It looks like a bug in the writer.
Regards, Stefano
-------------------------------------------------------
This SF.Net email is sponsored by Sleepycat Software
Learn developer strategies Cisco, Motorola, Ericsson & Lucent use to deliver higher performing products faster, at low TCO.
http://www.sleepycat.com/telcomwpreg.php?From=osdnemail3
_______________________________________________
jibx-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/jibx-users
-- Dennis M. Sosnoski Enterprise Java, XML, and Web Services Training and Consulting http://www.sosnoski.com Redmond, WA 425.885.7197