[
https://issues.apache.org/jira/browse/XERCESC-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Scott Cantor resolved XERCESC-2054.
-----------------------------------
Resolution: Duplicate
This was already reported, and there are some additional comments over in the
original bug, so closing this.
> Grammar serialization not portable (integer size / alignment issue)
> -------------------------------------------------------------------
>
> Key: XERCESC-2054
> URL: https://issues.apache.org/jira/browse/XERCESC-2054
> Project: Xerces-C++
> Issue Type: Bug
> Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.1.3,
> 3.1.4
> Environment: Linux CentOS-7 (64bit), Windows 7 (64bit)
> Reporter: Oliver Moeller
>
> Apologies if this is a known issue, but I have not found it by conventional
> means (i.e., google an searching through the bug data base here).
> I found that the serialisation/deserialisation (here: of grammars) is not as
> portable as it (IMHO) should be.
> The problem happens in XSerializeEngine::readString() when
> the length of the string is taken from the associated BinInputStream as
> "unsigned long":
> /***
> * Check if any data written
> ***/
> unsigned long tmp;
> *this>>tmp;
> On a Windows7 x64, MSVS2012, this will take 4 byte off the head of the stream,
> but on a CentOS 7 x64 (g++ 4.8.3), this will take 8 byte.
> As a consequence, a BinInputStream carefully encoded on Windows (e.g. putting
> it into a char array with
> examples/cxx/tree/embedded/grammar-input-stream.cxx
> which is a common xsd example)
> will fail when "reading" it on the Linux box, because everything from the
> first
> string on is garbage.
> Moreover, this will (probably) give no meaningful error message, just a
> "XSerialisationException" thrown, cause at some point it will (probably)
> misinterpret wchar data as length information and try to read the next string
> that is millions of bytes long (according to the misunderstood
> BinInputStream).
> The BinInputStream will then run out of bytes.
> A similar issue is present concerning the *alignment* of the data according
> to data type that happens for all >> operations: this is (necessarily) very
> platform dependent.
> It would be a big improvement, if xerces would encode the (de)serialization
> in a platform/compiler independent manner. The purpose after all *IS* to be
> portable, right?
> E.g., the serialisation engine could always use integers of known byte width
> (e.g.: #include <inttypes.h> -> use uint32_t) instead of "unsigned long".
> ALso, the alignment issue should be addressed; it is hard to predict
> what restrictions apply for the used compiler (or even processor) here, some
> are not capable to read an integer from a memory address that is not 4-byte
> aligned.
> E.g., the data could be copied (to a properly aligned item initialized by 0s)
> before doing the cast to an integer type.
> In any case, it should always be platform-independent how many bytes are next
> to be read from the BinaryInputStream.
> (Of course, the write operations have to follow the same business logic.)
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]