Hi All,
I was trying to add this comment to the JIRA issue
[http://issues.apache.org/jira/browse/XERCESC-681], but it tells me
that I do not have permission to comment on that item. Please accept my
comments here instead, as I have just subscribed to the JIRA system.
This remark relates to Xerces-C++ 2.4.0, which I am currently testing
some code with. I won't have time to test this with later source
versions for quite a while.
Also, I did not find much in the JIRA system or the mail archives about
writing a BOM; I found some comments, but nothing definitive on whether
it works with Xerces yet. This bug entry also appears inconclusive
(unverified), so this may well be in the same state in later versions.
Regards
Dara
---BEGIN COMMENT -- JIRA issue http://issues.apache.org/jira/browse/XERCESC-681 ---
Hi all,
I wasn't sure of the status of this, so I have just run the DOMPrint
example in Xerces-C++ 2.4.0 (Linux) and have successfully output both
the big-endian and little-endian Byte Order Marks for UTF-16 output.
I was looking for an option to write the BOM and didn't find anything
helpful in the docs, mailing lists, etc., until I came across this issue.
I am now using the code from the DOMPrint example to set the option
on my writer when the BOM is required.
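For reference, the two marks are the bytes FE FF (big endian) and FF FE (little endian). The standalone sketch below does not use Xerces at all; it only maps an explicitly big- or little-endian UTF-16 encoding name to those bytes. The function name utf16BomFor is hypothetical, and plain "UTF-16" (no stated byte order) is deliberately left unhandled.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not a Xerces API): returns the Byte Order Mark
// for an explicitly big- or little-endian UTF-16 encoding name, compared
// case-insensitively. Any other name, including plain "UTF-16", yields
// an empty vector in this sketch.
std::vector<unsigned char> utf16BomFor(const std::string& encoding)
{
    std::string upper(encoding);
    std::transform(upper.begin(), upper.end(), upper.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::toupper(c));
                   });
    if (upper == "UTF-16BE") return {0xFE, 0xFF};  // big endian
    if (upper == "UTF-16LE") return {0xFF, 0xFE};  // little endian
    return {};
}
```

For example, utf16BomFor("utf-16le") gives the two bytes FF FE, and utf16BomFor("UTF-8") gives nothing.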
*** NOTE ***
One potential issue I noticed in my trials:
Generally, if the writer does not have an encoding set, but the DOM to
be written does (encoding or ActualEncoding is set), then the DOM
encoding value is used.
However, when we activate the writing of a BOM, the write fails with a
segfault (with iconv at least) for the following reason
(I called writeNode() on the writer):
+ In this snippet, "fEncoding" is null. I assumed this was because no
writer encoding was set, and my tests appear to confirm this.
--- 8< ---
void DOMWriterImpl::processBOM()
{
    // if the feature is not set, don't output bom
    if (!getFeature(BYTE_ORDER_MARK_ID))
        return;
    if ((XMLString::compareIString(fEncoding,
                                   XMLUni::fgUTF16LEncodingString) == 0) ||
        (XMLString::compareIString(fEncoding,
                                   XMLUni::fgUTF16LEncodingString2) == 0) )
--- >8 ---
+ Thus when we get here, the first parameter is 0x0 (null) and the
second is the XMLCh string "UTF-16(LE)".
--- 8< ---
int XMLString::compareIString( const XMLCh* const str1
                             , const XMLCh* const str2)
{
    // Refer this one to the transcoding service
    return XMLPlatformUtils::fgTransService->compareIString(str1, str2);
}
--- >8 ---
+ Thus we fail here while trying to dereference the null cptr1 in the
"while" statement.
--- 8< ---
// ---------------------------------------------------------------------------
//  IconvTransService: The virtual transcoding service API
// ---------------------------------------------------------------------------
int IconvTransService::compareIString( const XMLCh* const comp1
                                     , const XMLCh* const comp2)
{
    const XMLCh* cptr1 = comp1;
    const XMLCh* cptr2 = comp2;
    while ( (*cptr1 != 0) && (*cptr2 != 0) )
    {
        wint_t wch1 = towupper(*cptr1);
        wint_t wch2 = towupper(*cptr2);
        if (wch1 != wch2)
            break;
        cptr1++;
        cptr2++;
    }
    return (int) ( towupper(*cptr1) - towupper(*cptr2) );
}
--- >8 ---
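To illustrate why a null first argument crashes this loop, here is a standalone sketch that mirrors it but treats a null pointer as an empty string. It is not Xerces code: XMLCh is modelled as char16_t purely for illustration, and compareIStringNullSafe is a hypothetical name. It only shows one way the comparison itself could tolerate null; it does not make processBOM pick the right encoding.

```cpp
#include <cwctype>

// Assumption for this sketch only: XMLCh modelled as a 16-bit code unit.
typedef char16_t XMLCh;

// Hypothetical null-tolerant variant of the iconv compareIString loop.
// A null pointer is replaced by an empty string, so the loop never
// dereferences 0x0 the way the original does when fEncoding is null.
int compareIStringNullSafe(const XMLCh* comp1, const XMLCh* comp2)
{
    static const XMLCh empty[] = { 0 };
    const XMLCh* cptr1 = comp1 ? comp1 : empty;
    const XMLCh* cptr2 = comp2 ? comp2 : empty;
    while ((*cptr1 != 0) && (*cptr2 != 0))
    {
        if (std::towupper(*cptr1) != std::towupper(*cptr2))
            break;
        cptr1++;
        cptr2++;
    }
    // Convert each side to int before subtracting, since wint_t may be
    // unsigned.
    const int c1 = static_cast<int>(std::towupper(*cptr1));
    const int c2 = static_cast<int>(std::towupper(*cptr2));
    return c1 - c2;
}
```

With this variant, comparing a null encoding against u"UTF-16LE" simply returns a non-zero (negative) value, so no BOM would be written rather than the process crashing.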
If I set a writer encoding, regardless of whether or not a DOM encoding
or actual encoding is set, then it looks like it's working fine.
I don't know if this is intended, but I would have assumed that if the
writer has no encoding set and generally takes its encoding from the
item to be written, it should also do so when comparing encodings for
the purpose of writing a BOM?
Regards
Dara
---END COMMENT -- JIRA issue http://issues.apache.org/jira/browse/XERCESC-681 ---
--
Regards,
Dara Mulvihill,
Rísarís Ltd,
http://www.risaris.com
++353 404 64009