Hi All,

I was trying to add this comment to JIRA issue [http://issues.apache.org/jira/browse/XERCESC-681] but it informs me that I do not have permission to comment on that item. Please accept my comments here instead as I have just subscribed to the JIRA system.

This remark is in relation to Xerces-C++ 2.4.0 which I am currently testing some code with. I won't have time to test this with later source versions for quite a while.

Also, I did not find much in the JIRA system or the mail archives about writing a BOM. I found some comments, but nothing definitive on whether it works with Xerces yet. This bug entry also appears non-definitive (unverified), so this may well be in the same state in later versions.

Regards

Dara




---BEGIN COMMENT -- JIRA issue = http://issues.apache.org/jira/browse/XERCESC-681 ---

Hi all,

I wasn't sure of the status of this so I have just run the DOMPrint example in xerces-c++ 2.4.0 (linux) and have successfully output both the big endian and little endian Byte Order Marks for UTF-16 output.
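
For reference, the UTF-16 Byte Order Mark is the character U+FEFF, serialised as the bytes FE FF in big endian and FF FE in little endian. Below is a minimal sketch of that mapping. The helper name utf16Bom and the plain encoding names are made up for illustration and are not part of Xerces-C++.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not part of Xerces-C++: returns the BOM bytes for a
// UTF-16 encoding name, or an empty vector for any other name.
std::vector<std::uint8_t> utf16Bom(const std::string& encoding)
{
    if (encoding == "UTF-16BE")
        return {0xFE, 0xFF};   // U+FEFF, most significant byte first
    if (encoding == "UTF-16LE")
        return {0xFF, 0xFE};   // U+FEFF, least significant byte first
    return {};
}
```

Checking the first two bytes of the output file against these values is a quick way to confirm which BOM was written.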

I was looking for an option to write the BOM and didn't find anything helpful in docs, mail lists, etc. until I came across this issue.

I am now using the code as per the DOMPrint example to set the option on my writer when the BOM is required.

*** NOTE ***

One potential issue I noticed in my trials :

Generally, if the writer does not have an encoding set, but the DOM to be written does (encoding or ActualEncoding is set), then the DOM encoding value is used.

However, when we activate the writing of a BOM, then the writing will fail with a segfault (for iconv at least) due to the following :


(I called writeNode() on the writer)

+In this snip, "fEncoding" is null. I assumed this was due to no writer encoding being set, and my tests appear to substantiate this.

--- 8< ---

void DOMWriterImpl::processBOM()
{
   // if the feature is not set, don't output bom
   if (!getFeature(BYTE_ORDER_MARK_ID))
       return;

   if ((XMLString::compareIString(fEncoding, XMLUni::fgUTF16LEncodingString) == 0) ||
       (XMLString::compareIString(fEncoding, XMLUni::fgUTF16LEncodingString2) == 0) )

--- >8 ---


+ thus when we get here, the first parm is 0x0 and the second is (XMLCh) "UTF-16(LE)"

--- 8< ---

int XMLString::compareIString(  const   XMLCh* const    str1
                               , const XMLCh* const    str2)
{
   // Refer this one to the transcoding service
   return XMLPlatformUtils::fgTransService->compareIString(str1, str2);
}

--- >8 ---



+ thus we fail here while trying to de-reference cptr1 in the "while" statement.

--- 8< ---

// ---------------------------------------------------------------------------
//  IconvTransService: The virtual transcoding service API
// ---------------------------------------------------------------------------
int IconvTransService::compareIString(  const   XMLCh* const    comp1
                                       , const XMLCh* const    comp2)
{
   const XMLCh* cptr1 = comp1;
   const XMLCh* cptr2 = comp2;

   while ( (*cptr1 != 0) && (*cptr2 != 0) )
   {
       wint_t wch1 = towupper(*cptr1);
       wint_t wch2 = towupper(*cptr2);
       if (wch1 != wch2)
           break;

       cptr1++;
       cptr2++;
   }
   return (int) ( towupper(*cptr1) - towupper(*cptr2) );
}

--- >8 ---
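
One way the dereference of a null pointer in the comparison quoted above could be avoided is to treat a null argument as an empty string. The following is only a sketch under that assumption, not a proposed patch. XMLChLike is a stand-in for XMLCh so that the code compiles without Xerces, and compareIStringNullSafe is a hypothetical name.

```cpp
#include <cwctype>

// Stand-in for Xerces' XMLCh so that this sketch compiles on its own.
typedef char16_t XMLChLike;

// Hypothetical null-tolerant variant of the comparison quoted above.
// A null pointer is treated as an empty string and is never dereferenced.
int compareIStringNullSafe(const XMLChLike* comp1, const XMLChLike* comp2)
{
    static const XMLChLike empty[] = { 0 };
    const XMLChLike* cptr1 = comp1 ? comp1 : empty;
    const XMLChLike* cptr2 = comp2 ? comp2 : empty;

    // Walk both strings while they match case-insensitively.
    while ((*cptr1 != 0) && (*cptr2 != 0))
    {
        if (std::towupper(*cptr1) != std::towupper(*cptr2))
            break;
        cptr1++;
        cptr2++;
    }
    // Zero means equal; otherwise the sign gives the ordering.
    return (int) std::towupper(*cptr1) - (int) std::towupper(*cptr2);
}
```

With this guard, a null first argument compares as "not equal" to "UTF-16LE" instead of crashing, although the BOM would then simply not be written.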

If I set a writer encoding, regardless of whether or not a DOM encoding or actual encoding is set, then it looks like it's working fine.


I don't know if this is desired, but I would have assumed that if the writer has no encoding set and generally takes its encoding from the item to be written, it should also do so when comparing encodings for the purposes of writing a BOM?
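
To illustrate the behaviour I would have expected, here is a rough sketch of the fallback, using plain std::string values in place of the Xerces types. The function name, the parameter names, the order of the two document values and the final UTF-8 default are all my assumptions, and empty strings stand for "not set".

```cpp
#include <string>

// Hypothetical sketch, not Xerces-C++ code: picks the encoding name that a
// BOM check could compare against. An empty string means "not set".
std::string resolveBomEncoding(const std::string& writerEncoding,
                               const std::string& docEncoding,
                               const std::string& docActualEncoding)
{
    if (!writerEncoding.empty())
        return writerEncoding;       // an explicit writer setting wins
    if (!docEncoding.empty())
        return docEncoding;          // otherwise the document's encoding
    if (!docActualEncoding.empty())
        return docActualEncoding;    // then the document's actual encoding
    return "UTF-8";                  // assumed default when nothing is set
}
```

The BOM comparison would then be made against the resolved name rather than against the writer's own, possibly unset, encoding.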

Regards

Dara


---END COMMENT -- JIRA issue = http://issues.apache.org/jira/browse/XERCESC-681 ---

--
Regards,

Dara Mulvihill,

Rísarís Ltd,

http://www.risaris.com

++353 404 64009




