[jira] Commented: (XERCESC-681) Allow Xerces to write the BOM to XML files

dara mulvihill (JIRA) Wed, 11 Jan 2006 07:16:49 -0800

    [ http://issues.apache.org/jira/browse/XERCESC-681?page=comments#action_12362454 ]


dara mulvihill commented on XERCESC-681:
----------------------------------------

Hi all,

I wasn't sure of the status of this so I have just run the DOMPrint example in 
xerces-c++ 2.4.0 (linux) and have successfully output both the big endian and 
little endian Byte Order Marks for UTF-16 output.
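
For reference, the byte sequences involved can be sketched as below. This is a 
standalone illustration and not Xerces code: the helper name bomFor and the 
plain encoding-name strings are my own assumptions, chosen only to show which 
bytes a BOM consists of for each encoding.

```cpp
#include <cctype>
#include <cstdint>
#include <string>
#include <vector>

// Upper-case an ASCII encoding name so the lookup is case-insensitive.
static std::string upperAscii(std::string s)
{
    for (char& c : s)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return s;
}

// Hypothetical helper (not a Xerces API): return the BOM bytes for an
// encoding name, or an empty vector if the name is not recognised.
std::vector<std::uint8_t> bomFor(const std::string& encoding)
{
    const std::string enc = upperAscii(encoding);
    if (enc == "UTF-16LE")
        return {0xFF, 0xFE};        // little endian: low byte first
    if (enc == "UTF-16BE")
        return {0xFE, 0xFF};        // big endian: high byte first
    if (enc == "UTF-8")
        return {0xEF, 0xBB, 0xBF};  // UTF-8 signature
    return {};
}
```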

I was looking for an option to write the BOM and didn't find anything helpful 
in docs, mail lists, etc. until I came across this issue.

I am now using the code from the DOMPrint example to set the option on my 
writer when the BOM is required.

*** NOTE ***

One potential issue I noticed in my trials:

Generally, if the writer does not have an encoding set, but the DOM to be 
written does (encoding or ActualEncoding is set), then the DOM encoding value 
is used.

However, when we activate the writing of a BOM, the write fails with a 
segfault (for iconv at least) due to the following:


(I called writeNode() on the writer)

+ In this snippet, "fEncoding" is null. I assumed this was because no writer 
encoding had been set, and my tests appear to substantiate this.

--- 8< ---

void DOMWriterImpl::processBOM()
{
   // if the feature is not set, don't output bom
   if (!getFeature(BYTE_ORDER_MARK_ID))
       return;
   if ((XMLString::compareIString(fEncoding, XMLUni::fgUTF16LEncodingString) == 0) ||
       (XMLString::compareIString(fEncoding, XMLUni::fgUTF16LEncodingString2) == 0))

--- >8 ---


+ Thus when we get here, the first parameter is 0x0 and the second is (XMLCh) 
"UTF-16(LE)".

--- 8< ---

int XMLString::compareIString(  const   XMLCh* const    str1
                               , const XMLCh* const    str2)
{
   // Refer this one to the transcoding service
   return XMLPlatformUtils::fgTransService->compareIString(str1, str2);
}

--- >8 ---



+ Thus we fail here while trying to dereference cptr1, which is null, in the 
"while" statement.

--- 8< ---

// ---------------------------------------------------------------------------
//  IconvTransService: The virtual transcoding service API
// ---------------------------------------------------------------------------
int IconvTransService::compareIString(  const   XMLCh* const    comp1
                                       , const XMLCh* const    comp2)
{
   const XMLCh* cptr1 = comp1;
   const XMLCh* cptr2 = comp2;
   while ( (*cptr1 != 0) && (*cptr2 != 0) )
   {
       wint_t wch1 = towupper(*cptr1);
       wint_t wch2 = towupper(*cptr2);
       if (wch1 != wch2)
           break;
       cptr1++;
       cptr2++;
   }
   return (int) ( towupper(*cptr1) - towupper(*cptr2) );
}

--- >8 ---
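
To illustrate the kind of guard that would avoid the crash, here is a 
standalone sketch of a null-safe comparison. It is not Xerces code: XMLCh is 
stood in by char16_t so that it compiles on its own, the name 
compareIStringSafe is hypothetical, and the choice to sort a null pointer 
before any non-null string is my own assumption.

```cpp
#include <cwctype>

// Stand-in for Xerces' XMLCh so the sketch compiles on its own.
typedef char16_t XMLCh;

// Hypothetical null-safe variant of the comparison shown above.
// Two null pointers compare equal; a single null pointer sorts first.
int compareIStringSafe(const XMLCh* const comp1, const XMLCh* const comp2)
{
    if (comp1 == nullptr || comp2 == nullptr)
        return (comp1 == comp2) ? 0 : (comp1 == nullptr ? -1 : 1);

    const XMLCh* cptr1 = comp1;
    const XMLCh* cptr2 = comp2;
    // Walk both strings while the upper-cased characters match.
    while ((*cptr1 != 0) && (*cptr2 != 0))
    {
        if (std::towupper(static_cast<std::wint_t>(*cptr1)) !=
            std::towupper(static_cast<std::wint_t>(*cptr2)))
            break;
        cptr1++;
        cptr2++;
    }
    // Difference of the first mismatching (or terminating) characters.
    return static_cast<int>(std::towupper(static_cast<std::wint_t>(*cptr1))) -
           static_cast<int>(std::towupper(static_cast<std::wint_t>(*cptr2)));
}
```

With a guard like this, a null first argument simply compares as "not equal" 
instead of being dereferenced.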

If I set a writer encoding, regardless of whether a DOM encoding or actual 
encoding is set, it appears to work fine.


I don't know if this is the desired behaviour, but I would have assumed that 
if the writer has no encoding set and generally takes its encoding from the 
item to be written, it should do the same when comparing encodings for the 
purposes of writing a BOM?
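
A minimal sketch of the fallback I have in mind is below. It is not Xerces 
code: the struct, the field names, and the function name are hypothetical, and 
both the order in which the two DOM values are tried and the final UTF-8 
default are my own assumptions.

```cpp
#include <string>

// Hypothetical stand-ins for the places an encoding can come from.
struct EncodingSources
{
    std::string writerEncoding;    // set explicitly on the writer
    std::string documentEncoding;  // "encoding" on the DOM to be written
    std::string actualEncoding;    // "actualEncoding" on the DOM to be written
};

// Pick the first non-empty encoding so the BOM check never sees a
// null or empty value. The UTF-8 default is an assumption.
std::string resolveEncoding(const EncodingSources& src)
{
    if (!src.writerEncoding.empty())
        return src.writerEncoding;
    if (!src.documentEncoding.empty())
        return src.documentEncoding;
    if (!src.actualEncoding.empty())
        return src.actualEncoding;
    return "UTF-8";
}
```

The BOM check would then compare against the resolved value rather than the 
writer's own encoding field.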

Regards

Dara


> Allow Xerces to write the BOM to XML files
> ------------------------------------------
>
>          Key: XERCESC-681
>          URL: http://issues.apache.org/jira/browse/XERCESC-681
>      Project: Xerces-C++
>         Type: Bug
>   Components: DOM
>     Versions: 2.1.0
>  Environment: Operating System: Windows NT/2K
> Platform: PC
>     Reporter: David Hoffer
>     Assignee: Xerces-C Developers Mailing List

>
> I am currently creating UTF-16LE files and these cannot be opened with IE 
> because Xerces does not add the BOM to the beginning of the file.  It would 
> be nice if Xerces would add the appropriate BOM for all encoding types.

