[ http://issues.apache.org/jira/browse/XERCESC-681?page=comments#action_12362454 ]
dara mulvihill commented on XERCESC-681:
----------------------------------------
Hi all,
I wasn't sure of the status of this, so I have just run the DOMPrint example in
Xerces-C++ 2.4.0 (Linux) and have successfully output both the big-endian and
little-endian Byte Order Marks for UTF-16 output.
I was looking for an option to write the BOM and didn't find anything helpful
in the docs, mailing lists, etc. until I came across this issue.
I am now using the code from the DOMPrint example to set the option on my
writer when the BOM is required.
*** NOTE ***
One potential issue I noticed in my trials:
Generally, if the writer does not have an encoding set, but the DOM to be
written does (encoding or ActualEncoding is set), then the DOM encoding value is
used.
However, when we activate the writing of a BOM, the write fails with a
segfault (for iconv at least) due to the following
(I called writeNode() on the writer):
+ In this snip, "fEncoding" is null. I assumed this was due to no writer
encoding being set, and my tests appear to substantiate this.
--- 8< ---
void DOMWriterImpl::processBOM()
{
    // if the feature is not set, don't output bom
    if (!getFeature(BYTE_ORDER_MARK_ID))
        return;

    if ((XMLString::compareIString(fEncoding, XMLUni::fgUTF16LEncodingString) == 0) ||
        (XMLString::compareIString(fEncoding, XMLUni::fgUTF16LEncodingString2) == 0))
--- >8 ---
+ Thus when we get here, the first parameter is 0x0 and the second is the
XMLCh string "UTF-16(LE)".
--- 8< ---
int XMLString::compareIString(  const   XMLCh* const    str1
                              , const   XMLCh* const    str2)
{
    // Refer this one to the transcoding service
    return XMLPlatformUtils::fgTransService->compareIString(str1, str2);
}
--- >8 ---
+ Thus we fail here while trying to dereference the null cptr1 in the "while"
condition.
--- 8< ---
// ---------------------------------------------------------------------------
//  IconvTransService: The virtual transcoding service API
// ---------------------------------------------------------------------------
int IconvTransService::compareIString(  const   XMLCh* const    comp1
                                      , const   XMLCh* const    comp2)
{
    const XMLCh* cptr1 = comp1;
    const XMLCh* cptr2 = comp2;

    while ( (*cptr1 != 0) && (*cptr2 != 0) )
    {
        wint_t wch1 = towupper(*cptr1);
        wint_t wch2 = towupper(*cptr2);
        if (wch1 != wch2)
            break;

        cptr1++;
        cptr2++;
    }
    return (int) ( towupper(*cptr1) - towupper(*cptr2) );
}
--- >8 ---
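To make the failure mode concrete, here is a minimal standalone sketch of a null-safe variant of the comparison above. It is not Xerces code: the type alias XMLCh16 (standing in for XMLCh), the helper asciiUpper, and the function compareIStringNullSafe are hypothetical names for illustration only, and the ASCII-only upper-casing is a simplification of the towupper() call in the real routine.

```cpp
// Stand-in for Xerces' XMLCh (a 16-bit code unit); an assumption for this sketch.
using XMLCh16 = char16_t;

// ASCII-only upper-casing; a simplification that is enough for
// encoding names such as "UTF-16LE".
inline XMLCh16 asciiUpper(XMLCh16 c)
{
    return (c >= u'a' && c <= u'z')
        ? static_cast<XMLCh16>(c - (u'a' - u'A'))
        : c;
}

// Hypothetical null-safe comparison: a null pointer is treated as an
// empty string instead of being dereferenced.
int compareIStringNullSafe(const XMLCh16* comp1, const XMLCh16* comp2)
{
    static const XMLCh16 empty[] = { 0 };
    const XMLCh16* cptr1 = comp1 ? comp1 : empty;
    const XMLCh16* cptr2 = comp2 ? comp2 : empty;

    while ((*cptr1 != 0) && (*cptr2 != 0))
    {
        if (asciiUpper(*cptr1) != asciiUpper(*cptr2))
            break;
        ++cptr1;
        ++cptr2;
    }
    return static_cast<int>(asciiUpper(*cptr1))
         - static_cast<int>(asciiUpper(*cptr2));
}
```

With a guard like this, a null "fEncoding" would simply compare as unequal to "UTF-16LE" rather than crash, although no BOM would be written in that case either.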
If I set a writer encoding, regardless of whether or not a DOM encoding or
actual encoding is set, then everything appears to work fine.
I don't know if this is the desired behaviour, but I would have assumed that
if the writer has no encoding set and generally takes its encoding from the
item to be written, it should also do so when comparing encodings for the
purposes of writing a BOM?
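The fallback I have in mind could look roughly like the standalone sketch below. It is not taken from the Xerces sources: XMLCh16, DocEncodings and effectiveEncoding are hypothetical names, the order in which the document's encoding and actual encoding are consulted is my assumption, and the final fallback value is left to the caller.

```cpp
#include <cassert>

// Stand-in for Xerces' XMLCh (a 16-bit code unit); an assumption for this sketch.
using XMLCh16 = char16_t;

// Hypothetical holder for the two encoding values a document may carry.
struct DocEncodings
{
    const XMLCh16* encoding;        // may be null
    const XMLCh16* actualEncoding;  // may be null
};

// Pick the encoding to use for the BOM check: the writer's own encoding
// if set, otherwise the document's, otherwise a caller-supplied fallback.
// The encoding-before-actualEncoding order is an assumption.
const XMLCh16* effectiveEncoding(const XMLCh16* writerEncoding,
                                 const DocEncodings& doc,
                                 const XMLCh16* fallback)
{
    assert(fallback != nullptr);

    if (writerEncoding && *writerEncoding)
        return writerEncoding;
    if (doc.encoding && *doc.encoding)
        return doc.encoding;
    if (doc.actualEncoding && *doc.actualEncoding)
        return doc.actualEncoding;
    return fallback;
}
```

If the BOM check compared against a value resolved like this, rather than against the raw writer encoding, the comparison would never see a null pointer.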
Regards
Dara
> Allow Xerces to write the BOM to XML files
> ------------------------------------------
>
> Key: XERCESC-681
> URL: http://issues.apache.org/jira/browse/XERCESC-681
> Project: Xerces-C++
> Type: Bug
> Components: DOM
> Versions: 2.1.0
> Environment: Operating System: Windows NT/2K
> Platform: PC
> Reporter: David Hoffer
> Assignee: Xerces-C Developers Mailing List
>
> I am currently creating UTF-16LE files and these cannot be opened with IE
> because Xerces does not add the BOM to the beginning of the file. It would be
> nice if Xerces would add the appropriate BOM for all encoding types.