Sterling pound sign encoding with XML string
--------------------------------------------

                 Key: XERCESC-1811
                 URL: https://issues.apache.org/jira/browse/XERCESC-1811
             Project: Xerces-C++
          Issue Type: Bug
         Environment: Solaris x86, xerces 2-7-0
            Reporter: Jean-Baptiste Wons


Hello.

I am not sure whether this is a bug in Xerces or whether I am misusing it.

This is my code:

<code>

#include <cstdio>
#include <string>
#include <iostream>

#include <xercesc/dom/DOM.hpp>
#include <xercesc/dom/DOMException.hpp>
#include <xercesc/dom/DOMImplementationRegistry.hpp>
#include <xercesc/framework/MemBufInputSource.hpp>
#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/XMLString.hpp>


using namespace std;
using namespace XERCES_CPP_NAMESPACE;

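// Replace control characters (except LF and CR) and bytes above 0x7F with
// XML numeric character references. Each byte is escaped on its own, so the
// result is only meaningful for single-byte encodings such as ISO-8859-1.
void replaceSpecialCharactersXML(std::string &s)
{
    string cp;
    cp.reserve(s.size() * 2);
    for (string::size_type i = 0; i < s.size(); i++)
    {
        const unsigned char c = s[i];

        if ((c < 32 && c != '\012' && c != '\015') || c > 127)
        {
            // "&#xNN;" needs at most 7 characters plus the terminator.
            char buffer[16];
            sprintf(buffer, "&#x%02x;", c);
            cp += buffer;
        }
        else
        {
            cp += c;
        }
    }
    s = cp;
}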


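int main()
{
    XMLPlatformUtils::Initialize();
    string aString0 ("This will crash ££££ ...");

    // Round trip: native code page -> XMLCh -> native code page.
    XMLCh *fUnicodeForm = XMLString::transcode(aString0.c_str());
    char *pMsg = XMLString::transcode(fUnicodeForm);
    string res(pMsg);
    replaceSpecialCharactersXML(res);

    cout << aString0 << " -> " << pMsg << " -> " << res << endl;

    // Buffers returned by transcode() are owned by the caller.
    XMLString::release(&fUnicodeForm);
    XMLString::release(&pMsg);
    XMLPlatformUtils::Terminate();
    return 0;
}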

</code>

When I compile and run it, I get this output:

<output>
sh$ ./testxerces
This will crash ££££ ... -> This will crash  ... -> This will crash &#x1a;&#x1a;&#x1a;&#x1a; ...
</output>

When I transcode the £ sign to XMLCh, then transcode it back to a char*, it is 
transformed to 0x1a.

Is this a real bug, or am I missing something?

Regards,
Jean-Baptiste



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

