[dom4j-dev] [ dom4j-Support Requests-1080334 ] XMLWriter Entity Replacement Problem

SourceForge.net Tue, 07 Dec 2004 08:40:02 -0800

Support Requests item #1080334, was opened at 2004-12-06 18:02
Message generated for change (Comment added) made by ben_cramer
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=216035&aid=1080334&group_id=16035


Category: None
Group: None
>Status: Closed
Priority: 5
Submitted By: Ben_iPath (ben_cramer)
Assigned to: Nobody/Anonymous (nobody)
Summary: XMLWriter Entity Replacement Problem

Initial Comment:
We are processing a large number of SGML documents
containing SGML character entities used in scientific
writing, such as the degree symbol (&#176;) and the mu
symbol (&#956;). We strip the SGML entities and replace
them with the corresponding numeric character references
before writing the objects out with XMLWriter (dom4j 1.4).

However, the output contains ?'s wherever an SGML entity
was replaced with a numeric character reference.

I have set the XMLWriter.setMaximumAllowedCharacter
value to -1 and it still produces the same result.

Some of our replacement values approach 10,000, such as
the diamond symbol (&#9830;).

What can we do to have the parser ignore these entity
references so that the characters will be left in the
XML output?

Thanks in advance,

Ben Cramer
iPath Solutions

----------------------------------------------------------------------

>Comment By: Ben_iPath (ben_cramer)
Date: 2004-12-07 10:39

Message:
Logged In: YES 
user_id=1173159

I added the following option (-Dfile.encoding=utf8) to my batch file:

java -Dfile.encoding=utf8 -classpath ....

This fixed the problem. It was the Solaris OS default
encoding, not dom4j, that was replacing the characters.
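
A minimal sketch of why the ?'s can appear, assuming the Solaris default encoding was an ASCII-like charset that cannot represent these characters (the ticket does not say which charset was in effect). FileWriter encodes with the platform default, and a Writer bound to such a charset substitutes '?' for anything it cannot map. The class and method names below are illustrative only and are not part of dom4j. Because the example code in this thread passes a Writer to XMLWriter, an explicit-charset Writer like the one here could presumably be handed to XMLWriter in place of the FileWriter, which would avoid depending on -Dfile.encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingDemo {

    // Writes text through a Writer bound to the given charset and
    // returns the raw bytes that would end up in the output file.
    static byte[] writeWith(String text, Charset charset) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, charset)) {
            out.write(text);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String degree = "\u00B0"; // the character behind &#176;

        // Assumed stand-in for the platform default: US-ASCII cannot
        // represent the degree sign, so the encoder substitutes '?'.
        byte[] ascii = writeWith(degree, StandardCharsets.US_ASCII);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII));

        // With an explicit UTF-8 Writer the character survives as two bytes.
        byte[] utf8 = writeWith(degree, StandardCharsets.UTF_8);
        System.out.println(utf8.length);
    }
}
```

Naming the charset in code keeps the output independent of the locale the JVM happens to start in.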

I have closed the ticket.

Ben Cramer
iPath Solutions

----------------------------------------------------------------------

Comment By: Ben_iPath (ben_cramer)
Date: 2004-12-07 09:54

Message:
Logged In: YES 
user_id=1173159

Example:

final SAXReader reader = new SAXReader();
try {
    final Document template = reader.read(file);
    BufferedReader in = new BufferedReader(new FileReader(fSGML));
    String input;
    StringBuffer cleanString = new StringBuffer();
    while ((input = in.readLine()) != null) {
        cleanString.append(input);
    }
    String clString = CleanAmpCharacter(cleanString.toString());
    in.close();
    Document cleanDoc = DocumentHelper.parseText(clString);
    // retrieve data in the nodes of cleanDoc and add to final XML doc .......

    // Write the file out
    final XMLWriter writer = new XMLWriter(new FileWriter(sFilePath));
    writer.setMaximumAllowedCharacter(-1);
    writer.setResolveEntityRefs(false);
    writer.write(docXML);
    writer.close();
    logger.info(sFilePath + " created for import");

    // Import the new file
    new ImportXMLDocument(sFilePath);
} catch ....{
}

.... code for the clean-up

private String CleanAmpCharacter(String sAmpChar) {
    // Simple regex replacement: iterates through a file of codes
    // and replacement values, for example:
    sAmpChar = sAmpChar.replaceAll("&deg;", "&#176;");
    return sAmpChar;
}
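
The clean-up step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the real code reads its pairs from a file, whereas this sketch hard-codes a small table, and the entity names &mu; and &diams; are assumed (only &deg; appears in the example). The class name EntityCleaner is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EntityCleaner {

    // Hypothetical replacement table; the original loads these pairs from a file.
    static final Map<String, String> REPLACEMENTS = new LinkedHashMap<>();
    static {
        REPLACEMENTS.put("&deg;", "&#176;");
        REPLACEMENTS.put("&mu;", "&#956;");     // entity name assumed
        REPLACEMENTS.put("&diams;", "&#9830;"); // entity name assumed
    }

    // Replaces each named entity with its numeric character reference.
    static String clean(String text) {
        String result = text;
        for (Map.Entry<String, String> entry : REPLACEMENTS.entrySet()) {
            // String.replace matches literally, so no regex escaping is needed.
            result = result.replace(entry.getKey(), entry.getValue());
        }
        return result;
    }
}
```

String.replace is used instead of replaceAll so that entity names are never interpreted as regular expressions.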

It may actually be the DocumentHandler that is replacing the
entity values with ?.

Either way, I need help to clean these more effectively or
find a better solution.

Thanks,

Ben Cramer
iPath Solutions

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2004-12-07 05:07

Message:
Logged In: NO 

Could you provide some example code that illustrates 
your problem?

Maarten

----------------------------------------------------------------------
