Support Requests item #1080334, was opened at 2004-12-06 18:02
Message generated for change (Comment added) made by ben_cramer
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=216035&aid=1080334&group_id=16035

Category: None
Group: None
>Status: Closed
Priority: 5
Submitted By: Ben_iPath (ben_cramer)
Assigned to: Nobody/Anonymous (nobody)
Summary: XMLWriter Entity Replacement Problem

Initial Comment:
We are processing a large number of SGML documents containing SGML character entities used in scientific writing, such as the degree symbol ° and the mu symbol μ. We strip the SGML entities and replace them with the appropriate character codes before writing the objects out with the XMLWriter (dom4j 1.4). However, the output contains ?'s wherever an SGML entity was replaced with its character code.

I have set the XMLWriter.setMaximumAllowedCharacter value to -1 and it still produces the same result. We have replacement values that approach 10K, such as the diamonds symbol ♦

What can we do to have the parser ignore these entity references so that the characters will be left in the XML output?

Thanks in advance,

Ben Cramer
iPath Solutions

----------------------------------------------------------------------

>Comment By: Ben_iPath (ben_cramer)
Date: 2004-12-07 10:39

Message:
Logged In: YES
user_id=1173159

I added the following (-Dfile.encoding=utf8) to my batch file:

    java -Dfile.encoding=utf8 -classpath ....

This fixed the problem. It was the Solaris OS that was replacing the encoding, not the DOM4J application. I have closed the ticket.

Ben Cramer
iPath Solutions

----------------------------------------------------------------------

Comment By: Ben_iPath (ben_cramer)
Date: 2004-12-07 09:54

Message:
Logged In: YES
user_id=1173159

Example:

    final SAXReader reader = new SAXReader();
    try {
        final Document template = reader.read(file);

        BufferedReader in = new BufferedReader(new FileReader(fSGML));
        String input;
        StringBuffer cleanString = new StringBuffer();
        while ((input = in.readLine()) != null) {
            cleanString.append(input);
        }
        String clString = CleanAmpCharacter(cleanString.toString());
        in.close();

        Document cleanDoc = DocumentHelper.parseText(clString);

        // retrieve data in the nodes of cleanDoc and add to final XML doc
        .......

        // Write the file out
        final XMLWriter writer = new XMLWriter(new FileWriter( sFilePath ));
        writer.setMaximumAllowedCharacter(-1);
        writer.setResolveEntityRefs(false);
        writer.write( docXML );
        writer.close();
        logger.info(sFilePath + " created for import");

        // Import the new file
        new ImportXMLDocument(sFilePath);
    } catch ....{
    }

    .... code for the clean-up

    private String CleanAmpCharacter(String sAmpChar) {
        // simple regex replacement of code
        // iterates through a file of codes and replacement values
        // Ex. sAmpChar = sAmpChar.replaceAll("°", "°");
    }

It may actually be the DocumentHandler that is replacing the entity values with ?. Either way, I need help to clean these more effectively or find a better solution.

Thanks,

Ben Cramer
iPath Solutions

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2004-12-07 05:07

Message:
Logged In: NO

Could you provide some example code that illustrates your problem?

Maarten

----------------------------------------------------------------------
_______________________________________________
dom4j-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dom4j-dev
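
Note on the cause discussed in this thread: a FileWriter created from a file name alone encodes with the platform default charset, so the ?'s are consistent with a default charset that cannot represent °, μ or ♦, and -Dfile.encoding=utf8 changes that default. The following is a minimal sketch using only the JDK, not dom4j. The class name EncodingDemo and the helper write() are made up for illustration, and US-ASCII is only an assumed stand-in for whatever the Solaris default actually was.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of dom4j or of the code posted in this thread.
class EncodingDemo {

    // Encode text the way a Writer bound to the given charset would.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, charset)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // degree sign, Greek mu, diamond suit
        String text = "25 \u00B0C \u03BC \u2666";

        // A charset that cannot represent these characters substitutes '?'.
        byte[] ascii = write(text, StandardCharsets.US_ASCII);
        System.out.println(new String(ascii, StandardCharsets.US_ASCII)); // prints "25 ?C ? ?"

        // UTF-8 can represent all of them, so the text survives a round trip.
        byte[] utf8 = write(text, StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(text)); // prints "true"
    }
}
```

If this is indeed what happened, handing XMLWriter a Writer with an explicit charset, for example new OutputStreamWriter(new FileOutputStream(sFilePath), "UTF-8"), should make the output independent of the JVM default and of -Dfile.encoding. The encoding named in the XML declaration would need to match the charset used by the Writer.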