We are trying to use this method (XMLLocator.getCharacterOffset()) to report the exact positions of elements inside an XML document. It works fine for regular XML documents, but for documents containing special characters, such as foreign characters, the returned numbers are no longer correct.
Below is the test case I used (all the files are
in the attachments). The main process is pretty
straightforward: for each element, store its start
and end positions within the callback
methods, and finally read the whole XML file as a
string and print out the portion between the start and
end positions of each element. We are using the latest
jar file, xercesImpl-gump-23062006.jar. Please notice
in the result that the output for the elements
"CapitalGainNetIncome" and "DateAcquired" is correct,
but the output for "PropertyDescription" and
"Return" is not: each is cut one character short
of the closing ">".
Please let me know if I missed anything here. Any
response would be greatly appreciated!
Thanks,
James Zhang
***XniTest.java***
import java.io.File;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import org.apache.xerces.parsers.XMLDocumentParser;
import org.apache.xerces.xni.Augmentations;
import org.apache.xerces.xni.NamespaceContext;
import org.apache.xerces.xni.QName;
import org.apache.xerces.xni.XMLAttributes;
import org.apache.xerces.xni.XMLLocator;
import org.apache.xerces.xni.XNIException;
import org.apache.xerces.xni.parser.XMLInputSource;
import org.apache.xerces.xni.parser.XMLParserConfiguration;
public class XniTest extends XMLDocumentParser {
static final String DEFAULT_PARSER_CONFIG =
"org.apache.xerces.parsers.XIncludeAwareParserConfiguration";
static final String NAMESPACE_PREFIXES_FEATURE_ID =
"http://xml.org/sax/features/namespace-prefixes";
protected static final String
SCHEMA_VALIDATION_FEATURE_ID =
"http://apache.org/xml/features/validation/schema";
protected static final String
HONOUR_ALL_SCHEMA_LOCATIONS_ID =
"http://apache.org/xml/features/honour-all-schemaLocations";
public static final String PATH =
"\\work\\data\\special_char.xml";
static Map startPositions = new HashMap();
static Map endPositions = new HashMap();
private XMLLocator locator;
public XniTest(XMLParserConfiguration configuration)
{
super(configuration);
}
public void startDocument(XMLLocator locator,
String encoding, NamespaceContext namespaceContext,
Augmentations augs)
throws XNIException {
this.locator = locator;
}
public void startElement(QName element,
XMLAttributes attrs, Augmentations augs)
throws XNIException {
// insert the element
startPositions.put(element.localpart,
Integer.valueOf(locator.getCharacterOffset()));
}
static public void run() {
XMLParserConfiguration parserConfig = null;
try {
parserConfig =
(XMLParserConfiguration)ObjectFactory.newInstance(DEFAULT_PARSER_CONFIG,
ObjectFactory.findClassLoader(),
true);
parserConfig.addRecognizedFeatures(new
String[] {
NAMESPACE_PREFIXES_FEATURE_ID,
});
parserConfig.setFeature(HONOUR_ALL_SCHEMA_LOCATIONS_ID,
true);
XMLDocumentParser parser = new
XniTest(parserConfig);
parser.parse(new XMLInputSource(null,
PATH, null));
String content = getDocument(PATH);
// loop through all the starting elements
for (Iterator
it=startPositions.keySet().iterator();
it.hasNext();) {
String element = (String)it.next();
int start =
((Integer)startPositions.get(element)).intValue();
int end =
((Integer)endPositions.get(element)).intValue();
System.out.println("Element:"+element);
System.out.println(content.substring(start, end));
System.out.println();
}
}
catch (Exception e) {
e.printStackTrace();
}
}
/**
* @param args
*/
public static void main(String[] args) {
// TODO Auto-generated method stub
run();
}
public void endElement(QName arg0, Augmentations
arg1) throws XNIException {
endPositions.put(arg0.localpart,
Integer.valueOf(locator.getCharacterOffset()));
}
// read the content of the file into a string
static String getDocument(String path) throws
Exception {
StringBuffer doc = new StringBuffer();
char buf[] = new char[10240];
// no charset is given here, so the platform default encoding is used
Reader reader = new InputStreamReader(new
FileInputStream(path));
try {
int len;
// read() returns -1 at end of stream and may return fewer chars than requested
while ((len = reader.read(buf, 0, buf.length)) != -1) {
doc.append(buf, 0, len);
}
} finally {
reader.close();
}
return doc.toString();
}
}
***special_char.xml***
<?xml version="1.0" encoding="UTF-8"?>
<Return>
<CapitalGainNetIncome>65774204</CapitalGainNetIncome>
<DateAcquired>1999-05-30</DateAcquired>
<PropertyDescription>¼</PropertyDescription>
</Return>
***Test result***
Element:PropertyDescription
¼</PropertyDescription
Element:CapitalGainNetIncome
65774204</CapitalGainNetIncome>
Element:DateAcquired
1999-05-30</DateAcquired>
Element:Return
<CapitalGainNetIncome>65774204</CapitalGainNetIncome>
<DateAcquired>1999-05-30</DateAcquired>
<PropertyDescription>¼</PropertyDescription>
</Return
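
One thing that may be worth ruling out (an assumption, not a confirmed cause): getDocument() builds its InputStreamReader without naming a charset, so the file is decoded with the platform default encoding, while the document declares UTF-8. If the default is a single-byte encoding, the two UTF-8 bytes of "¼" become two chars in the comparison string, and every offset after that point is shifted by one. The sketch below only illustrates that length difference. The class name CharsetOffsetDemo and its decode helper are hypothetical and not part of the attached files, ISO-8859-1 stands in for whatever single-byte default the machine uses, and StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of the attached test case.
class CharsetOffsetDemo {

    // Decode raw file bytes into a String using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Same shape as the failing element; \u00BC is the "one quarter" sign.
        String xml = "<PropertyDescription>\u00BC</PropertyDescription>";
        byte[] utf8Bytes = xml.getBytes(StandardCharsets.UTF_8);

        // Decoding with the declared encoding gives one char for the special character.
        String asUtf8 = decode(utf8Bytes, StandardCharsets.UTF_8);
        // Decoding with a single-byte charset gives one char per byte instead.
        String asLatin1 = decode(utf8Bytes, StandardCharsets.ISO_8859_1);

        System.out.println("bytes on disk:       " + utf8Bytes.length);
        System.out.println("chars as UTF-8:      " + asUtf8.length());
        System.out.println("chars as ISO-8859-1: " + asLatin1.length());
    }
}
```

The single-byte decoding yields one more char than the UTF-8 decoding, which would match both truncated results being exactly one character short. If this is the cause here, passing the charset name explicitly, as in new InputStreamReader(new FileInputStream(path), "UTF-8"), should make the comparison string line up with character offsets.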
Attachments: ObjectFactory.java, XniTest.java, SecuritySupport.java
