After some further experimentation, I determined that the character encoding of the text from my JTextArea is Cp1252 and not UTF-8.  Is there some simple way to set the encoding of a StyledDocument used in a JTextArea?
 
I'm assuming that character encoding is the source of my problem here, though it may not be the only issue.
 
Rob
----- Original Message -----
Sent: Monday, February 18, 2002 8:23 PM
Subject: [dom4j-user] special entities and JAXP

While the application I'm writing utilizes dom4j, this question isn't dom4j specific.  I offer it up to the list simply because many of you are proficient with Java and JAXP.... I suspect this is a very simple problem, but one I can't quite decipher yet.  I apologize if the question is misplaced....
 
I have written a simple editor which utilizes a JTextArea to hold and visualize text.  I have written a simple method to check for well-formedness (you try to come up with a better adjective) of an XML document.  I am using the JAXP classes in conjunction with JDK 1.4.
 
The text within the JTextArea is extracted using a getDocument().getText() method to obtain a String object. 
 
Well-formedness is determined in the following manner:
 
    public void wellform(String text) throws IOException, SAXException, ParserConfigurationException {
        SAXParser parser = factory.newSAXParser();
        StringReader reader = new StringReader(text);
        parser.parse(new InputSource(reader), new XMLHandler(holder));
    }
   
    private static SAXParserFactory factory = SAXParserFactory.newInstance();

XMLHandler is a class I've written to catch certain thrown exceptions.  It extends the DefaultHandler adapter just to simplify matters.
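
To separate the parser from the text component, the check above can be sketched as a standalone program. This is only a minimal sketch under some assumptions: the class name WellFormedCheck and the method check are hypothetical, a plain DefaultHandler stands in for XMLHandler (whose source is not shown here), and the two strings are typed in directly rather than pulled from the JTextArea.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical standalone harness; not the editor's actual code.
class WellFormedCheck {
    private static final SAXParserFactory factory = SAXParserFactory.newInstance();

    // Returns null if the text parses cleanly, otherwise the parser's message.
    static String check(String text) throws IOException, ParserConfigurationException {
        try {
            SAXParser parser = factory.newSAXParser();
            // DefaultHandler rethrows fatal errors, so a parse failure lands in the catch.
            parser.parse(new InputSource(new StringReader(text)), new DefaultHandler());
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // The one-line case and the two-line case from this message.
        System.out.println(check("<a>testing</a>"));
        System.out.println(check("<a>testing</a>\n<b>testing</b>"));
    }
}
```

If the second call reports an error here as well, the behaviour does not depend on the JTextArea or on how its text is encoded, since the input never passes through either.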
 
Here's what I don't understand with regard to my application:
 
If I write a simple one line XML document, say:
 
<a>testing</a>
 
everything works just fine and dandy... the document is declared well-formed, and life goes merrily on its way.
 
However, if I add a second line directly underneath the first line, any line at all, like:
 
<b>testing</b>
 
I get back an exception saying:
 
Illegal character at end of document, &#x3c;.
Line 2, column 0.
 
Now, apparently the less-than character is being interpreted as a special entity.  But what I don't understand is why it only affects the second line of text and not the first.  Is there a carriage return or something that JTextArea sticks in?
 
Is there some feature that I'm supposed to specify for the XMLReader or SAXParser to avoid this problem?  Are there hidden characters coming from the JTextArea I should be aware of, or is there some method I need to modify in my custom ContentHandler (XMLHandler) to overcome this issue?
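
One way to look for hidden characters is to dump the extracted String with everything outside printable ASCII made visible. The following is a rough sketch only; the class HiddenChars and the method reveal are hypothetical names, and the sample string in main stands in for whatever the JTextArea actually returns.

```java
// Hypothetical debugging helper; not part of the editor described above.
class HiddenChars {
    // Replaces every character outside printable ASCII with a <U+XXXX> marker.
    static String reveal(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c >= 0x20 && c < 0x7f) {
                sb.append(c);
            } else {
                sb.append(String.format("<U+%04X>", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the String extracted from the text area.
        System.out.println(reveal("<a>testing</a>\n<b>testing</b>"));
    }
}
```

Passing the text from the JTextArea to reveal before handing it to the parser would show any carriage returns, byte order marks, or other stray characters sitting between the two lines.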
 
Hope this makes sense to some of you,
 
Rob
 
 
 
