On Friday 25 June 2004 16:21, you wrote:
> However, I was unable to reproduce your problem (I tried it with the
> current CVS version of dom4j and with 1.5 beta1).

I use 1.5b2. I know that it worked previously but I'm not sure whether it was 
1.4 or 1.5b1. 

You need CyberNeko (or some other tolerant HTML SAX parser):

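import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

import org.cyberneko.html.parsers.SAXParser;
import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;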

        public static void main( String[] args )
        {
                try
                {
                        SAXReader reader = new SAXReader( new SAXParser() );

                        // some arbitrary spiegel.de article
                        URL url = new URL( "http://www.spiegel.de/politik/deutschland/0,1518,305821,00.html" );

                        Document doc = reader.read( new InputStreamReader(
                                ((HttpURLConnection)url.openConnection()).getInputStream() ) );

                        final Node summary = doc.selectSingleNode( "/HTML/HEAD/[EMAIL PROTECTED] eq 'description']/@content" );

                        // prints an empty string
                        System.out.println(summary.getText());

                        // prints the correct content, but only via the element's attribute
                        Element e = (Element)summary;
                        System.out.println(e.attribute("content").getText());
                }
                catch( Exception e )
                {
                        System.err.println( e );
                        e.printStackTrace();
                }
        }
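
For comparison, here is a minimal sketch of the result one would expect from an XPath that ends in /@content: an attribute node whose value is the content string. It uses only the JDK's built-in XPath engine, not dom4j or CyberNeko, so it says nothing about where the dom4j behaviour comes from. The class name, the helper method, the sample markup and the exact XPath (written with the XPath 1.0 "=" operator) are assumptions made for illustration.

```java
import java.io.StringReader;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class MetaContentCheck
{
        // Hypothetical helper: evaluates an attribute-selecting XPath and
        // returns the attribute's value, or null if nothing matched.
        static String metaDescription( String xml ) throws Exception
        {
                Document doc = DocumentBuilderFactory.newInstance()
                        .newDocumentBuilder()
                        .parse( new InputSource( new StringReader( xml ) ) );

                XPath xpath = XPathFactory.newInstance().newXPath();

                // Assumed expression, same shape as the one in the report
                Node attr = (Node)xpath.evaluate(
                        "/HTML/HEAD/META[@name='description']/@content",
                        doc, XPathConstants.NODE );

                return attr == null ? null : attr.getNodeValue();
        }

        public static void main( String[] args ) throws Exception
        {
                // Made-up, well-formed sample input (no network, no HTML parser)
                String xml = "<HTML><HEAD>"
                        + "<META name=\"description\" content=\"some summary\"/>"
                        + "</HEAD><BODY/></HTML>";

                System.out.println( metaDescription( xml ) );
        }
}
```

Here the selected node is the attribute itself, so its value can be read directly without casting to an element first.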


_______________________________________________
dom4j-user mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dom4j-user
