On Friday 25 June 2004 16:21, you wrote:
> However, I was unable to reproduce your problem (I tried it with the
> current CVS version of dom4j and with 1.5 beta1).
I use 1.5b2. I know that it worked previously but I'm not sure whether it was
1.4 or 1.5b1.
To reproduce it you need CyberNeko (or some other tolerant HTML SAX parser):
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

import org.cyberneko.html.parsers.SAXParser;
import org.dom4j.Document;
import org.dom4j.Element;
import org.dom4j.Node;
import org.dom4j.io.SAXReader;

public static void main( String[] args )
{
    try
    {
        // dom4j reader backed by the tolerant CyberNeko HTML parser
        SAXReader reader = new SAXReader( new SAXParser() );
        // some arbitrary spiegel.de article
        URL url = new URL("http://www.spiegel.de/politik/deutschland/0,1518,305821,00.html");
        Document doc = reader.read( new InputStreamReader(
            ((HttpURLConnection)url.openConnection()).getInputStream() ) );
        final Node summary = doc.selectSingleNode(
            "/HTML/HEAD/[EMAIL PROTECTED] eq 'description']/@content" );
        // prints an empty string
        System.out.println(summary.getText());
        // prints the correct value
        Element e = (Element)summary;
        System.out.println(e.attribute("content").getText());
    }
    catch( Exception e )
    {
        System.err.println( e );
        e.printStackTrace();
    }
}