I'm working on a project where I'm filtering HTML documents with JTidy, and
then modifying the resulting output using dom4j.

One of the things I've discovered is that the HTML text often contains
quoted text. I used the setQuoteMarks() method to ensure that the
quotes are saved out as &quot; entities, but once they're read in by dom4j,
text seems to get lost, and I don't see any instances of &quot; anymore.
Ampersands are another matter; they seem to remain just fine (there are tons
of these in the documents as well).
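A minimal sketch of what may be happening. It uses the JDK's built-in DOM parser (javax.xml.parsers) as a stand-in for dom4j, which is an assumption, and the names EntityDemo and rootText are hypothetical. Any conforming XML parser replaces predefined entities such as &quot; and &amp; with the characters they stand for. The quote character is still in the text; it just no longer appears as an entity:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class EntityDemo {
    // Hypothetical helper: parses a small XML string with the JDK parser
    // (standing in for dom4j) and returns the root element's text content.
    static String rootText(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // By the time we see the tree, &quot; and &amp; have already been
        // expanded into plain '"' and '&' characters.
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        String text = rootText("<p>He said &quot;hi&quot; &amp; left.</p>");
        System.out.println(text); // prints: He said "hi" & left.
    }
}
```

When the tree is written back out, a conforming serializer must escape & (and <) in text again, which could explain why the ampersands seem to survive. A literal " is legal in XML character data, so a serializer may write it out as-is.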

I'm a bit baffled about what is happening to the text. Can someone explain
what I need to do to preserve the entities?

Rob


_______________________________________________
dom4j-user mailing list
https://lists.sourceforge.net/lists/listinfo/dom4j-user