JTidy is sometimes too intelligent ;-) It tries to fix too much. Have a look at the HTML DTD and check whether <script> is allowed in <table>. If it is, file a bug at the JTidy SourceForge page; otherwise JTidy's behaviour is OK. We have encountered many similar problems with JTidy.
In your case JTidy gets especially confused by the <tr> and <td> inside the script. You may have to fix these pages by hand. Does CDATA exist in HTML? If so, maybe that helps.
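For reference, the HTML 4.01 DTD lists only CAPTION, COL, COLGROUP, THEAD, TFOOT and TBODY as children of TABLE, so a <script> placed directly between rows is not valid there. A rough sketch of what "fixing by hand" could look like is below: the row is added through the DOM instead of document.write(), so the script source contains no tag-like text at all. The function name insertTextRow, its parameters and the id "demo" are made up for illustration, and whether JTidy then leaves the page alone is an assumption that has not been tested here.

```javascript
// Sketch only: add a table row through the DOM rather than document.write(),
// so the script text contains no tr/td markup for a tidying parser to misread.
// insertTextRow and its parameter names are hypothetical, not from the original page.
function insertTextRow(doc, table, index, text) {
  var row = table.insertRow(index);  // DOM method of table elements; index is the row position
  var cell = row.insertCell(-1);     // -1 appends the new cell at the end of the row
  cell.appendChild(doc.createTextNode(text));
  return row;
}

// In a browser this would have to run after the table has been parsed,
// for example from a script element placed after the table's end tag.
// The id "demo" is an assumption; the original table has no id.
// insertTextRow(document, document.getElementById('demo'), 1, 'testing the JavaScript');
```

With index 1 the new row would land between "Hello world" and "After script", which matches the order the original document.write() calls were meant to produce.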
Regards,
Joerg
Anna Afonchenko wrote:
Hi all. I use an HTMLGenerator to tidy up the pages that I load, and I encountered a very strange behaviour concerning scripts. This is my input file:
test.html
<html>
<head>
<title>Testing JTidy page</title>
</head>
<body>
<p>This is test</p>
<table>
<tr>
<td>Hello world</td>
</tr>
<script language="JavaScript">
document.write('<tr>');
document.write('<td>');
document.write('testing the JavaScript');
document.write('</td>');
document.write('</tr>');
</script>
<tr>
<td>After script</td>
</tr>
</table>
</body>
</html>
As you can see, the script tag is not inside a tr/td tag, but it writes them, so the resulting table contains three rows (one of them written by the script).
This is actual code that I took from somebody's page.
When I put this page through the pipeline, using HTMLGenerator (to tidy it), this is the VERY weird result that I get:
pipeline:
<map:match pattern="test">
<map:generate src="test.html" type="html"/>
<map:serialize type="xml"/>
</map:match>
the result shown in the Cocoon browser window:
<?xml version="1.0" encoding="utf-8" ?>
<html>
<head>
<title>Testing JTidy page</title>
</head>
<body>
<p>This is test</p>
<script language="JavaScript" type="text/javascript" />
document.write(''); document.write(''); document.write('');
<table>
<tr>
<td>Hello world</td>
</tr>
<tr>
<td>'); document.write('testing the JavaScript'); document.write('</td>
</tr>
</table>
<table>
<tr>
<td>After script</td>
</tr>
</table>
</body>
</html>
JTidy took the script out and messed up the table!
Has anybody encountered such behaviour when using HTMLGenerator?
I know that this is not really related to Cocoon, but Cocoon uses JTidy, so I thought that somebody may have dealt with this already.
I also looked at the JTidy page on SourceForge, but I didn't find anything related to this.
Please, if somebody understands what is going on with this JTidy feature, help me.
Sorry for the not-so-related question.
Thank you very much for your help.
Anna