[
https://issues.apache.org/jira/browse/LUCENE-4505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484332#comment-13484332
]
Robert Muir commented on LUCENE-4505:
-------------------------------------
Here is the command-line equivalent: say I screw up our lucene/docs/index.html
by adding an unclosed bold tag in the getting started paragraph
and a bogus tag at the end:
{noformat}
rmuir@beast:~/workspace/lucene-trunk/lucene/build/docs$ java -jar ~/Downloads/jtidy-r938.jar -e -q index.html
line 1 column 1 - Warning: missing <!DOCTYPE> declaration
line 24 column 62 - Warning: missing </b> before </p>
line 27 column 1 - Warning: inserting implicit <b>
line 111 column 1 - Error: <dfdsfdsf> is not recognized!
line 111 column 1 - Warning: content occurs after end of body
line 111 column 1 - Warning: discarding unexpected <dfdsfdsf>
line 112 column -3 - Warning: content occurs after end of body
line 112 column -3 - Warning: discarding unexpected </dfdsfdsf>
{noformat}
Basically, we want to fail if there is any output like this at all. Note that only
one of these problems is an error!
The "Warnings" also point to bogus markup that we should fix.
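To make the "fail on any output" rule concrete, here is a minimal Java sketch that classifies jtidy's output lines by severity. The class and method names are hypothetical, and the message format is inferred only from the sample output above, so it may not cover every message jtidy can print; lines that don't match the pattern are still counted, so they still trigger a failure.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TidyOutputSummary {
    // Message shape inferred from the sample output, e.g.
    // "line 111 column 1 - Error: <dfdsfdsf> is not recognized!".
    // jtidy can report negative columns ("column -3"), so allow a minus sign.
    private static final Pattern MESSAGE =
        Pattern.compile("^line (\\d+) column (-?\\d+) - (Warning|Error): (.*)$");

    static final class Summary {
        int errors;
        int warnings;
        int other; // non-blank lines that don't match MESSAGE

        // Proposed policy: any non-blank output at all is a failure.
        boolean shouldFail() {
            return errors + warnings + other > 0;
        }
    }

    static Summary summarize(List<String> lines) {
        Summary s = new Summary();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // ignore blank lines
            }
            Matcher m = MESSAGE.matcher(line);
            if (!m.matches()) {
                s.other++;
            } else if ("Error".equals(m.group(3))) {
                s.errors++;
            } else {
                s.warnings++;
            }
        }
        return s;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "line 1 column 1 - Warning: missing <!DOCTYPE> declaration",
            "line 24 column 62 - Warning: missing </b> before </p>",
            "line 27 column 1 - Warning: inserting implicit <b>",
            "line 111 column 1 - Error: <dfdsfdsf> is not recognized!",
            "line 111 column 1 - Warning: content occurs after end of body",
            "line 111 column 1 - Warning: discarding unexpected <dfdsfdsf>",
            "line 112 column -3 - Warning: content occurs after end of body",
            "line 112 column -3 - Warning: discarding unexpected </dfdsfdsf>");
        Summary s = summarize(sample);
        // prints "1 error(s), 7 warning(s), fail=true"
        System.out.println(s.errors + " error(s), " + s.warnings + " warning(s), fail=" + s.shouldFail());
    }
}
```

On the sample above this counts one error and seven warnings, so a check that only looks at errors would miss most of the problems in that file.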
NOTE: there are some "false" warnings caused by bugs in javadoc itself, but
it seems we could just filter those out:
{noformat}
rmuir@beast:~/workspace/lucene-trunk/lucene/build/docs$ java -jar ~/Downloads/jtidy-r938.jar -e -q core/deprecated-list.html
line 152 column 20 - Warning: <a> escaping malformed URI reference
{noformat}
That's because javadoc generates bogus URLs like <a
href="org/apache/lucene/search/FuzzyQuery.html#floatToEdits(float, int)">
instead of escaping the space as %20.
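A minimal Java sketch of that filtering step, assuming tidy's output has already been captured as a list of lines. The class name and the allow-list contents are hypothetical; matching on the message text is simply the easiest option, not a feature jtidy provides.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TidyWarningFilter {
    // Hypothetical allow-list of message fragments for warnings caused by javadoc
    // itself (unescaped spaces in hrefs such as "#floatToEdits(float, int)").
    static final List<String> KNOWN_JAVADOC_WARNINGS =
        Arrays.asList("Warning: <a> escaping malformed URI reference");

    /** Returns the non-blank tidy output lines that are not known javadoc false positives. */
    static List<String> filter(List<String> tidyOutput) {
        List<String> kept = new ArrayList<>();
        for (String line : tidyOutput) {
            if (line.trim().isEmpty()) {
                continue;
            }
            boolean known = false;
            for (String fragment : KNOWN_JAVADOC_WARNINGS) {
                if (line.contains(fragment)) {
                    known = true;
                    break;
                }
            }
            if (!known) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> output = Arrays.asList(
            "line 152 column 20 - Warning: <a> escaping malformed URI reference",
            "line 24 column 62 - Warning: missing </b> before </p>");
        // prints "[line 24 column 62 - Warning: missing </b> before </p>]"
        System.out.println(filter(output));
    }
}
```

The trade-off is that a blanket match on the message would also hide a malformed URI we write ourselves, so a stricter filter might restrict it to generated javadoc pages.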
> improve jtidy javadocs check
> ----------------------------
>
> Key: LUCENE-4505
> URL: https://issues.apache.org/jira/browse/LUCENE-4505
> Project: Lucene - Core
> Issue Type: Task
> Reporter: Robert Muir
>
> Currently we are using the ant task
> (http://sourceforge.net/p/jtidy/code/1261/tree/trunk/jtidy/src/main/java/org/w3c/tidy/ant/JTidyTask.java)
> built into jtidy itself.
> This has a number of disadvantages:
> * at least in the version we are using, it creates a ByteArrayDataOutput that
> hides all the output. So if there is an error, we never see the details.
> * requires creation of a temp directory: even though we disable the actual
> output with a parameter, it still creates thousands of 0-byte files.
> We only pass 3 options to tidy today:
> * input-encoding=UTF-8
> * only-errors=true
> * show-warnings=false <-- this one is an OOM hack.
> Ideally, I think we would:
> * pass input-encoding=UTF-8, only-errors=true, quiet=true.
> * send all output to a single file or property.
> * if this contains any contents, fail and print the contents.
> This would mean we would fail on warnings too (I checked: this is a good
> thing, as there are some things to fix).
> So, as a start, we could temporarily set show-warnings=false so that we only
> fail on errors, like today.
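The proposed behaviour (collect all tidy output into one file, fail if it has any contents, and print them) could look roughly like this Java sketch. The names are hypothetical, and the errorsOnly flag only imitates the temporary show-warnings=false step by dropping lines without " - Error: "; with the real option, jtidy would simply not print the warnings in the first place.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TidyResultCheck {
    /** Lines that should fail the build; errorsOnly stands in for show-warnings=false. */
    static List<String> problems(List<String> lines, boolean errorsOnly) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            if (errorsOnly && !line.contains(" - Error: ")) {
                continue; // temporary mode: ignore warnings
            }
            result.add(line);
        }
        return result;
    }

    /** Reads the single file all tidy output was sent to and fails, printing its contents. */
    static void check(Path tidyLog, boolean errorsOnly) throws IOException {
        List<String> found = problems(Files.readAllLines(tidyLog, StandardCharsets.UTF_8), errorsOnly);
        if (!found.isEmpty()) {
            throw new IllegalStateException(
                "jtidy reported " + found.size() + " problem(s):\n" + String.join("\n", found));
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("tidy", ".log");
        try {
            Files.write(log, Arrays.asList(
                "line 24 column 62 - Warning: missing </b> before </p>"), StandardCharsets.UTF_8);
            check(log, true); // passes: the only line is a warning
            try {
                check(log, false);
            } catch (IllegalStateException e) {
                System.out.println(e.getMessage()); // strict mode fails and prints the contents
            }
        } finally {
            Files.delete(log);
        }
    }
}
```

Keeping problems() separate from the file handling makes the pass/fail policy easy to test, and switching from the temporary errors-only mode to failing on warnings is a single flag.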