HTML tags produce undefined behavior on the TWiki parser
--------------------------------------------------------

                 Key: DOXIA-441
                 URL: https://jira.codehaus.org/browse/DOXIA-441
             Project: Maven Doxia
          Issue Type: Bug
          Components: Module - Twiki
    Affects Versions: 1.1.4
         Environment: RHEL 5.5, java 1.6.0_20
            Reporter: Rodrigo Tobar
         Attachments: TWikiParserTest.java

I'm using the TWiki parser in conjunction with a sink to format some TWiki 
text. When the text contains HTML tags, the parser produces invalid 
output. I found this bug while working with a home-brewed sink, but I later 
tried other sinks and saw the same behavior, which indicates that the 
fault is in the parser. The attached test case uses an XhtmlBaseSink.

The fault seems to be in org.apache.maven.doxia.module.twiki.parser.TextParser. 
I see three possible fixes (I don't have the time to produce a patch, 
so I prefer just to explain my findings):

 * Fix HTML_TAG_PATTERN, since in the example it matches the 
whole " and a bit of <font color=\"red\">red</font>" string, instead of just 
"<font color=\"red\">red</font>".

 * If that's not possible, then the pattern compiled in lines 117/118 should be 
changed to take into account the content before the HTML tag, so it would be 
"(.+)?(\\<" + tag + ".*\\>)(.*)?(\\<\\/" + tag + "\\>)(.*)?" (the difference is 
the initial "(.+)?"). The logic that uses the group numbers would have to be 
changed too.

 * Another solution is to take into account the result of xhtmlMatcher.start(1) 
in TextParser#parseXHTML:331, so it realizes that there is normal text before 
the actual tag.
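
The second and third options can be sketched as follows. This is only a 
minimal illustration, not the actual TextParser code: the class and method 
names are hypothetical, and the pattern without the leading group is assumed 
to be the proposed pattern minus its initial "(.+)?".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; none of these names exist in Doxia.
public class XhtmlLeadingTextSketch {

    // Second option: the proposed pattern, whose extra first group
    // captures any text that precedes the opening tag.
    static Pattern proposedPattern(String tag) {
        return Pattern.compile(
            "(.+)?(\\<" + tag + ".*\\>)(.*)?(\\<\\/" + tag + "\\>)(.*)?");
    }

    // Third option: keep a pattern without the leading group (assumed form)
    // and use start(1) to recover the text that precedes the opening tag.
    static String leadingText(String input, String tag) {
        Pattern p = Pattern.compile(
            "(\\<" + tag + ".*\\>)(.*)?(\\<\\/" + tag + "\\>)(.*)?");
        Matcher m = p.matcher(input);
        if (!m.find()) {
            return input; // no tag found: everything is plain text
        }
        return input.substring(0, m.start(1));
    }

    public static void main(String[] args) {
        String input = " and a bit of <font color=\"red\">red</font>";

        Matcher m = proposedPattern("font").matcher(input);
        if (m.matches()) {
            System.out.println("before tag: [" + m.group(1) + "]");
            System.out.println("open tag:   [" + m.group(2) + "]");
            System.out.println("content:    [" + m.group(3) + "]");
            System.out.println("close tag:  [" + m.group(4) + "]");
        }

        System.out.println("leading:    [" + leadingText(input, "font") + "]");
    }
}
```

With the proposed pattern the opening tag moves from group 1 to group 2, 
which is why the code that reads the groups would need renumbering. The 
start(1) variant leaves the group numbers untouched.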

Please point out if this is really a bug in the TWiki parser, or if I'm simply 
doing something wrong. I couldn't find any reference to it in the mailing lists 
or elsewhere, and I'm inclined to see this as a bug; therefore, I'm opening this 
ticket.

Cheers

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira