[ https://issues.apache.org/jira/browse/TIKA-457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899020#action_12899020 ]
Ken Krugler commented on TIKA-457:
----------------------------------
Just applied a patch (SVN 986089) for a problem that showed up during testing
on a larger dataset. An empty value in Metadata was being emitted as a <meta>
tag with an empty content attribute, which can cause SAX processing code to
throw an NPE.
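
The guard described above could look roughly like the following. This is a
minimal sketch, not the committed patch: MetaTagEmitter, emitMeta and
emittedNames are hypothetical names, and a plain Map stands in for Tika's
Metadata class so that the example depends only on the JDK's SAX API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: emits <meta> elements to a SAX ContentHandler.
class MetaTagEmitter {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Emits one <meta name="..." content="..."/> per entry and returns the
    // number of elements emitted. Entries whose value is null or empty are
    // skipped so that downstream SAX code never sees an empty content value.
    static int emitMeta(Map<String, String> metadata, ContentHandler handler)
            throws SAXException {
        int emitted = 0;
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            String value = entry.getValue();
            if (value == null || value.isEmpty()) {
                continue; // the guard: no <meta> tag for empty values
            }
            AttributesImpl attrs = new AttributesImpl();
            attrs.addAttribute("", "name", "name", "CDATA", entry.getKey());
            attrs.addAttribute("", "content", "content", "CDATA", value);
            handler.startElement(XHTML, "meta", "meta", attrs);
            handler.endElement(XHTML, "meta", "meta");
            emitted++;
        }
        return emitted;
    }

    // Convenience for checking: returns the names of the <meta> tags emitted.
    static List<String> emittedNames(Map<String, String> metadata) {
        final List<String> names = new ArrayList<>();
        try {
            emitMeta(metadata, new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    names.add(atts.getValue("name"));
                }
            });
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("title", "my title");
        metadata.put("description", ""); // empty value: skipped
        System.out.println(emittedNames(metadata));
    }
}
```

Skipping the entry entirely, rather than emitting a tag with a placeholder
value, keeps consumers from having to special-case empty metadata.
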
> HTMLParser gets an early </body> event
> --------------------------------------
>
> Key: TIKA-457
> URL: https://issues.apache.org/jira/browse/TIKA-457
> Project: Tika
> Issue Type: Bug
> Components: parser
> Reporter: Julien Nioche
> Assignee: Ken Krugler
> Fix For: 0.8
>
> Attachments: TIKA-457.patch
>
>
> I am using the IdentityMapper in the HTMLParser with this simple document:
> {code}
> <html><head><title> my title </title>
> </head>
> <body>
> <frameset rows=\"20,*\">
> <frame src=\"top.html\">
> </frame>
> <frameset cols=\"20,*\">
> <frame src=\"left.html\">
> </frame>
> <frame src=\"invalid.html\"/>
> </frame>
> <frame src=\"right.html\">
> </frame>
> </frameset>
> </frameset>
> </body></html>
> {code}
> Strangely, the HTMLHandler receives a call to endElement for the body
> *BEFORE* we reach the frameset. As a result, the variable bodylevel is
> decremented back to 0 and the remaining elements are ignored due to the logic
> implemented in HTMLHandler.
> Any ideas?
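
The symptom reported above can be illustrated in isolation with a simplified
handler. This is only a sketch under stated assumptions, not Tika's actual
handler code: BodyLevelSketch and replay are hypothetical names, and the
counter logic is a reduced model of the bodylevel behaviour described in the
report (elements are forwarded only while the counter is above 0).

```java
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical, simplified model of a handler that only forwards elements
// seen while inside <body>.
class BodyLevelSketch extends DefaultHandler {
    private int bodyLevel = 0;
    final List<String> forwarded = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        // Start counting at <body>, then track nesting depth below it.
        if ("body".equals(qName) || bodyLevel > 0) {
            bodyLevel++;
        }
        if (bodyLevel > 0) {
            forwarded.add(qName);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (bodyLevel > 0) {
            bodyLevel--;
        }
    }

    // Replays a sequence of events: "name" starts an element and "/name"
    // ends it. Returns the names of the elements that were forwarded.
    static List<String> replay(String... events) {
        BodyLevelSketch handler = new BodyLevelSketch();
        Attributes none = new AttributesImpl();
        for (String event : events) {
            if (event.startsWith("/")) {
                String name = event.substring(1);
                handler.endElement("", name, name);
            } else {
                handler.startElement("", event, event, none);
            }
        }
        return handler.forwarded;
    }
}
```

Replaying "body", "/body", "frameset", "frame", "/frame", "/frameset" through
this model forwards only the body element: once the early end event brings the
counter back to 0, the frameset and frame elements are dropped. Replaying the
same elements with "/body" moved to the end forwards all three.
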