[
https://issues.apache.org/jira/browse/TIKA-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378358#comment-15378358
]
Ken Krugler commented on TIKA-2033:
-----------------------------------
Do you have a suggestion for how the text should appear in the resulting
document? E.g. as-is, or with "input " preceding it, or something else?
> Value attributes of input elements not extracted from HTML
> -----------------------------------------------------------
>
> Key: TIKA-2033
> URL: https://issues.apache.org/jira/browse/TIKA-2033
> Project: Tika
> Issue Type: Improvement
> Components: parser
> Affects Versions: 1.10
> Environment: Windows 7, Java 8 x64
> Reporter: Luis Filipe Nassif
> Priority: Minor
>
> The text of value attributes of input elements is currently not extracted
> from HTML files, even though browsers render it. I tried using
> IdentityHtmlMapper and experimented with HtmlSchema, with no luck. Simple
> test HTML below:
> <HTML><body><input value='text'></input></body></HTML>
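
To make the two output options from the question above concrete, here is a
rough sketch. It does not use Tika's API at all: as a stand-in it uses the
JDK's Swing HTML parser (ParserDelegator) and collects the value attribute of
each input element, optionally with a prefix such as "input ". The class name
InputValueSketch and the method extractInputValues are hypothetical and exist
only for illustration; how Tika's HtmlParser would actually emit this text is
still the open question.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical illustration only; this is NOT Tika code.
public class InputValueSketch {

    // Collects the value attribute of every <input> element in the given HTML.
    // Each collected value is prepended with `prefix` (pass "" for "as-is").
    static List<String> extractInputValues(String html, String prefix) {
        List<String> values = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <input> is an empty element, so the parser reports it as a simple tag.
                if (!HTML.Tag.INPUT.equals(tag)) {
                    return;
                }
                Object value = attrs.getAttribute(HTML.Attribute.VALUE);
                if (value != null) {
                    values.add(prefix + value);
                }
            }
        };
        try {
            // The third argument tells the parser to ignore charset declarations.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return values;
    }

    public static void main(String[] args) {
        // Test HTML taken from the issue description.
        String html = "<HTML><body><input value='text'></input></body></HTML>";
        System.out.println(extractInputValues(html, ""));        // option 1: as-is
        System.out.println(extractInputValues(html, "input "));  // option 2: prefixed
    }
}
```

The sketch treats every input element the same way. A real implementation
would probably need to decide what to do with inputs that browsers do not
render, such as type="hidden".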