Luis Filipe Nassif created TIKA-2033:
----------------------------------------
Summary: Value attributes of input elements not extracted from HTML
Key: TIKA-2033
URL: https://issues.apache.org/jira/browse/TIKA-2033
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.10
Environment: Windows 7, Java 8 x64
Reporter: Luis Filipe Nassif
Priority: Minor
The text in the value attribute of input elements is currently not extracted
from HTML files, even though browsers render it. I tried using
IdentityHtmlMapper and experimented with HtmlSchema, with no luck. A simple
test HTML file is below:
<HTML><body><input value='text'></input></body></HTML>
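To make the expected behavior concrete, here is a minimal sketch of a SAX handler that adds the value attribute of input elements to the extracted text. The class name InputValueCollector and its extract helper are hypothetical and not part of Tika. The sketch runs the test HTML through the JDK's XML parser, which only works because the sample happens to be well-formed XML. Whether Tika's HtmlParser passes the input element and its value attribute on to a handler like this is the open question of this issue, so treat it as an illustration of the desired output, not as a confirmed workaround.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Tika: collects character data plus
// the value attribute of every input element.
public class InputValueCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Parsers that are not namespace-aware leave localName empty, so fall back to qName.
        String name = (localName == null || localName.isEmpty()) ? qName : localName;
        if ("input".equalsIgnoreCase(name)) {
            String value = atts.getValue("value");
            if (value != null) {
                // Treat the attribute value as if it were visible text.
                text.append(value).append('\n');
            }
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    public String getText() {
        return text.toString();
    }

    // Assumption: the input is well-formed XML, as the test HTML above is.
    public static String extract(String xhtml) throws Exception {
        InputValueCollector handler = new InputValueCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.getText();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extract("<HTML><body><input value='text'></input></body></HTML>"));
    }
}
```

For the test HTML above, the extracted text should contain the word "text", which is what I would expect the HTML parser to emit as well.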
--
This message was sent by Atlassian JIRA (v6.3.4#6332)