[
https://issues.apache.org/jira/browse/TIKA-497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902457#action_12902457
]
Jukka Zitting commented on TIKA-497:
------------------------------------
Instead of just fixing the capitalization, I'd argue that the HTML parser
should specifically look for these kinds of well-known metadata keys and
automatically map such information to the applicable Metadata constants we
already have. If there are multiple sources for a particular metadata entry
(Content-Type is a perfect example), then reasonable heuristics should be used
to merge the information.
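A minimal sketch of that idea follows. It is written in plain Java so it runs without Tika on the classpath, and a plain `Map` stands in for Tika's `Metadata` object. Several pieces are hypothetical choices made for illustration and do not describe Tika's actual behavior: the class name `HttpEquivMapper`, the small table of well-known names, and the "prefer the Content-Type value that carries a charset" merge rule.

```java
import java.util.Locale;
import java.util.Map;

final class HttpEquivMapper {
    // Hypothetical table: lowercased http-equiv names -> canonical metadata keys.
    // The spellings resemble header-style metadata names, but the table itself
    // is an illustrative assumption, not Tika's real mapping.
    private static final Map<String, String> WELL_KNOWN = Map.of(
            "content-type", "Content-Type",
            "content-language", "Content-Language",
            "content-encoding", "Content-Encoding");

    // Returns the canonical key for a well-known name; other names pass through (trimmed).
    static String canonicalName(String httpEquiv) {
        String key = httpEquiv.trim().toLowerCase(Locale.ROOT);
        return WELL_KNOWN.getOrDefault(key, httpEquiv.trim());
    }

    // Assumed merge heuristic: keep the existing value unless the candidate adds
    // a charset parameter that the existing value lacks.
    static String mergeContentType(String existing, String candidate) {
        if (existing == null) {
            return candidate;
        }
        boolean existingHasCharset = existing.toLowerCase(Locale.ROOT).contains("charset=");
        boolean candidateHasCharset = candidate.toLowerCase(Locale.ROOT).contains("charset=");
        return (!existingHasCharset && candidateHasCharset) ? candidate : existing;
    }

    // Stores one <meta http-equiv="..." content="..."> pair under its canonical key.
    static void put(Map<String, String> metadata, String httpEquiv, String content) {
        String name = canonicalName(httpEquiv);
        if (name.equals("Content-Type")) {
            metadata.put(name, mergeContentType(metadata.get(name), content));
        } else {
            // For every other key, the first value seen wins.
            metadata.putIfAbsent(name, content);
        }
    }
}
```

For example, if the map already holds `Content-Type: text/html` (say, from the HTTP response) and the page contains `<meta http-equiv="content-type" content="text/html; charset=UTF-8">`, calling `put` leaves a single `Content-Type` entry whose value is the charset-bearing one. Names outside the table pass through unchanged, so only the well-known keys are deduplicated.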
> HtmlHandler should fix up incorrect capitalization of names in <meta
> http-equiv="xxx"> attributes before putting into metadata
> ------------------------------------------------------------------------------------------------------------------------------
>
> Key: TIKA-497
> URL: https://issues.apache.org/jira/browse/TIKA-497
> Project: Tika
> Issue Type: Improvement
> Affects Versions: 0.7
> Reporter: Ken Krugler
> Assignee: Ken Krugler
> Priority: Minor
> Fix For: 0.8
>
>
> With the current behavior, you can get metadata entries that have
> "Content-Type" and "content-type" as their names, because http-equiv
> attribute values often use incorrect capitalization.
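For comparison, the narrower fix named in the issue title, which canonicalizes the capitalization of the http-equiv name itself, could look like the sketch below. `HeaderCase.canonicalize` is a hypothetical helper that applies the usual HTTP-header casing convention (capitalize each hyphen-separated token, lowercase the rest). It is an assumed heuristic, not code from Tika.

```java
import java.util.Locale;
import java.util.Map;

final class HeaderCase {
    // Capitalizes the first letter of each hyphen-separated token and lowercases
    // the rest, e.g. "content-TYPE" -> "Content-Type".
    static String canonicalize(String name) {
        String[] parts = name.trim().toLowerCase(Locale.ROOT).split("-", -1);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append('-');
            }
            String part = parts[i];
            if (!part.isEmpty()) {
                sb.append(Character.toUpperCase(part.charAt(0))).append(part.substring(1));
            }
        }
        return sb.toString();
    }

    // Stores the value under the canonicalized name; the first value seen wins.
    static void put(Map<String, String> metadata, String name, String value) {
        metadata.putIfAbsent(canonicalize(name), value);
    }
}
```

With this rule, "content-type" and "CONTENT-TYPE" both become "Content-Type", so the duplicate entries described above collapse into one. The rule is purely mechanical, though. "X-UA-Compatible" comes out as "X-Ua-Compatible", which is one reason to prefer mapping known names onto the existing Metadata constants, as the comment above suggests.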