Andrey Kutuzov created ANY23-240:
------------------------------------
Summary: Option to process <br>s as spaces in Microdata
Key: ANY23-240
URL: https://issues.apache.org/jira/browse/ANY23-240
Project: Apache Any23
Issue Type: Improvement
Components: extractors, microdata
Reporter: Andrey Kutuzov
When extracting Microdata from HTML pages, Any23 silently drops all HTML tags
inside predicate values. See, for example, the
http://schema.org/Recipe/ingredients property at http://kuking.net/3_2070.htm.
The problem is that on this page (and many others) the ingredients are
separated from each other only by '<br>' tags. Once Any23 drops those tags,
adjacent ingredients run together with no separator and the content becomes
unintelligible. The Google Structured Data Testing Tool, by contrast,
separates them properly with spaces.
Is it possible to implement this behavior (replacing <br> tags with spaces) in
Any23 as an option?
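To illustrate the requested behavior, here is a minimal sketch. It assumes the
Microdata extractor computes a property value by walking a W3C DOM subtree.
The class BrAwareText and its methods textOf and parseFragment, along with the
brAsSpace flag, are hypothetical and are not part of Any23's API. For
simplicity the sketch parses well-formed XHTML; real-world HTML would need an
HTML parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper (not Any23 API): text of a property element, optionally with <br> as a space. */
final class BrAwareText {

    private BrAwareText() {}

    /** Parses a well-formed XHTML fragment (assumption: input is valid XML). */
    public static Element parseFragment(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XHTML", e);
        }
    }

    /** Concatenates descendant text; when brAsSpace is true, each <br> contributes one space. */
    public static String textOf(Node node, boolean brAsSpace) {
        StringBuilder sb = new StringBuilder();
        append(node, brAsSpace, sb);
        // Collapse whitespace runs so "a<br/> b" yields "a b", not "a  b".
        return sb.toString().replaceAll("\\s+", " ").trim();
    }

    private static void append(Node node, boolean brAsSpace, StringBuilder sb) {
        switch (node.getNodeType()) {
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                sb.append(node.getNodeValue());
                break;
            case Node.ELEMENT_NODE:
            case Node.DOCUMENT_NODE:
                if (brAsSpace && "br".equalsIgnoreCase(node.getNodeName())) {
                    sb.append(' '); // the requested option: a line break becomes a separator
                    return;
                }
                NodeList children = node.getChildNodes();
                for (int i = 0; i < children.getLength(); i++) {
                    append(children.item(i), brAsSpace, sb);
                }
                break;
            default:
                // Comments and processing instructions contribute no text.
                break;
        }
    }
}
```

For an element such as `<span itemprop="ingredients">2 eggs<br/>1 cup
flour<br/>salt</span>`, textOf(el, false) mimics the current behavior and
yields "2 eggs1 cup floursalt", while textOf(el, true) yields
"2 eggs 1 cup flour salt". Keeping the replacement behind a boolean flag
preserves today's output by default.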
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)