Andrey Kutuzov created ANY23-240:
------------------------------------

             Summary: Option to process <br>s as spaces in Microdata
                 Key: ANY23-240
                 URL: https://issues.apache.org/jira/browse/ANY23-240
             Project: Apache Any23
          Issue Type: Improvement
          Components: extractors, microdata
            Reporter: Andrey Kutuzov


When extracting Microdata from HTML pages, Any23 silently drops all HTML tags 
inside predicates' values. See, for example, 
http://schema.org/Recipe/ingredients at http://kuking.net/3_2070.htm.
The problem is that on this page (and many others) the ingredients are separated 
from each other only by a '<br>' tag. After Any23 drops it, the values run 
together and become unintelligible. By contrast, the Google Structured Data 
Testing Tool separates them properly with spaces.
Would it be possible to implement this behavior (replacing <br> tags with 
spaces) in Any23 as an option?
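
To illustrate the requested behavior, here is a minimal sketch that uses only the JDK's DOM classes, not Any23's internals. The class name BrAsSpace, the method textContent and the brAsSpace flag are hypothetical and are not part of the existing Any23 API. The sample markup is invented, and the input is assumed to be well-formed XML so that the JDK parser can read it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical illustration only; none of these names exist in Any23.
class BrAsSpace {

    // Returns the concatenated text below 'node'. When brAsSpace is true,
    // every <br> element contributes a single space instead of nothing.
    static String textContent(Node node, boolean brAsSpace) {
        StringBuilder sb = new StringBuilder();
        collect(node, brAsSpace, sb);
        return sb.toString();
    }

    private static void collect(Node node, boolean brAsSpace, StringBuilder sb) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            sb.append(node.getNodeValue());
            return;
        }
        if (brAsSpace
                && node.getNodeType() == Node.ELEMENT_NODE
                && "br".equalsIgnoreCase(node.getNodeName())) {
            sb.append(' ');
            return;
        }
        // Any other node: descend into its children, dropping the tag itself.
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, brAsSpace, sb);
        }
    }

    // Parses a well-formed XML fragment and returns its root element.
    static Node parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        Node span = parse(
                "<span itemprop=\"ingredients\">flour<br/>sugar<br/>eggs</span>");
        System.out.println(textContent(span, false)); // floursugareggs
        System.out.println(textContent(span, true));  // flour sugar eggs
    }
}
```

With the flag off, the output reproduces the run-together value described above. With it on, the items stay separated. A real implementation would need to hook into wherever Any23 computes the text of a Microdata property value, which this sketch does not attempt to locate.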



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
