Thomas Francart created ANY23-291:
-------------------------------------

             Summary: JSON-LD should be looked up in entire HTML document, not just in <head>
                 Key: ANY23-291
                 URL: https://issues.apache.org/jira/browse/ANY23-291
             Project: Apache Any23
          Issue Type: Improvement
          Components: extractors
    Affects Versions: 1.2
            Reporter: Thomas Francart
            Priority: Minor


In
org.apache.any23.extractor.html.EmbeddedJSONLDExtractor.extractJSONLDScript(),
I think this line:

List<Node> scriptNodes = DomUtils.findAll(in, "/HTML/HEAD/SCRIPT");

is too restrictive. Scripts containing JSON-LD can be placed anywhere in the
page, and in practice some CMS and WordPress plugins that insert JSON-LD
generate their output in the body, not in the head.
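
To illustrate the difference, here is a minimal sketch that uses only the JDK's
XPath API rather than Any23's DomUtils. The class name JsonLdScriptLookup, the
helper countMatches and the sample markup are all hypothetical, and the
document-wide expression is only one possible replacement, not a tested patch.
The sample uses upper-case element names on the assumption that the HTML parser
normalizes them that way, as the existing expression suggests.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class JsonLdScriptLookup {

    // Hypothetical page: one JSON-LD script in the head, one nested in the
    // body, plus an unrelated JavaScript block in the body.
    static final String SAMPLE =
        "<HTML><HEAD>"
      + "<SCRIPT type=\"application/ld+json\">{\"@id\":\"head\"}</SCRIPT>"
      + "</HEAD><BODY><DIV>"
      + "<SCRIPT type=\"application/ld+json\">{\"@id\":\"body\"}</SCRIPT>"
      + "</DIV><SCRIPT type=\"text/javascript\">var x = 1;</SCRIPT>"
      + "</BODY></HTML>";

    // Parses the markup as XML and counts the nodes matched by the XPath.
    static int countMatches(String xml, String xpath) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance()
            .newXPath()
            .evaluate(xpath, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Current expression: only sees scripts directly under the head.
        System.out.println(countMatches(SAMPLE, "/HTML/HEAD/SCRIPT"));
        // Document-wide lookup, restricted to JSON-LD scripts.
        System.out.println(
            countMatches(SAMPLE, "//SCRIPT[@type='application/ld+json']"));
    }
}
```

With this sample, the head-only expression matches a single script, while the
document-wide one also picks up the JSON-LD script nested in the body and
still skips the plain JavaScript block.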





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
