Thomas Francart created ANY23-291:
-------------------------------------
Summary: JSON-LD should be looked up in entire HTML document, not
just in <head>
Key: ANY23-291
URL: https://issues.apache.org/jira/browse/ANY23-291
Project: Apache Any23
Issue Type: Improvement
Components: extractors
Affects Versions: 1.2
Reporter: Thomas Francart
Priority: Minor
In
org.apache.any23.extractor.html.EmbeddedJSONLDExtractor.extractJSONLDScript(),
I think this line:
List<Node> scriptNodes = DomUtils.findAll(in, "/HTML/HEAD/SCRIPT");
is too restrictive. Scripts containing JSON-LD can be placed anywhere in the
page, and some CMS/WordPress plugins that insert JSON-LD actually generate
their output in the body, not in the head.
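
To illustrate the difference, here is a minimal sketch that compares the head-only path quoted above with a document-wide "//SCRIPT" lookup. It uses the JDK's own XPath API rather than Any23's DomUtils, and the class name JsonLdScriptLookup, the SAMPLE document, and the findJsonLd helper are all hypothetical names made up for this example. The "//SCRIPT" expression is only one possible way to widen the search, not a confirmed patch, and the upper-case element names simply mirror the path used in the line above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of Any23.
class JsonLdScriptLookup {

    // Assumed sample page: one JSON-LD script in the head, one nested in the
    // body, plus an ordinary JavaScript block that must be ignored.
    static final String SAMPLE =
          "<HTML><HEAD>"
        + "<SCRIPT type=\"application/ld+json\">{\"@id\":\"head\"}</SCRIPT>"
        + "</HEAD><BODY><DIV>"
        + "<SCRIPT type=\"application/ld+json\">{\"@id\":\"body\"}</SCRIPT>"
        + "</DIV><SCRIPT type=\"text/javascript\">var x = 1;</SCRIPT>"
        + "</BODY></HTML>";

    // Parses a well-formed markup string into a DOM document.
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample markup", e);
        }
    }

    // Evaluates the XPath expression and returns the text of every matched
    // SCRIPT element whose type attribute is application/ld+json.
    static List<String> findJsonLd(Document doc, String xpath) {
        List<String> result = new ArrayList<>();
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate(xpath, doc, XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                Node node = nodes.item(i);
                if (!(node instanceof Element)) {
                    continue;
                }
                String type = ((Element) node).getAttribute("type").trim();
                if ("application/ld+json".equalsIgnoreCase(type)) {
                    result.add(node.getTextContent().trim());
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + xpath, e);
        }
        return result;
    }

    public static void main(String[] args) {
        Document doc = parse(SAMPLE);
        // Head-only path, as in the line quoted in this report.
        System.out.println(findJsonLd(doc, "/HTML/HEAD/SCRIPT"));
        // Document-wide path (assumed alternative).
        System.out.println(findJsonLd(doc, "//SCRIPT"));
    }
}
```

On this sample, only the document-wide expression picks up the script placed in the body. The type check is what keeps ordinary JavaScript blocks out of the result once the search is no longer limited to the head.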