[ https://issues.apache.org/jira/browse/ANY23-291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16358925#comment-16358925 ]
ASF GitHub Bot commented on ANY23-291:
--------------------------------------

Github user HansBrende commented on the issue:

    https://github.com/apache/any23/pull/60

@ferrerod I am unable to reproduce your error. Running `mvn clean install` on the master branch gives me:

```
[INFO] Apache Any23 ....................................... SUCCESS [ 6.020 s]
[INFO] Apache Any23 :: Base API ........................... SUCCESS [ 3.684 s]
[INFO] Apache Any23 :: Test Resources ..................... SUCCESS [ 0.818 s]
[INFO] Apache Any23 :: CSV Utilities ...................... SUCCESS [ 0.612 s]
[INFO] Apache Any23 :: Mime Type Detection ................ SUCCESS [ 5.078 s]
[INFO] Apache Any23 :: Encoding Detection ................. SUCCESS [ 1.730 s]
[INFO] Apache Any23 :: Core ............................... SUCCESS [ 39.300 s]
[INFO] Apache Any23 :: Plugins :: Office Scraper .......... SUCCESS [ 7.402 s]
[INFO] Apache Any23 :: Plugins :: HTML Scraper ............ SUCCESS [ 2.761 s]
[INFO] Apache Any23 :: CLI ................................ SUCCESS [ 23.814 s]
[INFO] Apache Any23 :: OpenIE ............................. SUCCESS [ 2.224 s]
[INFO] Apache Any23 :: Plugins :: Basic Crawler ........... SUCCESS [ 38.897 s]
[INFO] Apache Any23 :: Plugins :: Integration Test ........ SUCCESS [01:18 min]
[INFO] Apache Any23 :: Service ............................ SUCCESS [ 32.645 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 04:03 min
[INFO] Finished at: 2018-02-09T14:35:24-06:00
[INFO] Final Memory: 79M/898M
```

> JSON-LD should be looked up in entire HTML document, not just in <head>
> -----------------------------------------------------------------------
>
>                 Key: ANY23-291
>                 URL: https://issues.apache.org/jira/browse/ANY23-291
>             Project: Apache Any23
>          Issue Type: Improvement
>          Components: extractors
>    Affects Versions: 1.2
>            Reporter: Thomas Francart
>            Assignee: Hans Brende
>            Priority: Minor
>             Fix For: 2.2
>
>         Attachments: example-embedded-jsonld.html
>
>
> In
> org.apache.any23.extractor.html.EmbeddedJSONLDExtractor.extractJSONLDScript(),
> I think this line :
> List<Node> scriptNodes = DomUtils.findAll(in, "/HTML/HEAD/SCRIPT");
> is too restrictive. scripts containing json-ld can be placed anywhere in the
> page, and actually some CMS/Wordpress plugin inserting JSON-LD are generating
> their output in the body, not in the head.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
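
The restriction described in the quoted issue can be illustrated with a small, self-contained sketch. It uses only the JDK's `javax.xml.xpath` API rather than Any23's `DomUtils`, and the class name `JsonLdScriptLookup`, its helper methods, the sample page, and the broader expression `//SCRIPT[@type='application/ld+json']` are all assumptions made for illustration, not the actual fix applied in the pull request.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of Any23.
public class JsonLdScriptLookup {

    // Parses a well-formed markup string into a DOM document.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Counts the nodes in the document that match the given XPath expression.
    public static int count(Document doc, String xpath) throws Exception {
        NodeList nodes = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Sample page (assumed): one JSON-LD script in the head, one in the body.
        String page =
              "<HTML><HEAD><SCRIPT type=\"application/ld+json\">{}</SCRIPT></HEAD>"
            + "<BODY><SCRIPT type=\"application/ld+json\">{}</SCRIPT></BODY></HTML>";
        Document doc = parse(page);

        // The expression quoted in the issue only sees the script in the head.
        System.out.println(count(doc, "/HTML/HEAD/SCRIPT")); // prints 1

        // A document-wide expression (an assumption here) also finds the body script.
        System.out.println(count(doc, "//SCRIPT[@type='application/ld+json']")); // prints 2
    }
}
```

Filtering on the `type` attribute in the broader expression is a design choice of this sketch: once the search covers the whole document, it keeps ordinary JavaScript `SCRIPT` elements out of the result.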