[ https://issues.apache.org/jira/browse/ANY23-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381978#comment-14381978 ]
Lewis John McGibbney commented on ANY23-247:
--------------------------------------------

An example of a failing test for this issue
{code}
org.apache.any23.Any23Test.testMicrodataSupport

Failing for the past 6 builds (Since Unstable#1309 )
Took 0.43 sec.

Error Message
Error while parsing RDF document.

Stacktrace
org.apache.any23.extractor.ExtractionException: Error while parsing RDF document.
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1236)
    at org.semarglproject.source.XmlSource.process(XmlSource.java:48)
    at org.semarglproject.source.StreamProcessor.processInternal(StreamProcessor.java:87)
    at org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:167)
    at org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:154)
    at org.semarglproject.sesame.rdf.rdfa.SesameRDFaParser.parse(SesameRDFaParser.java:109)
    at org.semarglproject.sesame.rdf.rdfa.SesameRDFaParser.parse(SesameRDFaParser.java:95)
    at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:105)
    at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:41)
    at org.apache.any23.extractor.SingleDocumentExtraction.runExtractor(SingleDocumentExtraction.java:462)
    at org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:254)
    at org.apache.any23.Any23.extract(Any23.java:298)
    at org.apache.any23.Any23.extract(Any23.java:433)
    at org.apache.any23.Any23.extract(Any23.java:347)
    at org.apache.any23.Any23Test.detectAndExtract(Any23Test.java:559)
    at org.apache.any23.Any23Test.assertExtractorActivation(Any23Test.java:590)
    at org.apache.any23.Any23Test.testMicrodataSupport(Any23Test.java:484)

Standard Output
[2015-03-26 02:01:37,665] INFO 4947[main] - org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:221) - Processing http://host.com/path

Standard Error
[Fatal Error] :23:15: Attribute name "itemscope" associated with an element type "div" must be followed by the ' = ' character.
{code}

> FIX Attribute name "itemscope" associated with an element type "html" must be followed by the ' = ' character.
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: ANY23-247
>                 URL: https://issues.apache.org/jira/browse/ANY23-247
>             Project: Apache Any23
>          Issue Type: Improvement
>    Affects Versions: 1.1
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>             Fix For: 1.3
>
>
> In the following markup
> {code}
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml" xmlns:og="http://opengraphprotocol.org/schema/" xmlns:fb="http://www.facebook.com/2008/fbml" version="HTML+RDFa 1.0" xml:lang="en" itemscope itemtype="http://schema.org/Product">
> <head>
> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
> <meta http-equiv="X-UA-Compatible" content="IE=edge" />
> <meta name="generator" content="ToolTwist" />
> ...
> {code}
> Because *itemscope* is not followed by a value, we get the following error
> in our web server logs
> {code}
> [Fatal Error] :2:185: Attribute name "itemscope" associated with an element type "html" must be followed by the ' = ' character.
> {code}
> Although the markup semantics are incorrect, Any23 should simply check
> whether the itemscope value is null and, if so, add *=""*. There is a
> precedent for us doing something like this before; I just can't find the
> ticket right now!
> The place to add this code is either
> core/src/main/java/org/apache/any23/extractor/microdata/ItemScope.java
> or
> core/src/main/java/org/apache/any23/extractor/microdata/MicrodataParser.java

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
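
For illustration only, the check described in the issue (add ="" when itemscope carries no value) could be sketched as a small pre-processing step that runs before the markup reaches an XML parser. This is not the Any23 fix: the class name ItemscopeNormalizer and the regex-based approach are assumptions made for this example, and it does not cope with a '>' inside attribute values or with tag-like text inside comments and scripts.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Any23: rewrites a bare "itemscope"
// attribute to itemscope="" so that a strict XML parser accepts the tag.
final class ItemscopeNormalizer {

    // An opening tag; assumes no '<' or '>' inside attribute values.
    private static final Pattern TAG = Pattern.compile("<[a-zA-Z][^<>]*>");

    // One attribute: group 1 is the name, group 2 the optional "=value" part.
    private static final Pattern ATTR = Pattern.compile(
        "\\s+([^\\s=<>\"'/]+)(\\s*=\\s*(?:\"[^\"]*\"|'[^']*'|[^\\s\"'<>]+))?");

    private ItemscopeNormalizer() {
    }

    static String normalize(String html) {
        Matcher tag = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (tag.find()) {
            // quoteReplacement keeps '$' and '\' in the markup literal.
            tag.appendReplacement(out, Matcher.quoteReplacement(fixTag(tag.group())));
        }
        tag.appendTail(out);
        return out.toString();
    }

    private static String fixTag(String tagText) {
        Matcher attr = ATTR.matcher(tagText);
        StringBuffer out = new StringBuffer();
        while (attr.find()) {
            String text = attr.group();
            boolean hasValue = attr.group(2) != null;
            // Quoted values are consumed together with their attribute name,
            // so the word "itemscope" inside a value is never rewritten.
            if (!hasValue && "itemscope".equalsIgnoreCase(attr.group(1))) {
                text = text + "=\"\"";
            }
            attr.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        attr.appendTail(out);
        return out.toString();
    }
}
```

Calling ItemscopeNormalizer.normalize(...) on the html element from the issue description would leave every attribute untouched except itemscope, which becomes itemscope="". Whether such a step belongs in ItemScope.java, MicrodataParser.java, or earlier in the extraction pipeline is left open here.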