[ 
https://issues.apache.org/jira/browse/ANY23-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381978#comment-14381978
 ] 

Lewis John McGibbney commented on ANY23-247:
--------------------------------------------

An example of a failing test for this issue:
{code}
org.apache.any23.Any23Test.testMicrodataSupport
Failing for the past 6 builds (Since Unstable#1309 )
Took 0.43 sec.
Error Message

Error while parsing RDF document.

Stacktrace

org.apache.any23.extractor.ExtractionException: Error while parsing RDF document.
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1236)
        at org.semarglproject.source.XmlSource.process(XmlSource.java:48)
        at org.semarglproject.source.StreamProcessor.processInternal(StreamProcessor.java:87)
        at org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:167)
        at org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:154)
        at org.semarglproject.sesame.rdf.rdfa.SesameRDFaParser.parse(SesameRDFaParser.java:109)
        at org.semarglproject.sesame.rdf.rdfa.SesameRDFaParser.parse(SesameRDFaParser.java:95)
        at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:105)
        at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:41)
        at org.apache.any23.extractor.SingleDocumentExtraction.runExtractor(SingleDocumentExtraction.java:462)
        at org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:254)
        at org.apache.any23.Any23.extract(Any23.java:298)
        at org.apache.any23.Any23.extract(Any23.java:433)
        at org.apache.any23.Any23.extract(Any23.java:347)
        at org.apache.any23.Any23Test.detectAndExtract(Any23Test.java:559)
        at org.apache.any23.Any23Test.assertExtractorActivation(Any23Test.java:590)
        at org.apache.any23.Any23Test.testMicrodataSupport(Any23Test.java:484)

Standard Output

[2015-03-26 02:01:37,665] INFO  4947[main] - org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:221) - Processing http://host.com/path
  

Standard Error

[Fatal Error] :23:15: Attribute name "itemscope" associated with an element type "div" must be followed by the ' = ' character.

{code}
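
The fatal error in the test output comes from the XML parser (Xerces, per the stack trace), which rejects any attribute that has no value. The sketch below is a minimal reproduction of that behaviour using only the JDK's built-in parser. It does not go through Any23 or Semargl, and the class and method names (`BareAttributeRepro`, `parsesAsXml`) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXParseException;

// Hypothetical helper, not part of Any23: checks whether the JDK's
// XML parser accepts a snippet of markup.
final class BareAttributeRepro {

    // Returns true if the markup parses as well-formed XML, false if the
    // parser raises a SAXParseException (as it does for a bare attribute).
    static boolean parsesAsXml(String markup) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try {
            builder.parse(new ByteArrayInputStream(
                    markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Bare attribute: rejected by the XML parser.
        System.out.println(parsesAsXml("<div itemscope></div>"));      // false
        // Same attribute with an empty value: accepted.
        System.out.println(parsesAsXml("<div itemscope=\"\"></div>")); // true
    }
}
```

The default error handler also prints a "[Fatal Error]" line to standard error for the first snippet, which is the same kind of line seen in the test's Standard Error section.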

> FIX Attribute name "itemscope" associated with an element type "html" must be 
> followed by the ' = ' character.
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: ANY23-247
>                 URL: https://issues.apache.org/jira/browse/ANY23-247
>             Project: Apache Any23
>          Issue Type: Improvement
>    Affects Versions: 1.1
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>             Fix For: 1.3
>
>
> In the following markup
> {code}
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml" xmlns:og="http://opengraphprotocol.org/schema/" xmlns:fb="http://www.facebook.com/2008/fbml" version="HTML+RDFa 1.0" xml:lang="en" itemscope itemtype="http://schema.org/Product">
> <head>
> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
> <meta http-equiv="X-UA-Compatible" content="IE=edge" />
> <meta name="generator" content="ToolTwist" />
> ...
> {code}
> Because *itemscope* is not followed by any value, we get the following
> error in our web server logs:
> {code}
> [Fatal Error] :2:185: Attribute name "itemscope" associated with an element type "html" must be followed by the ' = ' character.
> {code}
> Although the markup semantics are incorrect, Any23 should simply check
> whether the itemscope value is null and, if so, add *=""*. There is a
> precedent for us doing something like this before; I just can't find the
> ticket right now!
> The code we need to add belongs in either
> core/src/main/java/org/apache/any23/extractor/microdata/ItemScope.java or
> core/src/main/java/org/apache/any23/extractor/microdata/MicrodataParser.java
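
As a rough illustration of the proposed fix (add *=""* when itemscope has no value), the sketch below rewrites the raw markup with a regular expression before it reaches the XML parser. This is an assumption about one possible approach, not the change that was made in ItemScope.java or MicrodataParser.java. The names `ItemscopeFixer` and `addEmptyValue` are hypothetical, and a regex pass is a simplification: it would also rewrite the word "itemscope" if it appeared in text content or inside another attribute's value.

```java
import java.util.regex.Pattern;

// Hypothetical pre-processing step, not existing Any23 code.
final class ItemscopeFixer {

    // Matches "itemscope" preceded by whitespace, not followed by
    // (optional whitespace and) '=', and ending at whitespace, ">" or "/>".
    private static final Pattern BARE_ITEMSCOPE = Pattern.compile(
            "(\\sitemscope)(?!\\s*=)(?=\\s|/?>)", Pattern.CASE_INSENSITIVE);

    // Appends ="" to every bare itemscope attribute; attributes that
    // already carry a value are left untouched.
    static String addEmptyValue(String markup) {
        return BARE_ITEMSCOPE.matcher(markup).replaceAll("$1=\"\"");
    }

    public static void main(String[] args) {
        String in = "<div itemscope itemtype=\"http://schema.org/Product\">";
        // prints: <div itemscope="" itemtype="http://schema.org/Product">
        System.out.println(addEmptyValue(in));
    }
}
```

A DOM-level check (testing the attribute value for null after a lenient HTML parse) would be more robust than string rewriting; the regex form is used here only because it is self-contained.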



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
