[ 
https://issues.apache.org/jira/browse/ANY23-267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lewis John McGibbney resolved ANY23-267.
----------------------------------------
    Resolution: Fixed

Although this issue is still encountered, extractions no longer fail 
completely. The underlying problem is that the DOM is not being 'fixed' 
before the Any23 RDF wrappers pass it to the underlying Semargl parser, 
which is very strict. 
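
To illustrate the strictness described above, here is a minimal sketch using 
only the JDK's SAX parser rather than Semargl or Any23 themselves. The class 
StrictParseDemo and its methods isWellFormed and selfCloseVoidElements are 
hypothetical names invented for this example, and the regex pre-pass is a 
deliberately naive stand-in for a real DOM 'fixing' step; it is not how Any23 
resolves the issue.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of Any23 or Semargl.
class StrictParseDemo {

    // A few HTML void elements that are valid unclosed in HTML but not in XML.
    // Matches only tags that are not already self-closed ("/>").
    private static final Pattern UNCLOSED_VOID = Pattern.compile(
            "<(meta|link|br|hr|img|input)\\b([^>]*?)\\s*(?<!/)>",
            Pattern.CASE_INSENSITIVE);

    // Returns true if a strict XML (SAX) parser accepts the markup.
    static boolean isWellFormed(String markup) {
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(markup)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            // Includes SAXParseException, e.g. a missing </meta> end tag.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Naive assumption-laden fix-up: rewrite <meta ...> as <meta .../>.
    // It ignores edge cases such as '>' inside attribute values.
    static String selfCloseVoidElements(String html) {
        return UNCLOSED_VOID.matcher(html).replaceAll("<$1$2/>");
    }

    public static void main(String[] args) {
        String html = "<html><head><meta charset=\"utf-8\"><title>t</title></head><body/></html>";
        System.out.println("raw markup accepted:   " + isWellFormed(html));
        System.out.println("fixed markup accepted: " + isWellFormed(selfCloseVoidElements(html)));
    }
}
```

In practice a proper HTML-tidying parser would be a more robust choice than a 
regex; the sketch only shows why unfixed markup is rejected by a strict XML 
parser while the same markup with closed elements is accepted.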

> Entire extractions fail due to "The element type 'meta' must be terminated by 
> the matching end-tag </meta>"
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: ANY23-267
>                 URL: https://issues.apache.org/jira/browse/ANY23-267
>             Project: Apache Any23
>          Issue Type: Sub-task
>          Components: core
>    Affects Versions: 1.1
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>             Fix For: 2.2
>
>
> WebService API call
> http://any23.org/best/twitter.com/cygri
> {code}
> Could not parse input.
> ================================================================
> ------------ BEGIN Exception context ------------
> ExtractionContext(urn:x-any23:html-rdfa11:root-extraction-result-id:https://twitter.com/cygri)
> Errors {
> }
> ------------ END   Exception context ------------
> org.apache.any23.extractor.ExtractionException: Error while parsing RDF document.
>       at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:109)
>       at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:41)
>       at org.apache.any23.extractor.SingleDocumentExtraction.runExtractor(SingleDocumentExtraction.java:463)
>       at org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:255)
>       at org.apache.any23.Any23.extract(Any23.java:298)
>       at org.apache.any23.Any23.extract(Any23.java:450)
>       at org.apache.any23.servlet.WebResponder.runExtraction(WebResponder.java:114)
>       at org.apache.any23.servlet.Servlet.doGet(Servlet.java:79)
>       at javax.servlet.http.HttpServlet.service(HttpServlet.java:618)
>       at javax.servlet.http.HttpServlet.service(HttpServlet.java:725)
>       at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:301)
>       at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>       at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
>       at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
>       at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>       at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
>       at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
>       at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:503)
>       at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:136)
>       at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:74)
>       at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:610)
>       at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
>       at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:526)
>       at org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:794)
>       at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:652)
>       at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1575)
>       at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1533)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: org.openrdf.rio.RDFParseException: org.xml.sax.SAXParseException; lineNumber: 15; columnNumber: 116; The element type "meta" must be terminated by the matching end-tag "</meta>".
>       at org.semarglproject.sesame.rdf.rdfa.SesameRDFaParser.parse(SesameRDFaParser.java:111)
>       at org.semarglproject.sesame.rdf.rdfa.SesameRDFaParser.parse(SesameRDFaParser.java:95)
>       at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:105)
>       ... 29 more
> Caused by: org.semarglproject.rdf.ParseException: org.xml.sax.SAXParseException; lineNumber: 15; columnNumber: 116; The element type "meta" must be terminated by the matching end-tag "</meta>".
>       at org.semarglproject.rdf.rdfa.RdfaParser.processException(RdfaParser.java:1130)
>       at org.semarglproject.source.XmlSource.process(XmlSource.java:50)
>       at org.semarglproject.source.StreamProcessor.processInternal(StreamProcessor.java:87)
>       at org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:167)
>       at org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:154)
>       at org.semarglproject.sesame.rdf.rdfa.SesameRDFaParser.parse(SesameRDFaParser.java:109)
>       ... 31 more
> Caused by: org.xml.sax.SAXParseException; lineNumber: 15; columnNumber: 116; The element type "meta" must be terminated by the matching end-tag "</meta>".
>       at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>       at org.semarglproject.source.XmlSource.process(XmlSource.java:48)
>       ... 35 more
> ================================================================
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
