[
https://issues.apache.org/jira/browse/ANY23-267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lewis John McGibbney resolved ANY23-267.
----------------------------------------
Resolution: Fixed
Although this issue is still encountered, extractions no longer fail
completely. The underlying problem is that the DOM is not being 'fixed' before
the Any23 RDF wrappers pass it to the underlying Semargl parser, which is very
strict.
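
For anyone wanting to reproduce the strictness in isolation, here is a minimal
sketch using only the JDK's built-in XML parser. It does not use Any23 or
Semargl; the class name StrictMetaDemo, the parseError helper and the sample
markup are all made up for illustration. It only shows that a strict XML parse
rejects an unclosed <meta> element, while the same markup with a self-closed
<meta/> is accepted.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class, not part of Any23 or Semargl.
public class StrictMetaDemo {

    // Returns null if the markup is well-formed XML, else the parser's message.
    static String parseError(String markup) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Rethrow errors instead of letting the default handler print them.
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                public void error(SAXParseException e) throws SAXException { throw e; }
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(markup)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Valid HTML, but not well-formed XML: <meta> is never closed.
        String html = "<html><head><meta charset=\"utf-8\"></head><body/></html>";
        // Same markup with the element self-closed, as a 'fixed' DOM would serialize it.
        String fixed = "<html><head><meta charset=\"utf-8\"/></head><body/></html>";

        System.out.println("unclosed meta: " + parseError(html));
        System.out.println("closed meta:   " + parseError(fixed));
    }
}
```

The second input stands in for what a DOM 'fixing' step would hand to the
parser. How Any23 should perform that step is not shown here.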
> Entire extractions fail due to "The element type 'meta' must be terminated by
> the matching end-tag </meta>"
> -----------------------------------------------------------------------------------------------------------
>
> Key: ANY23-267
> URL: https://issues.apache.org/jira/browse/ANY23-267
> Project: Apache Any23
> Issue Type: Sub-task
> Components: core
> Affects Versions: 1.1
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Fix For: 2.2
>
>
> WebService API call
> http://any23.org/best/twitter.com/cygri
> {code}
> Could not parse input.
> ================================================================
> ------------ BEGIN Exception context ------------
> ExtractionContext(urn:x-any23:html-rdfa11:root-extraction-result-id:https://twitter.com/cygri)
> Errors {
> }
> ------------ END Exception context ------------
> org.apache.any23.extractor.ExtractionException: Error while parsing RDF
> document.
> at
> org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:109)
> at
> org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:41)
> at
> org.apache.any23.extractor.SingleDocumentExtraction.runExtractor(SingleDocumentExtraction.java:463)
> at
> org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:255)
> at org.apache.any23.Any23.extract(Any23.java:298)
> at org.apache.any23.Any23.extract(Any23.java:450)
> at
> org.apache.any23.servlet.WebResponder.runExtraction(WebResponder.java:114)
> at org.apache.any23.servlet.Servlet.doGet(Servlet.java:79)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:618)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:725)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:301)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at
> org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
> at
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:239)
> at
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:219)
> at
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:106)
> at
> org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:503)
> at
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:136)
> at
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:74)
> at
> org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:610)
> at
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:88)
> at
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:526)
> at
> org.apache.coyote.ajp.AbstractAjpProcessor.process(AbstractAjpProcessor.java:794)
> at
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:652)
> at
> org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1575)
> at
> org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:1533)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.openrdf.rio.RDFParseException: org.xml.sax.SAXParseException;
> lineNumber: 15; columnNumber: 116; The element type "meta" must be terminated
> by the matching end-tag "</meta>".
> at
> org.semarglproject.sesame.rdf.rdfa.SesameRDFaParser.parse(SesameRDFaParser.java:111)
> at
> org.semarglproject.sesame.rdf.rdfa.SesameRDFaParser.parse(SesameRDFaParser.java:95)
> at
> org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:105)
> ... 29 more
> Caused by: org.semarglproject.rdf.ParseException:
> org.xml.sax.SAXParseException; lineNumber: 15; columnNumber: 116; The element
> type "meta" must be terminated by the matching end-tag "</meta>".
> at
> org.semarglproject.rdf.rdfa.RdfaParser.processException(RdfaParser.java:1130)
> at org.semarglproject.source.XmlSource.process(XmlSource.java:50)
> at
> org.semarglproject.source.StreamProcessor.processInternal(StreamProcessor.java:87)
> at
> org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:167)
> at
> org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:154)
> at
> org.semarglproject.sesame.rdf.rdfa.SesameRDFaParser.parse(SesameRDFaParser.java:109)
> ... 31 more
> Caused by: org.xml.sax.SAXParseException; lineNumber: 15; columnNumber: 116;
> The element type "meta" must be terminated by the matching end-tag "</meta>".
> at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
> at org.semarglproject.source.XmlSource.process(XmlSource.java:48)
> ... 35 more
> ================================================================
> {code}