[ https://issues.apache.org/jira/browse/ANY23-326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16337280#comment-16337280 ]
Hans Brende commented on ANY23-326:
-----------------------------------
Just for reference, here is the actual stack trace:
org.eclipse.rdf4j.rio.RDFParseException: org.xml.sax.SAXParseException; lineNumber: 170; columnNumber: 3; The element type "input" must be terminated by the matching end-tag "</input>".
    at org.semarglproject.rdf4j.rdf.rdfa.RDF4JRDFaParser.parse(RDF4JRDFaParser.java:111)
    at org.semarglproject.rdf4j.rdf.rdfa.RDF4JRDFaParser.parse(RDF4JRDFaParser.java:95)
    at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:117)
    at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:47)
    at org.apache.any23.extractor.SingleDocumentExtraction.runExtractor(SingleDocumentExtraction.java:473)
    at org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:261)
    at org.apache.any23.Any23.extract(Any23.java:300)
    at org.apache.any23.Any23.extract(Any23.java:452)
    at org.apache.any23.cli.Rover.performExtraction(Rover.java:182)
    ...
Caused by: org.semarglproject.rdf.ParseException: org.xml.sax.SAXParseException; lineNumber: 170; columnNumber: 3; The element type "input" must be terminated by the matching end-tag "</input>".
    at org.semarglproject.rdf.rdfa.RdfaParser.processException(RdfaParser.java:1141)
    at org.semarglproject.source.XmlSource.process(XmlSource.java:50)
    at org.semarglproject.source.StreamProcessor.processInternal(StreamProcessor.java:87)
    at org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:167)
    at org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:154)
    at org.semarglproject.rdf4j.rdf.rdfa.RDF4JRDFaParser.parse(RDF4JRDFaParser.java:109)
    ... 10 more
Caused by: org.xml.sax.SAXParseException; lineNumber: 170; columnNumber: 3; The element type "input" must be terminated by the matching end-tag "</input>".
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at org.semarglproject.source.XmlSource.process(XmlSource.java:48)
    ... 14 more
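
The innermost cause comes from a strict XML parser (Xerces, reached through semargl's XmlSource), and an HTML void element such as <input> written without a closing tag is not well-formed XML. As a rough illustration only, the sketch below feeds a similar fragment to the JDK's built-in SAX parser. The class name VoidTagRepro and the helper parseStrict are hypothetical, nothing here is Any23 or semargl code, and the JDK parser is only assumed to behave like the Xerces build in the trace.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-alone reproduction; not part of Any23 or semargl.
class VoidTagRepro {

    /** Returns the parser's error message, or null if the markup is well-formed XML. */
    static String parseStrict(String markup) throws Exception {
        try {
            // DefaultHandler rethrows fatal errors as SAXParseException.
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(markup)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // HTML-style void element with no end tag: rejected by a strict XML parser.
        System.out.println(parseStrict("<form><input type=\"text\"></form>"));
        // Self-closed form of the same element: accepted.
        System.out.println(parseStrict("<form><input type=\"text\"/></form>"));
    }
}
```

The first call should report a well-formedness error and the second should return null, which suggests the page's markup has to be tidied into well-formed XML before it reaches the RDFa parser.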
> parsing unclosed meta and input tags fails
> ------------------------------------------
>
> Key: ANY23-326
> URL: https://issues.apache.org/jira/browse/ANY23-326
> Project: Apache Any23
> Issue Type: Bug
> Components: CLI
> Affects Versions: 2.1
> Environment: ubuntu 17.04
> Reporter: Ben Roberts
> Priority: Major
> Fix For: 2.2
>
>
> Parsing fails as soon as it hits an unclosed input or meta tag. As an
> example, try:
> ./bin/any23 rover https://ben.thatmustbe.me/note/2017/12/28/1
> [Fatal Error] :170:3: The element type "input" must be terminated by the matching end-tag "</input>".
>
> It seems the issue might be that this is using a very old version of
> jsoup, at least as best I could tell.