[
https://issues.apache.org/jira/browse/ANY23-314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16288332#comment-16288332
]
ASF GitHub Bot commented on ANY23-314:
--------------------------------------
GitHub user lewismc opened a pull request:
https://github.com/apache/any23/pull/49
ANY23-314 Service fails to return extraction in case of extraction error
This pull request primarily addresses
https://issues.apache.org/jira/browse/ANY23-314
Additional changes are trivial code cleanups.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/lewismc/any23 ANY23-314
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/any23/pull/49.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #49
----
commit d0e627a957c6ba5ec59ff40ba5a73cf7e52dd1d4
Author: Lewis John McGibbney <[email protected]>
Date: 2017-12-12T21:51:48Z
ANY23-314 Service fails to return extraction in case of extraction error
----
> Service fails to return extraction in case of extraction error
> --------------------------------------------------------------
>
> Key: ANY23-314
> URL: https://issues.apache.org/jira/browse/ANY23-314
> Project: Apache Any23
> Issue Type: Bug
> Components: service
> Affects Versions: 2.1
> Environment: Any23 2.2-SNAPSHOT
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Fix For: 2.2
>
> Attachments: extraction.json, output.log
>
>
> See the following command-line extraction:
> {code}
> lmcgibbn@LMC-056430 /usr/local/any23(master) $
> ./cli/target/appassembler/bin/any23 rover -l output.log -o extraction.json
> https://www.jobcluster.de
> ------------------------------------------------------------------------
> Apache Any23 :: rover
> ------------------------------------------------------------------------
> 0 [main] WARN org.apache.tika.parser.image.ImageParser -
> JBIG2ImageReader not loaded. jbig2 files will be ignored
> 128 [main] INFO org.apache.any23.rdf.PopularPrefixes - Loading prefixes
> from /org/apache/any23/prefixes/prefixes.properties
> 1388 [main] WARN org.apache.commons.httpclient.HttpMethodBase - Going to
> buffer response body of large or unknown size. Using getResponseBodyAsStream
> instead is recommended.
> 4790 [main] INFO org.apache.any23.extractor.SingleDocumentExtraction -
> Processing https://www.jobcluster.de/
> [Fatal Error] :12:46: The entity name must immediately follow the '&' in the
> entity reference.
> ------------------------------------------------------------------------
> Apache Any23 FAILURE
> Execution terminated with errors: Error while parsing RDF document.
> Total time: 5s
> Finished at: Tue Dec 12 08:01:14 PST 2017
> Final Memory: 31M/184M
> ------------------------------------------------------------------------
> {code}
> This produces the attached extraction result (extraction.json) and the
> associated log (output.log).
> If I run the same extraction using the service at
> [any23.org|http://any23.org/any23/?format=json&uri=https%3A%2F%2Fwww.jobcluster.de%2F&validation-mode=none],
> the (partial) extraction result should be returned regardless of whether the
> entire extraction was successful or not.
> Instead, the service servlet seems to return the extraction Exception as
> opposed to the preferred extraction result. This issue will fix that.
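
The behaviour requested in the issue can be illustrated with a minimal sketch. This is not the Any23 servlet code and does not reflect the change in the pull request: `PartialExtractionSketch`, `Extraction`, `Outcome`, and `extractLeniently` are hypothetical names introduced only for illustration. The idea, under those assumptions, is to collect output into a buffer as it is produced, catch the failure, and hand back whatever was collected together with the error message instead of the exception alone.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these types are part of the Any23 API. */
final class PartialExtractionSketch {

    /** Stand-in for an extraction run that emits statements and may fail midway. */
    interface Extraction {
        void run(List<String> sink) throws Exception;
    }

    /** Whatever was extracted, plus the error message if the run failed (null otherwise). */
    static final class Outcome {
        final List<String> statements;
        final String error;

        Outcome(List<String> statements, String error) {
            this.statements = statements;
            this.error = error;
        }

        boolean isPartial() {
            return error != null;
        }
    }

    /** Runs the extraction and keeps the buffered output even when it throws. */
    static Outcome extractLeniently(Extraction extraction) {
        List<String> sink = new ArrayList<>();
        String error = null;
        try {
            extraction.run(sink);
        } catch (Exception e) {
            // Record the failure, but do not discard what is already in the sink.
            error = e.getMessage();
        }
        return new Outcome(sink, error);
    }

    public static void main(String[] args) {
        // Simulated run: two statements are emitted before parsing fails.
        Outcome outcome = extractLeniently(sink -> {
            sink.add("triple-1");
            sink.add("triple-2");
            throw new Exception("Error while parsing RDF document.");
        });
        System.out.println(outcome.statements + " partial=" + outcome.isPartial());
    }
}
```

A caller such as a servlet could then write `statements` to the response body and report `error` separately (for example in a header or a report section), so that a failure late in the document does not hide the data extracted before it.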