Lewis John McGibbney created NUTCH-2938:
-------------------------------------------
Summary: Use Any23's RepositoryWriter to write structured data to
Rdf4j repository
Key: NUTCH-2938
URL: https://issues.apache.org/jira/browse/NUTCH-2938
Project: Nutch
Issue Type: Improvement
Components: any23, plugin
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
Fix For: 1.19
I have been running a patch which leverages [Any23's
RepositoryWriter|https://any23.apache.org/apidocs/org/apache/any23/writer/RepositoryWriter.html]
(implemented as one of a number of TripleHandlers via
[CompositeTripleHandler|https://any23.apache.org/apidocs/org/apache/any23/writer/CompositeTripleHandler.html])
to write Any23 extractions to
[GraphDB|https://www.ontotext.com/products/graphdb/]. This enables us to build
a content graph from data across the enterprise.
This feature is turned off by default, so it will not change existing Any23
behaviour. I have concerns about the performance of this patch because right
now it creates a new repository connection for each URL. That per-URL overhead
is not great, so I will definitely improve on it.
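To illustrate the connection concern, here is a minimal, self-contained sketch. It does not use the real Any23 or RDF4J classes: Connection and CountingRepository are hypothetical stand-ins invented for this example, and writePerUrl / writeShared are illustrative names, not methods from the patch. It only contrasts opening one connection per URL with reusing a single connection for a batch.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ConnectionReuseSketch {

    /** Hypothetical stand-in for a repository connection; not the RDF4J type. */
    interface Connection extends AutoCloseable {
        void write(String url);

        @Override
        void close();
    }

    /** Hypothetical repository that counts how many connections it hands out. */
    static class CountingRepository {
        final AtomicInteger opened = new AtomicInteger();
        final AtomicInteger written = new AtomicInteger();

        Connection getConnection() {
            opened.incrementAndGet();
            return new Connection() {
                @Override
                public void write(String url) {
                    // A real writer would store extracted triples; here we only count calls.
                    written.incrementAndGet();
                }

                @Override
                public void close() {
                    // Nothing to release in this stand-in.
                }
            };
        }
    }

    /** The behaviour described above: a new connection for every URL. */
    static void writePerUrl(CountingRepository repo, List<String> urls) {
        for (String url : urls) {
            try (Connection conn = repo.getConnection()) {
                conn.write(url);
            }
        }
    }

    /** One possible improvement (an assumption): share one connection per batch. */
    static void writeShared(CountingRepository repo, List<String> urls) {
        try (Connection conn = repo.getConnection()) {
            for (String url : urls) {
                conn.write(url);
            }
        }
    }

    public static void main(String[] args) {
        List<String> urls = List.of(
            "https://example.org/a", "https://example.org/b", "https://example.org/c");

        CountingRepository perUrl = new CountingRepository();
        writePerUrl(perUrl, urls);

        CountingRepository shared = new CountingRepository();
        writeShared(shared, urls);

        System.out.println("per-URL connections: " + perUrl.opened.get()); // 3
        System.out.println("shared connections: " + shared.opened.get());  // 1
    }
}
```

Both variants write the same number of documents; only the number of connections opened differs. Whether a shared connection is safe in the actual plugin depends on its threading and lifecycle, which this sketch does not model.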
PR coming up.