[ https://issues.apache.org/jira/browse/ANY23-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612633#comment-16612633 ]
ASF GitHub Bot commented on ANY23-396:
--------------------------------------
Github user HansBrende commented on a diff in the pull request:
https://github.com/apache/any23/pull/121#discussion_r217159830
--- Diff: core/src/main/java/org/apache/any23/writer/BufferedTripleHandler.java ---
@@ -0,0 +1,161 @@
+package org.apache.any23.writer;
+
+import com.google.common.base.Throwables;
+import org.apache.any23.extractor.ExtractionContext;
+import org.eclipse.rdf4j.model.IRI;
+import org.eclipse.rdf4j.model.Model;
+import org.eclipse.rdf4j.model.Resource;
+import org.eclipse.rdf4j.model.Value;
+import org.eclipse.rdf4j.model.impl.LinkedHashModelFactory;
+import org.eclipse.rdf4j.model.impl.TreeModelFactory;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Map;
+import java.util.Stack;
+import java.util.TreeMap;
+
+/**
+ * Collects all statements until end document.
+ *
+ * All statements are kept within {@link Model}.
+ *
+ * @author Jacek Grzebyta ([email protected])
+ */
+public class BufferedTripleHandler implements TripleHandler {
+
+    private static final Logger log = LoggerFactory.getLogger(BufferedTripleHandler.class);
+    private TripleHandler underlying;
+    private static boolean isDocumentFinish = false;
+
+    private static class ContextHandler {
+        ContextHandler(ExtractionContext ctx, Model m) {
+            extractionContext = ctx;
+            extractionModel = m;
+        }
+        ExtractionContext extractionContext;
+        Model extractionModel;
+    }
+
+    private static class WorkflowContext {
+        WorkflowContext(TripleHandler underlying) {
+            this.rootHandler = underlying;
+        }
+
+
+        Stack<String> extractors = new Stack<>();
+        Map<String, ContextHandler> modelMap = new TreeMap<>();
+        IRI documentIRI = null;
+        TripleHandler rootHandler ;
+    }
+
+    public BufferedTripleHandler(TripleHandler underlying) {
+        this.underlying = underlying;
+
+        // hide model in the thread
+        WorkflowContext wc = new WorkflowContext(underlying);
+        BufferedTripleHandler.workflowContext.set(wc);
+    }
+
+    private static final ThreadLocal<WorkflowContext> workflowContext = new ThreadLocal<>();
+
+    /**
+     * Returns model which contains all other models.
+     * @return
+     */
+    public static Model getModel() {
+        return BufferedTripleHandler.workflowContext.get().modelMap.values().stream()
+                .map(ch -> ch.extractionModel)
+                .reduce(new LinkedHashModelFactory().createEmptyModel(), (mf, exm) -> {
+                    mf.addAll(exm);
+                    return mf;
+                });
+    }
+
+    @Override
+    public void startDocument(IRI documentIRI) throws TripleHandlerException {
+        BufferedTripleHandler.workflowContext.get().documentIRI = documentIRI;
+    }
+
+    @Override
+    public void openContext(ExtractionContext context) throws TripleHandlerException {
+        //
+    }
+
+    @Override
+    public void receiveTriple(Resource s, IRI p, Value o, IRI g, ExtractionContext context) throws TripleHandlerException {
+        getModelForContext(context).add(s,p,o,g);
+    }
+
+    @Override
+    public void receiveNamespace(String prefix, String uri, ExtractionContext context) throws TripleHandlerException {
+        getModelForContext(context).setNamespace(prefix, uri);
+    }
+
+    @Override
+    public void closeContext(ExtractionContext context) throws TripleHandlerException {
+        //
+    }
+
+    @Override
+    public void endDocument(IRI documentIRI) throws TripleHandlerException {
+        BufferedTripleHandler.isDocumentFinish = true;
+    }
+
+    @Override
+    public void setContentLength(long contentLength) {
+        underlying.setContentLength(contentLength);
+    }
+
+    @Override
+    public void close() throws TripleHandlerException {
+        underlying.close();
+    }
+
+    /**
+     * Releases content of the model into underlying writer.
+     */
+    public static void releaseModel() throws TripleHandlerException {
+        if(!BufferedTripleHandler.isDocumentFinish) {
+            throw new RuntimeException("Before releasing document should be finished.");
+        }
+
+        WorkflowContext workflowContext = BufferedTripleHandler.workflowContext.get();
+
+        String lastExtractor = ((Stack<String>) workflowContext.extractors).peek();
--- End diff --
@jgrzebyta IMHO, it would be vastly more straightforward to simply have the user extend the [`CompositeTripleHandler`](https://github.com/apache/any23/blob/master/core/src/main/java/org/apache/any23/writer/CompositeTripleHandler.java) class to filter and transform triples into a domain-specific RDF graph of their choosing, and then delegate the final domain-specific triples to the wrapped `TripleHandler` instance(s) by calling `super.receiveTriple( [modified triple] )`.
(This is analogous in concept to Java's own [`FilterOutputStream`](https://docs.oracle.com/javase/8/docs/api/java/io/FilterOutputStream.html) class.)
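A minimal sketch of the filter-and-delegate shape suggested above. It deliberately does not use Any23's real classes: `SimpleTripleHandler`, `SimpleCompositeHandler`, `DomainFilterHandler` and `CollectingHandler` are hypothetical, simplified stand-ins (triples are plain strings, and the graph IRI, `ExtractionContext` and the document/context lifecycle methods of the real `TripleHandler` are left out). The predicate IRIs are placeholders. What it shows is only the pattern: the subclass rewrites or drops each triple and then calls `super.receiveTriple(...)`, much as a `FilterOutputStream` subclass transforms bytes before calling `super.write(...)`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a triple handler: only the
// receiveTriple step is modelled, and triples are plain strings.
interface SimpleTripleHandler {
    void receiveTriple(String s, String p, String o);
}

// Stand-in for a composite handler: forwards every triple to all children.
class SimpleCompositeHandler implements SimpleTripleHandler {
    private final List<SimpleTripleHandler> children = new ArrayList<>();

    SimpleCompositeHandler(List<? extends SimpleTripleHandler> children) {
        this.children.addAll(children);
    }

    @Override
    public void receiveTriple(String s, String p, String o) {
        for (SimpleTripleHandler child : children) {
            child.receiveTriple(s, p, o);
        }
    }
}

// The suggested pattern: subclass the composite, filter or rewrite each
// triple, then delegate the result with super.receiveTriple(...).
class DomainFilterHandler extends SimpleCompositeHandler {
    // Placeholder IRIs used only for illustration.
    static final String SOURCE_PREDICATE = "http://example.org/vocab#name";
    static final String TARGET_PREDICATE = "http://schema.org/name";

    DomainFilterHandler(List<? extends SimpleTripleHandler> children) {
        super(children);
    }

    @Override
    public void receiveTriple(String s, String p, String o) {
        if (SOURCE_PREDICATE.equals(p)) {
            // Map into the domain vocabulary, then hand off to the children.
            super.receiveTriple(s, TARGET_PREDICATE, o);
        }
        // Every other triple is dropped, i.e. filtered out.
    }
}

// Records whatever reaches it, so the delegated output can be inspected.
class CollectingHandler implements SimpleTripleHandler {
    final List<String> received = new ArrayList<>();

    @Override
    public void receiveTriple(String s, String p, String o) {
        received.add(s + " " + p + " " + o);
    }
}
```

For example, wrapping a `CollectingHandler sink` via `new DomainFilterHandler(Collections.singletonList(sink))` and sending one `example.org/vocab#name` triple plus one unrelated triple leaves only the rewritten `schema.org/name` triple in `sink.received`. The wrapped handlers never need to know that any filtering took place.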
> Add ability to run extractors in flow
> -------------------------------------
>
> Key: ANY23-396
> URL: https://issues.apache.org/jira/browse/ANY23-396
> Project: Apache Any23
> Issue Type: Improvement
> Components: core
> Affects Versions: 2.2
> Reporter: Jacek Grzebyta
> Assignee: Jacek Grzebyta
> Priority: Minor
>
> Currently, extractors do not work in flows, i.e. the next extractor has no
> access to the triples produced by the previous one.
> It would be useful if an extractor could modify triples created by another
> extractor.
>
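To make the request concrete, here is a hypothetical sketch of extractors running in a flow. It is not Any23 code and not the design proposed in the pull request. Each stage is modelled as a `UnaryOperator<List<Triple>>` that receives the triples produced so far and returns the list handed to the next stage, so a later stage can modify triples created by an earlier one. `Triple` and `ExtractorFlow` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical immutable triple used only for this sketch.
final class Triple {
    final String s, p, o;

    Triple(String s, String p, String o) {
        this.s = s;
        this.p = p;
        this.o = o;
    }

    @Override
    public String toString() {
        return s + " " + p + " " + o;
    }
}

// Hypothetical flow: stages run in order, and each one sees the output of
// the previous stage and returns the (possibly modified) triples to pass on.
final class ExtractorFlow {
    private final List<UnaryOperator<List<Triple>>> stages = new ArrayList<>();

    ExtractorFlow then(UnaryOperator<List<Triple>> stage) {
        stages.add(stage);
        return this;
    }

    List<Triple> run() {
        List<Triple> current = Collections.emptyList();
        for (UnaryOperator<List<Triple>> stage : stages) {
            // A read-only view forces each stage to return its changes explicitly.
            current = stage.apply(Collections.unmodifiableList(current));
        }
        return current;
    }
}
```

Passing an unmodifiable view keeps the hand-off between stages explicit; a real implementation would also have to decide how extraction contexts and document boundaries map onto stages, which this sketch ignores.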
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)