[jira] [Commented] (ANY23-396) Add ability to run extractors in flow

ASF GitHub Bot (JIRA) Wed, 12 Sep 2018 11:31:16 -0700


    [ https://issues.apache.org/jira/browse/ANY23-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612563#comment-16612563 ]


ASF GitHub Bot commented on ANY23-396:
--------------------------------------

Github user jgrzebyta commented on a diff in the pull request:

    https://github.com/apache/any23/pull/121#discussion_r217142024
  
    --- Diff: core/src/main/java/org/apache/any23/writer/BufferedTripleHandler.java ---
    @@ -0,0 +1,161 @@
    +package org.apache.any23.writer;
    +
    +import com.google.common.base.Throwables;
    +import org.apache.any23.extractor.ExtractionContext;
    +import org.eclipse.rdf4j.model.IRI;
    +import org.eclipse.rdf4j.model.Model;
    +import org.eclipse.rdf4j.model.Resource;
    +import org.eclipse.rdf4j.model.Value;
    +import org.eclipse.rdf4j.model.impl.LinkedHashModelFactory;
    +import org.eclipse.rdf4j.model.impl.TreeModelFactory;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import java.util.Map;
    +import java.util.Stack;
    +import java.util.TreeMap;
    +
    +/**
    + * Collects all statements until end document.
    + *
    + * All statements are kept within {@link Model}.
    + *
    + * @author Jacek Grzebyta ([email protected])
    + */
    +public class BufferedTripleHandler implements TripleHandler {
    +
    +    private static final Logger log = LoggerFactory.getLogger(BufferedTripleHandler.class);
    +    private TripleHandler underlying;
    +    private static boolean isDocumentFinish = false;
    +
    +    private static class ContextHandler {
    +        ContextHandler(ExtractionContext ctx, Model m) {
    +            extractionContext = ctx;
    +            extractionModel = m;
    +        }
    +        ExtractionContext extractionContext;
    +        Model extractionModel;
    +    }
    +
    +    private static class WorkflowContext {
    +        WorkflowContext(TripleHandler underlying) {
    +            this.rootHandler = underlying;
    +        }
    +
    +
    +        Stack<String> extractors = new Stack<>();
    +        Map<String, ContextHandler> modelMap = new TreeMap<>();
    +        IRI documentIRI = null;
    +        TripleHandler rootHandler ;
    +    }
    +
    +    public BufferedTripleHandler(TripleHandler underlying) {
    +        this.underlying = underlying;
    +
    +        // hide model in the thread
    +        WorkflowContext wc = new WorkflowContext(underlying);
    +        BufferedTripleHandler.workflowContext.set(wc);
    +    }
    +
    +    private static final ThreadLocal<WorkflowContext> workflowContext = new ThreadLocal<>();
    --- End diff --
    
    Yes, I agree with you. The idea is that these models should later be
    presented (inside SingleDocumentWriter) to ModelExtractor and contain the
    parsing outcome of the previous extractors. Access to those models is not
    propagated further down from Rover, at least not without changing the API.
    I was thinking of adding them to the extraction parameters.
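
    A minimal sketch of the "pass the buffered models through the extraction
    parameters" idea, as an alternative to the static ThreadLocal in the diff
    above. Every name here (Triple, FlowBuffer, FlowParameters,
    CountingExtractor, the "flow.buffer" key) is hypothetical and only stands
    in for the real Any23 and RDF4J types; it is not the PR's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for an RDF statement (the PR uses RDF4J's Model).
final class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }
}

// Hypothetical buffer: triples grouped by extractor name, in the order
// in which the extractors first produced output.
final class FlowBuffer {
    private final Map<String, List<Triple>> byExtractor = new LinkedHashMap<>();

    void add(String extractorName, Triple triple) {
        byExtractor.computeIfAbsent(extractorName, k -> new ArrayList<>()).add(triple);
    }

    // Read-only snapshot of everything buffered so far.
    List<Triple> previousOutput() {
        List<Triple> all = new ArrayList<>();
        for (List<Triple> triples : byExtractor.values()) {
            all.addAll(triples);
        }
        return Collections.unmodifiableList(all);
    }
}

// Hypothetical parameter bag, assumed to play the role of extraction parameters.
final class FlowParameters {
    static final String BUFFER_KEY = "flow.buffer"; // invented key name

    private final Map<String, Object> values = new HashMap<>();

    void put(String key, Object value) {
        values.put(key, value);
    }

    Object get(String key) {
        return values.get(key);
    }
}

// Hypothetical downstream extractor: it receives the buffer explicitly
// through the parameters instead of reading shared static state.
final class CountingExtractor {
    int run(FlowParameters parameters) {
        Object value = parameters.get(FlowParameters.BUFFER_KEY);
        if (!(value instanceof FlowBuffer)) {
            return 0; // no earlier extractor output was supplied
        }
        return ((FlowBuffer) value).previousOutput().size();
    }
}
```

    Passing the buffer explicitly keeps its lifetime tied to one extraction
    run, whereas a static ThreadLocal set in a constructor is shared by every
    handler created on the same thread.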


> Add ability to run extractors in flow
> -------------------------------------
>
>                 Key: ANY23-396
>                 URL: https://issues.apache.org/jira/browse/ANY23-396
>             Project: Apache Any23
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 2.2
>            Reporter: Jacek Grzebyta
>            Assignee: Jacek Grzebyta
>            Priority: Minor
>
> Currently extractors do not work in flows, i.e. the next extractor has no
> access to the triples produced by the previous one.
> It would be useful if an extractor could modify the triples created by
> another extractor.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
