[jira] [Commented] (ANY23-396) Add ability to run extractors in flow

ASF GitHub Bot (JIRA) Wed, 12 Sep 2018 12:30:45 -0700


    [ 
https://issues.apache.org/jira/browse/ANY23-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16612633#comment-16612633
 ]


ASF GitHub Bot commented on ANY23-396:
--------------------------------------

Github user HansBrende commented on a diff in the pull request:

    https://github.com/apache/any23/pull/121#discussion_r217159830
  
    --- Diff: core/src/main/java/org/apache/any23/writer/BufferedTripleHandler.java ---
    @@ -0,0 +1,161 @@
    +package org.apache.any23.writer;
    +
    +import com.google.common.base.Throwables;
    +import org.apache.any23.extractor.ExtractionContext;
    +import org.eclipse.rdf4j.model.IRI;
    +import org.eclipse.rdf4j.model.Model;
    +import org.eclipse.rdf4j.model.Resource;
    +import org.eclipse.rdf4j.model.Value;
    +import org.eclipse.rdf4j.model.impl.LinkedHashModelFactory;
    +import org.eclipse.rdf4j.model.impl.TreeModelFactory;
    +import org.slf4j.Logger;
    +import org.slf4j.LoggerFactory;
    +
    +import java.util.Map;
    +import java.util.Stack;
    +import java.util.TreeMap;
    +
    +/**
    + * Collects all statements until end document.
    + *
    + * All statements are kept within {@link Model}.
    + *
    + * @author Jacek Grzebyta ([email protected])
    + */
    +public class BufferedTripleHandler implements TripleHandler {
    +
    +    private static final Logger log = LoggerFactory.getLogger(BufferedTripleHandler.class);
    +    private TripleHandler underlying;
    +    private static boolean isDocumentFinish = false;
    +
    +    private static class ContextHandler {
    +        ContextHandler(ExtractionContext ctx, Model m) {
    +            extractionContext = ctx;
    +            extractionModel = m;
    +        }
    +        ExtractionContext extractionContext;
    +        Model extractionModel;
    +    }
    +
    +    private static class WorkflowContext {
    +        WorkflowContext(TripleHandler underlying) {
    +            this.rootHandler = underlying;
    +        }
    +
    +
    +        Stack<String> extractors = new Stack<>();
    +        Map<String, ContextHandler> modelMap = new TreeMap<>();
    +        IRI documentIRI = null;
    +        TripleHandler rootHandler ;
    +    }
    +
    +    public BufferedTripleHandler(TripleHandler underlying) {
    +        this.underlying = underlying;
    +
    +        // hide model in the thread
    +        WorkflowContext wc = new WorkflowContext(underlying);
    +        BufferedTripleHandler.workflowContext.set(wc);
    +    }
    +
    +    private static final ThreadLocal<WorkflowContext> workflowContext = new ThreadLocal<>();
    +
    +    /**
    +     * Returns model which contains all other models.
    +     * @return
    +     */
    +    public static Model getModel() {
    +        return BufferedTripleHandler.workflowContext.get().modelMap.values().stream()
    +                .map(ch -> ch.extractionModel)
    +                .reduce(new LinkedHashModelFactory().createEmptyModel(), (mf, exm) -> {
    +                    mf.addAll(exm);
    +                    return mf;
    +                });
    +    }
    +
    +    @Override
    +    public void startDocument(IRI documentIRI) throws TripleHandlerException {
    +        BufferedTripleHandler.workflowContext.get().documentIRI = documentIRI;
    +    }
    +
    +    @Override
    +    public void openContext(ExtractionContext context) throws TripleHandlerException {
    +        //
    +    }
    +
    +    @Override
    +    public void receiveTriple(Resource s, IRI p, Value o, IRI g, ExtractionContext context) throws TripleHandlerException {
    +        getModelForContext(context).add(s,p,o,g);
    +    }
    +
    +    @Override
    +    public void receiveNamespace(String prefix, String uri, ExtractionContext context) throws TripleHandlerException {
    +        getModelForContext(context).setNamespace(prefix, uri);
    +    }
    +
    +    @Override
    +    public void closeContext(ExtractionContext context) throws TripleHandlerException {
    +        //
    +    }
    +
    +    @Override
    +    public void endDocument(IRI documentIRI) throws TripleHandlerException {
    +        BufferedTripleHandler.isDocumentFinish = true;
    +    }
    +
    +    @Override
    +    public void setContentLength(long contentLength) {
    +        underlying.setContentLength(contentLength);
    +    }
    +
    +    @Override
    +    public void close() throws TripleHandlerException {
    +        underlying.close();
    +    }
    +
    +    /**
    +     * Releases content of the model into underlying writer.
    +     */
    +    public static void releaseModel() throws TripleHandlerException {
    +        if(!BufferedTripleHandler.isDocumentFinish) {
    +            throw new RuntimeException("Before releasing document should be finished.");
    +        }
    +
    +        WorkflowContext workflowContext = BufferedTripleHandler.workflowContext.get();
    +
    +        String lastExtractor = ((Stack<String>) workflowContext.extractors).peek();
    --- End diff --
    
    @jgrzebyta IMHO, it would be vastly more straightforward to simply have the
    user extend the
    [`CompositeTripleHandler`](https://github.com/apache/any23/blob/master/core/src/main/java/org/apache/any23/writer/CompositeTripleHandler.java)
    class to filter and transform triples into a domain-specific RDF graph of
    their choosing, before delegating the final domain-specific triple outputs to
    the wrapped `TripleHandler` instance(s) by calling
    `super.receiveTriple( [modified triple] )`.
    
    (Analogous in concept to Java's own
    [`FilterOutputStream`](https://docs.oracle.com/javase/8/docs/api/java/io/FilterOutputStream.html)
    class.)
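
To make the suggestion concrete, here is a minimal, self-contained sketch of the filter-then-delegate pattern. It deliberately does not use the Any23 API: `TripleSink`, `CompositeSink`, `DomainSink`, and `RecordingSink` are hypothetical stand-ins (with triples reduced to plain strings), and the `schema.org` to `example.org` predicate mapping is an invented example. A real implementation would presumably extend `CompositeTripleHandler` and override its `receiveTriple` method instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FilteringHandlerSketch {

    /** Hypothetical stand-in for a triple handler, reduced to a single method. */
    interface TripleSink {
        void receiveTriple(String s, String p, String o);
    }

    /** Hypothetical stand-in for a composite handler: forwards each triple to every child. */
    static class CompositeSink implements TripleSink {
        private final List<TripleSink> children;

        CompositeSink(TripleSink... children) {
            this.children = new ArrayList<>(Arrays.asList(children));
        }

        @Override
        public void receiveTriple(String s, String p, String o) {
            for (TripleSink child : children) {
                child.receiveTriple(s, p, o);
            }
        }
    }

    /** User-defined subclass: filters and transforms triples, then delegates via super. */
    static class DomainSink extends CompositeSink {
        // Both namespaces are assumptions chosen purely for illustration.
        private static final String SOURCE_NS = "http://schema.org/";
        private static final String DOMAIN_NS = "http://example.org/domain#";

        DomainSink(TripleSink... children) {
            super(children);
        }

        @Override
        public void receiveTriple(String s, String p, String o) {
            if (!p.startsWith(SOURCE_NS)) {
                return; // filter: drop triples outside the namespace of interest
            }
            // transform: rewrite the predicate into the domain-specific vocabulary
            String mapped = DOMAIN_NS + p.substring(SOURCE_NS.length());
            super.receiveTriple(s, mapped, o);
        }
    }

    /** Terminal sink that records what it receives, standing in for a real writer. */
    static class RecordingSink implements TripleSink {
        final List<String> seen = new ArrayList<>();

        @Override
        public void receiveTriple(String s, String p, String o) {
            seen.add(s + " " + p + " " + o);
        }
    }

    public static void main(String[] args) {
        RecordingSink writer = new RecordingSink();
        TripleSink handler = new DomainSink(writer);

        handler.receiveTriple("ex:a", "http://schema.org/name", "Alice");
        handler.receiveTriple("ex:a", "http://other.org/x", "ignored");

        // Only the schema.org triple reaches the writer, with its predicate rewritten.
        System.out.println(writer.seen);
    }
}
```

Because the subclass only overrides `receiveTriple`, no buffering, thread-local state, or separate release step is involved in this sketch: each triple is transformed as it streams through, much as a `FilterOutputStream` subclass transforms bytes before writing them to the wrapped stream.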



> Add ability to run extractors in flow
> -------------------------------------
>
>                 Key: ANY23-396
>                 URL: https://issues.apache.org/jira/browse/ANY23-396
>             Project: Apache Any23
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 2.2
>            Reporter: Jacek Grzebyta
>            Assignee: Jacek Grzebyta
>            Priority: Minor
>
> Currently, extractors do not work in flows, i.e. the next extractor has no
> access to the triples produced by the previous one.
> It would be useful if an extractor could modify the triples created by
> another extractor.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
