Rupert Westenthaler created STANBOL-1326:
--------------------------------------------

             Summary: Updates to the Stanbol Enhancer API for 1.0
                 Key: STANBOL-1326
                 URL: https://issues.apache.org/jira/browse/STANBOL-1326
             Project: Stanbol
          Issue Type: Epic
          Components: Enhancement Engines, Enhancer
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler
             Fix For: 1.0.0



h2. Enhancer API v1.0
=================

This epic describes the changes and additions to the Stanbol Enhancer API in 
version 1.0.

The main features of the new API are:

* Clear separation between 
    *# the content and analysis results
    *# metadata and state of the enhancement process
* Support for 
[EnhancementProperties](https://issues.apache.org/jira/browse/STANBOL-488) 
(_Note_: a lightweight version is also supported starting with `0.12.1`; see 
[STANBOL-1280](https://issues.apache.org/jira/browse/STANBOL-1280) for details). 
EnhancementProperties can be used for Enhancement Chain / ExecutionPlan 
specific parameters as well as request-specific parameters. Typical use cases 
include passing credentials for remote services and configuring 
dereferenced fields or minimum confidence values.
* Low-level support for [Enhancement 
Workflows](https://issues.apache.org/jira/browse/STANBOL-1008): The new API 
will allow creating `EnhancementJobs` directly from RDF 
[ExecutionPlans](https://stanbol.apache.org/docs/trunk/components/enhancer/chains/ExecutionPlan)
 in addition to Enhancement `Chains`. Furthermore, the `EnhancementJobManager` 
will support partial executions of selected `ExecutionNodes` as well as 
resuming the enhancement after a change of the execution plan. This will allow 
enhancement workflows to, for example, (1) start with a simple language 
detection; (2) add additional `ExecutionNodes` based on the detected language 
and resume processing by passing the `EnhancementJob` to the 
`EnhancementJobManager` again.
* Low-level support for distributed computation of EnhancementJobs: The API 
will allow executing only selected `ExecutionNodes` of an 
[ExecutionPlan](https://stanbol.apache.org/docs/trunk/components/enhancer/chains/ExecutionPlan).
 This will allow running several Stanbol workers with different 
configurations. An `EnhancementJobManager` running on a worker could then be 
instructed to execute only specific `ExecutionNodes`.

The following sections provide an overview of the API changes and additions.

h3. EnhancementJob
--------------

The `EnhancementJob` represents the process of enhancing a 
`ContentItem` by the Stanbol Enhancer. It is a new interface introduced with 
`1.0`. Before 1.0 this was an implementation-specific class used by the 
[EventJobManager](http://stanbol.staging.apache.org/docs/trunk/components/enhancer/enhancementjobmanager#eventjobmanager).

{code:java}
    EnhancementJob
        + getJobId() : NonLiteral
        + getLock() : ReadWriteLock
        + getExecutionMetadata() : MGraph
        + getContentItem() : ContentItem
{code}

The `EnhancementJob` provides access to both the `ContentItem` and processing 
information. Only parsers, writers and the `EnhancementJobManager` are intended 
to have a reference to the `EnhancementJob`. `EnhancementEngines` will only get 
a reference to the `ContentItem`. Engines will also no longer be able to 
access the `MGraph` with the 
[ExecutionMetadata](http://stanbol.apache.org/docs/trunk/components/enhancer/executionmetadata)
 or the 
[ExecutionPlan](https://stanbol.apache.org/docs/trunk/components/enhancer/chains/ExecutionPlan).
 In 0.12.1 both can be obtained via the 
[ContentParts](http://stanbol.staging.apache.org/stanbol/docs/trunk/enhancer/contentitem.#contentparts)
 of the processed `ContentItem`.

The `jobId` of the EnhancementJob is used to reference the job. It SHOULD 
differ from the URI of the ContentItem to avoid issues with multiple requests 
for the same ContentItem (as described by 
[STANBOL-830](https://issues.apache.org/jira/browse/STANBOL-830)).

The EnhancementJob API does not distinguish between the 
[ExecutionPlan](https://stanbol.apache.org/docs/trunk/components/enhancer/chains/ExecutionPlan)
 and the 
[ExecutionMetadata](http://stanbol.apache.org/docs/trunk/components/enhancer/executionmetadata).
 There is only a single getter for the ExecutionMetadata, which needs to 
provide access to both.

In most cases it will be sufficient to copy the triples of the 
ExecutionPlan to the `MGraph` of the ExecutionMetadata before starting the 
enhancement. However, in use cases where the ExecutionPlan might change (e.g. 
between several partial executions) one can also use a setup where the 
ExecutionPlan is kept in a separate graph. To achieve this, the Clerezza 
`UnionMGraph` implementation can be used. This implementation creates a 
union view over several TripleCollections while all modifications are 
applied to the first one. So creating a `UnionMGraph` with the MGraph holding 
the ExecutionMetadata at index `0` and the TripleCollection with the 
ExecutionPlan at index `1` results in the desired setup.
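
The union behaviour described above can be illustrated with a small, self-contained sketch. It does not use Clerezza: `UnionViewSketch` is a hypothetical stand-in that models triples as plain strings and only demonstrates the "read from all, write to the first" idea.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a union graph; this is NOT the Clerezza class. */
final class UnionViewSketch {
    // Index 0 is the writable collection; all others are only read.
    private final List<Set<String>> parts;

    @SafeVarargs
    UnionViewSketch(Set<String>... parts) {
        if (parts.length == 0) {
            throw new IllegalArgumentException("at least one collection is required");
        }
        this.parts = List.of(parts);
    }

    /** Modifications only ever reach the first collection. */
    boolean add(String triple) {
        return parts.get(0).add(triple);
    }

    /** Reads see the union of all collections. */
    boolean contains(String triple) {
        for (Set<String> part : parts) {
            if (part.contains(triple)) {
                return true;
            }
        }
        return false;
    }

    /** Number of distinct triples over all collections. */
    int size() {
        Set<String> union = new LinkedHashSet<>();
        parts.forEach(union::addAll);
        return union.size();
    }
}
```

With the metadata set at index `0` and the plan set at index `1`, new execution metadata ends up in the first set only, while the plan set stays untouched and can be replaced between partial executions.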

h3. EnhancementJobManager
---------------------

The job manager interface is very simple. It only contains the method to 
process an EnhancementJob. Optionally, an array of `ep:ExecutionNode` instances 
can be passed.

{code:java}
    EnhancementJobManager
        + enhance(EnhancementJob job, NonLiteral...executions)
{code}

The passed `EnhancementJob` is expected to have its ExecutionMetadata 
initialized. In contrast to earlier Stanbol versions, the 
`EnhancementJobManager` is no longer responsible for initializing this metadata 
based on the passed enhancement `Chain`. This is now the responsibility of 
the `EnhancementJobBuilder`.

The new `EnhancementJobManager` will support _partial executions_. This means 
that callers can request the job manager to process only some of the 
`ep:ExecutionNode` instances defined by the 
[ExecutionPlan](https://stanbol.apache.org/docs/trunk/components/enhancer/chains/ExecutionPlan).
 If no executions are passed, the `EnhancementJobManager` is expected to 
execute all execution nodes.

If an array of `ep:ExecutionNode` instances is passed, the 
EnhancementJobManager must process only those and ignore all others. If one of 
those executions `ep:dependsOn` another `ep:ExecutionNode` that is not included 
and not yet completed (not `ep:optional` and not yet processed), the job 
manager is expected to fail with a `ChainException`.

The `EnhancementJobManager` needs to consider existing `em:EngineExecutions` 
and their `em:status`. This is important for correctly resuming the processing 
of partially completed enhancement jobs.
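
The selection rules above can be sketched in plain Java. This is only an illustration under stated assumptions: `PartialExecutionSketch` and `selectRunnable` are hypothetical names, nodes are modelled as strings rather than RDF resources, an `IllegalStateException` stands in for the `ChainException`, and a dependency is treated as satisfied when it is part of the request, already completed, or optional.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical model of the node selection; names are not Stanbol API. */
final class PartialExecutionSketch {

    /**
     * Returns the nodes that still need to run, or throws if a selected node
     * depends on a node that is neither selected, completed, nor optional.
     */
    static List<String> selectRunnable(Map<String, Set<String>> dependsOn,
                                       Set<String> optional,
                                       Set<String> completed,
                                       Set<String> requested) {
        // No explicit selection means: consider every node of the plan.
        Set<String> selected = requested.isEmpty() ? dependsOn.keySet() : requested;
        List<String> toRun = new ArrayList<>();
        // TreeSet only gives this sketch a deterministic iteration order.
        for (String id : new TreeSet<>(selected)) {
            if (completed.contains(id)) {
                continue; // already processed in an earlier (partial) run
            }
            for (String dep : dependsOn.getOrDefault(id, Set.of())) {
                boolean satisfied = selected.contains(dep)
                        || completed.contains(dep)
                        || optional.contains(dep);
                if (!satisfied) {
                    // A real job manager would raise a ChainException here.
                    throw new IllegalStateException(
                            id + " depends on unfinished node " + dep);
                }
            }
            toRun.add(id);
        }
        return toRun;
    }
}
```

Skipping completed nodes is what makes resuming possible: a second call with the same job only returns the nodes that have not been processed yet.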


h3. EnhancementJobBuilder
---------------------

The EnhancementJobBuilder is used to create EnhancementJobs. As building an 
EnhancementJob requires selecting specific implementations of the 
`EnhancementJob` and `ContentItem`, the `EnhancementJobBuilder` does not have a 
constructor; instead, a dedicated `EnhancementJobFactory` is used. The 
`EnhancementJobFactory` is an OSGi service and can be looked up as such by 
components that need to build `EnhancementJob` instances.

{code:java}
    EnhancementJobFactory
        + create() : EnhancementJobBuilder

    EnhancementJobBuilder
        + contentSource(ContentSource) : EnhancementJobBuilder
        + id(String id)
        + contentRef(ContentReference)
        + chain(Chain chain)
        + execPlan(TripleCollection executionPlan)
        + **(..)
        + build() : EnhancementJob
{code}


Intended Usage:

{code:java}
    @Reference
    EnhancementJobFactory ejf;
    
    @Reference
    EnhancementJobManager ejm;
    
    ContentSource content; // the content passed with the request
    Chain chain; // the requested enhancement chain

    // build the job and hand it over to the job manager
    ejm.enhance(ejf.create()
        .contentSource(content)
        .chain(chain)
        .build());
{code}


The `EnhancementJobBuilder` is obtained via the 
`EnhancementJobFactory#create()` method. After creation the builder provides an 
API to set the passed content, the id and the enhancement chain. As an 
alternative, the ExecutionPlan can also be set as an RDF graph. Once 
configured, the `EnhancementJob` can be built with `#build()` and passed to the 
`EnhancementJobManager`.

h3. ContentItem
-----------

There will also be minor adaptations to the ContentItem API. The main reason 
is the removal of the `ContentItemFactory` combined with the 
requirement of some `EnhancementEngines` to create `Blob` instances. Therefore 
methods will be added to the ContentItem that allow adding a `Blob` content 
part based on a `ContentSource` as well as on a `ContentSink`.

{code:java}
    ContentItem
        + addContent(UriRef id, ContentSource source) : Blob
        + addContentStream(UriRef id, String mediaType) : ContentSink
{code}

These methods will replace the `ContentItemFactory#createBlob(..)` and 
`ContentItemFactory#createContentSink(..)` methods. This means that 
EnhancementEngines that need to create `Blobs` no longer need to care about 
obtaining a `ContentItemFactory` instance. The right `Blob` implementation to 
be used will already be wired when the `ContentItem` is created by the 
`EnhancementJobBuilder`.

_Notes:_ 

* The `ContentItem#addPart(..)` method can still be used to add `Blob` 
instances to the `ContentItem`. This might be useful for engines that 
provide their own `Blob` implementation.
* Both `addContent*` methods will override any content part registered with the 
passed id. Unlike the `#addPart(..)` method, they do NOT return the previously 
registered part.

h3. EnhancementEngine
-----------------

The `EnhancementEngine` interface will be adapted to pass the 
[EnhancementProperties](https://issues.apache.org/jira/browse/STANBOL-488) as an 
additional parameter of the `#computeEnhancements(..)` method.

{code:java}
    EnhancementEngine
        + getName() : String
        + canEnhance(ContentItem ci) : int
        + computeEnhancements(ContentItem ci, Map<String,Object> properties)
{code}

A new Map instance with a copy of the properties will be passed to the engine. 
Therefore changes made by an engine to the map will have no side effects.
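
A minimal sketch of this copy-on-call behaviour, assuming a shallow copy is sufficient. The `Engine` interface and the `invoke` helper are hypothetical and only mirror the shape of the new `#computeEnhancements(..)` signature; the content item is reduced to a string to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration; not the Stanbol implementation. */
final class PropertiesCopySketch {

    /** Stand-in for an engine with the new two-argument method. */
    interface Engine {
        void computeEnhancements(String contentItem, Map<String, Object> properties);
    }

    /** Hands the engine its own copy, so its changes never leak back. */
    static void invoke(Engine engine, String contentItem,
                       Map<String, Object> requestProperties) {
        // Shallow copy: the map is new, the contained values are shared.
        engine.computeEnhancements(contentItem, new HashMap<>(requestProperties));
    }
}
```

Because the copy is shallow, an engine that mutates a value object stored in the map (rather than the map itself) could still affect other engines; immutable values avoid that.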

For details about EnhancementProperties see 
[STANBOL-488](https://issues.apache.org/jira/browse/STANBOL-488).




--
This message was sent by Atlassian JIRA
(v6.2#6252)
