GSoC 2014 Stanbol Enhancements Workflow Feature

Antonio David Perez Morales Thu, 13 Mar 2014 05:51:13 -0700

Hi Stanbolers

I would like to participate in the Google Summer of Code again this year,
and I'm interested in integrating Workflows into Stanbol as an
alternative/improvement to the current Enhancement Chains.


Stanbol already has Enhancement Chains, but they are a way to configure the
Stanbol Enhancer (semantic lifting layer) for different semantic lifting
workflows. With Enhancement Workflows, the semantic lifting layer can be
integrated with the business layer, easing the integration of semantic
lifting with the enterprise.

This is a very powerful feature, because the capabilities of Stanbol can be
merged into real business use cases in a seamless way.

I have been looking at the open issue STANBOL-1008 [1], and some work has
already been done in this direction. At [2] we can find a first integration
between Apache Camel (a framework implementing Enterprise Integration
Patterns) and Stanbol. This first integration contains the following
pieces:

- Launcher -> Stanbol Launcher with the needed bundles (including the Camel
integration bundle for Stanbol)

- Flow -> Contains the real Camel-Stanbol integration
    * WeightedGraphFlow: WeightedChain based on Camel (RouteBuilder). It
obtains all the EnhancementEngines (from a custom EngineComponent
implementing the Camel interface) and generates a route from the "default"
route to each engine (engine://className) sequentially

    * FilePoolGraphFlow: Creates a route (RouteBuilder) that reads files from
the /tmp/chaininput directory and writes the enhancement results to
/tmp/chainoutput, using Tika and some other engines

    * Web: all the classes and packages for the web side, which show the
configured Camel engines and routes and allow routes to be called in a
stateless way (ContentItem reader/writer implementations for Jersey). A
flow graph can be called using REST endpoints

    * ServicesAPI: FlowJobManager interface. It accepts requests for
enhancing ContentItems

    * CamelJobManager: FlowJobManager implementation. Registers the "engine"
context in Camel to load EnhancementEngines by name and calls the producer
template for the "default" route. Contains custom EngineComponent,
EngineConsumer, EngineEndpoint and EngineProducer implementations.
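
As a rough illustration of the sequential routing that WeightedGraphFlow is
described as doing, here is a plain-Java sketch with no Camel dependency. It
is not the code from the branch: the class SequentialRouteSketch, the method
routeSteps and the engine names are hypothetical, and only the "default"
start route and the engine:// scheme come from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SequentialRouteSketch {

    /**
     * Builds the ordered list of steps for a sequential flow: the start
     * route first, then one engine:// endpoint per engine, in the order given.
     * (Hypothetical helper; the real branch builds a Camel route instead.)
     */
    public static List<String> routeSteps(String startRoute, List<String> engineNames) {
        List<String> steps = new ArrayList<>();
        steps.add(startRoute);
        for (String name : engineNames) {
            steps.add("engine://" + name);
        }
        return steps;
    }

    public static void main(String[] args) {
        // Engine names here are placeholders, not the ones used in the branch.
        List<String> steps = routeSteps("default", Arrays.asList("tika", "langdetect"));
        System.out.println(String.join(" -> ", steps));
    }
}
```

In a real Camel RouteBuilder the same ordering would be expressed as a chain
of endpoints on a single route; the sketch only shows how the engine list
maps to endpoint URIs.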

This first integration is good but IMHO it could be improved. Some
improvements I have in mind are:

- Extend the current code, which supports the "engine" context for calling
engines, with new contexts, to create a strong Workflow framework in
Stanbol. Quoting Florent's comment in the code:

/**
  * TODO : That's actually an hack for use the jobmanager.enhanceContent() interface.
  * That's not really a chain that we want to pass but an flowgraph...
  * May we have to differ from the jobmanager impl... but not too much could be cool...
  * At the end the flow graph can be able to call engines://, chain:// and store://
  */

- Support route definitions in XML using the Camel DSL, so that a new route
(workflow or chain) can be deployed just by uploading or placing a file in
some Stanbol directory (as the Indexing tool does with Referenced Sites)

- Create an API with Camel-based components, for example for the REST
engines currently used in Stanbol (like StanfordNLP). This way we can
leverage the way Camel calls endpoints (using URIs with parameters) to use
the same component with dynamic configuration according to the passed
parameters.

- Integrate a visual tool (like [3]) to let users create custom routes in
an easy way.
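
To make the "URIs with parameters" idea from the third item above a bit more
concrete, here is a small plain-Java sketch of how an endpoint URI could
carry per-call configuration. It uses only java.net.URI and does not depend
on Camel; the class EndpointUriSketch, its methods and the parameter names
(lang, timeout) are hypothetical and only meant to show the shape of the
idea.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class EndpointUriSketch {

    /** Returns the part after the scheme, e.g. "stanfordnlp" for engine://stanfordnlp?lang=en. */
    public static String componentName(String endpointUri) {
        return URI.create(endpointUri).getAuthority();
    }

    /** Splits the query string into key/value pairs, keeping their order. */
    public static Map<String, String> parameters(String endpointUri) {
        Map<String, String> params = new LinkedHashMap<>();
        String query = URI.create(endpointUri).getQuery();
        if (query == null || query.isEmpty()) {
            return params; // no parameters: the component uses its defaults
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            if (eq < 0) {
                params.put(pair, ""); // flag-style parameter without a value
            } else {
                params.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return params;
    }

    public static void main(String[] args) {
        // Hypothetical endpoint URI; parameter names are placeholders.
        String uri = "engine://stanfordnlp?lang=en&timeout=5000";
        System.out.println(componentName(uri) + " " + parameters(uri));
    }
}
```

The same component could then be reused in several routes, each passing
different parameters in its URI, instead of registering one configured
instance per use.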

I would like to know whether these tasks fit the scope of a GSoC project. I
know there are many things to do, but it would be good to start by
integrating a stable version of a Workflow framework in Stanbol and then
improve it iteratively with new and cool features.

What do you think?

Feedback and comments are more than welcome, as they will help me write
the GSoC proposal.

Thanks. Regards


[1] https://issues.apache.org/jira/browse/STANBOL-1008
[2] https://svn.apache.org/repos/asf/stanbol/branches/cameltrial/
[3] http://www.jboss.org/products/fuse
