Hi Stanbolers,

I would like to participate in Google Summer of Code again this year, and I'm interested in integrating Workflows into Stanbol as an alternative to, or an improvement on, the current Enhancement Chains.
Stanbol already has Enhancement Chains, but they only help configure the Stanbol Enhancer (the semantic lifting layer) for different semantic lifting workflows. With Enhancement Workflows, the semantic lifting layer can be integrated with the business layer, making it easier to bring semantic lifting into the enterprise. This is a very powerful feature, because Stanbol's capabilities can be merged into real business use cases seamlessly.

I have been looking at the open issue STANBOL-1008 [1], and some work has already been done in this direction. At [2] there is a first integration between Apache Camel (a framework implementing the Enterprise Integration Patterns) and Stanbol. This first integration contains the following features:

- Launcher -> Stanbol Launcher with the needed bundles (including the Camel integration bundle for Stanbol)
- Flow -> Contains the actual Camel-Stanbol integration
  * WeightedGraphFlow: a WeightedChain based on Camel (RouteBuilder). It obtains all the EnhancementEngines (from a custom EngineComponent implementing the Camel interface) and generates a route from the "default" route to each engine (engine://className) sequentially
  * FilePoolGraphFlow: creates a route (RouteBuilder) that reads files from the /tmp/chaininput directory, enhances them using Tika and some other engines, and writes the enhancement results to /tmp/chainoutput
  * Web: all the classes and packages for the web side, which show the configured Camel engines and routes and allow calling routes in a stateless way (ContentItem reader/writer implementations for Jersey). A flow graph can be called through REST endpoints
  * ServicesAPI: the FlowJobManager interface. It accepts requests for enhancing ContentItems
  * CamelJobManager: the FlowJobManager implementation. It registers an "engine" context in Camel to load EnhancementEngines by name and calls the producer template for the "default" route. It contains custom EngineComponent, EngineConsumer, EngineEndpoint and EngineProducer implementations.
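To make the WeightedGraphFlow idea concrete, here is a plain-Java sketch that does not use Camel at all. It orders engines by a weight, builds the sequential list of endpoint URIs starting from a "default" entry point, and then resolves each engine://name endpoint by name, roughly the role the "engine" context plays in CamelJobManager. The Engine record, the "direct:default" URI, the highest-weight-first ordering and the use of a String as content are all assumptions for illustration, not Stanbol or Camel APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

public class WeightedFlowSketch {

    // Hypothetical stand-in for an EnhancementEngine: a name, an ordering
    // weight, and a transformation applied to the content (a String here).
    record Engine(String name, int weight, UnaryOperator<String> process) {}

    // Build the sequential route: start at the assumed "direct:default" entry
    // point, then visit every engine as engine://<name>, highest weight first.
    static List<String> buildRoute(List<Engine> engines) {
        List<Engine> sorted = new ArrayList<>(engines);
        sorted.sort((a, b) -> Integer.compare(b.weight(), a.weight()));
        List<String> route = new ArrayList<>();
        route.add("direct:default");
        for (Engine e : sorted) {
            route.add("engine://" + e.name());
        }
        return route;
    }

    // Resolve each engine:// endpoint by name from a registry and pass the
    // content through the engines in route order.
    static String run(List<String> route, Map<String, Engine> registry, String content) {
        String current = content;
        for (String uri : route) {
            if (!uri.startsWith("engine://")) {
                continue; // the entry point does no processing itself
            }
            String name = uri.substring("engine://".length());
            Engine engine = registry.get(name);
            if (engine == null) {
                throw new IllegalStateException("No engine registered as " + name);
            }
            current = engine.process().apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<Engine> engines = List.of(
            new Engine("ner", 100, s -> s + "|ner"),
            new Engine("langdetect", 200, s -> s + "|lang"),
            new Engine("linking", 50, s -> s + "|link"));
        Map<String, Engine> registry = new LinkedHashMap<>();
        for (Engine e : engines) {
            registry.put(e.name(), e);
        }
        List<String> route = buildRoute(engines);
        System.out.println(route);
        System.out.println(run(route, registry, "text"));
    }
}
```

In the actual integration a Camel RouteBuilder would declare this chain of endpoints and Camel itself would do the dispatching; the sketch only isolates the ordering and name-lookup logic.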
This first integration is good, but IMHO it could be improved. Some improvements I have in mind are:

- Extend the current code, which only supports the "engine" context for calling engines, with new contexts, to create a robust Workflow framework in Stanbol. Quoting Florent's comment in the code:

    /**
     * TODO : That's actually an hack for use the jobmanager.enhanceContent() interface.
     * That's not really a chain that we want to pass but an flowgraph...
     * May we have to differ from the jobmanager impl... but not too much could be cool...
     * At the end the flow graph can be able to call engines://, chain:// and store://
     */

- Support route definitions in XML using the Camel DSL, and be able to deploy a new route (workflow or chain) just by uploading or placing a new file in some Stanbol directory (like the Indexing tool does with Referenced Sites)
- Create an API of Camel-based components, for example for the REST engines currently used in Stanbol (like StanfordNLP). This way we can leverage the way Camel calls endpoints (URIs with parameters) to use the same component with a dynamic configuration according to the passed parameters.
- Integrate a visual tool (like [3]) to let users create custom routes easily.

I would like to know whether these tasks fit within the scope of a GSoC project. I know there is a lot to do, but it would be good to start by integrating a stable version of a Workflow framework into Stanbol and then improve it iteratively with new and cool features.

What do you think? Feedback and comments are more than welcome to help me write the GSoC proposal.

Thanks.

Regards

[1] https://issues.apache.org/jira/browse/STANBOL-1008
[2] https://svn.apache.org/repos/asf/stanbol/branches/cameltrial/
[3] http://www.jboss.org/products/fuse