Hi,

IMO if one wants to scale Stanbol to multiple machines (or better call
them processing nodes) one would need to:

* hold the state - ContentItem - in a central place (e.g. MongoDB)
* start multiple Stanbol instances, each providing only a few
  EnhancementEngines and/or sub-chains (sub-chains for cases where you
  want to ensure that some engines run on the same host)
* have those instances register themselves with a central registry,
  probably the same DB used for ContentItems
* provide an EnhancementJobManager that registers ContentItems and
  distributes calls to EnhancementEngines (and sub-chains) to different
  hosts.

This would also require that

* all parts of the ContentItems are serializable. I think this is
  already the case, but I would need to check.
* EnhancementEngines provide some more metadata so that - in most
  cases - the EnhancementJobManager can already determine if an Engine
  can enhance a ContentItem (EnhancementEngine#canEnhance(..) method)

We should definitely keep such use cases in mind if we introduce the
Enhancement Task RESTful service and the EnhancementJob API (see [1] and
following mails), because the EnhancementJob would most likely be the
element representing the state in the central DB, and the Task API could
be the one used to call EnhancementEngines on multiple processing nodes.

best
Rupert

[1] http://markmail.org/message/zqztwjhndwj74jqv

On Wed, Mar 6, 2013 at 7:16 PM, Som Satpathy <[email protected]> wrote:
> Hi Fabian,
>
> For now the data I've been working is plain text, so it is easy to
> visualize how splitting the content and distributing the enhancement
> process can help us. But you are right it is a challenge once context
> comes in. As you said, then the splitting of the content has to be
> designed carefully.
>
> Thanks,
> Som
>
> On Wed, Mar 6, 2013 at 12:08 AM, Fabian Christ <[email protected]> wrote:
>
>> 2013/3/5 Som Satpathy <[email protected]>:
>> > I am aiming for
>> > distributing the enhancement request for the posted content over a
>> > cluster of nodes.
>>
>> And should each node process a single enhancement engine or are you
>> trying to split the content?
>>
>> Splitting the content has to be designed carefully since some
>> enhancement engines may rely on context information. Therefore, it
>> depends on the engines' requirements and the kind of content you are
>> processing to decide how the content could be split into parts.
>>
>> For example, you may have only engines that work on sentence level and
>> the content is plain text. This way you could distribute each
>> sentence. But if your content is a website and your engines take the
>> structure of the website into account, it may not be possible to split
>> at sentence level.
>>
>> Best,
>> - Fabian
>>
>> --
>> Fabian
>> http://twitter.com/fctwitt
>>

--
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11             ++43-699-11108907
| A-5500 Bischofshofen
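
P.S. To make the registry and dispatch idea from the top of this mail a bit
more concrete, here is a rough, self-contained Java sketch. Everything in it
is an assumption for illustration only: EngineRegistration, CentralRegistry,
planCalls and the node URLs are hypothetical names, not existing Stanbol
APIs, and the in-memory list merely stands in for the central DB. It shows
how engine metadata published at registration time could let a job manager
decide which nodes to call without asking each engine remotely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DistributedEnhancementSketch {

    /** Hypothetical metadata a Stanbol instance publishes for one engine. */
    static final class EngineRegistration {
        final String engineName;
        final String nodeUrl;
        final Set<String> supportedMimeTypes;

        EngineRegistration(String engineName, String nodeUrl, Set<String> supportedMimeTypes) {
            this.engineName = engineName;
            this.nodeUrl = nodeUrl;
            this.supportedMimeTypes = new HashSet<>(supportedMimeTypes);
        }

        /** Metadata-only pre-check; a stand-in for a remote canEnhance(..) call. */
        boolean canProbablyEnhance(String mimeType) {
            return supportedMimeTypes.contains(mimeType);
        }
    }

    /** In-memory stand-in for the central registry (a shared DB in a real setup). */
    static final class CentralRegistry {
        private final List<EngineRegistration> registrations = new ArrayList<>();

        synchronized void register(EngineRegistration registration) {
            registrations.add(registration);
        }

        synchronized List<EngineRegistration> candidatesFor(String mimeType) {
            List<EngineRegistration> result = new ArrayList<>();
            for (EngineRegistration r : registrations) {
                if (r.canProbablyEnhance(mimeType)) {
                    result.add(r);
                }
            }
            return result;
        }
    }

    /** Plans which remote calls a job manager would make; performs no network I/O. */
    static List<String> planCalls(CentralRegistry registry, String contentItemId, String mimeType) {
        List<String> calls = new ArrayList<>();
        for (EngineRegistration r : registry.candidatesFor(mimeType)) {
            calls.add(r.nodeUrl + "/" + r.engineName + "?item=" + contentItemId);
        }
        return calls;
    }

    public static void main(String[] args) {
        CentralRegistry registry = new CentralRegistry();
        // Hypothetical engines and hosts, registered by two separate instances.
        registry.register(new EngineRegistration("langid", "http://node-a.example:8080",
                new HashSet<>(Arrays.asList("text/plain"))));
        registry.register(new EngineRegistration("htmlextractor", "http://node-b.example:8080",
                new HashSet<>(Arrays.asList("text/html"))));

        for (String call : planCalls(registry, "item-1", "text/plain")) {
            System.out.println(call);
        }
    }
}
```

The pre-check only looks at MIME types, so it can still be wrong in the cases
where an engine needs to inspect the ContentItem itself; in those cases the
job manager would have to fall back to asking the engine on its node.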
