Re: Queries regarding Stanbol scalability

Rupert Westenthaler Thu, 07 Mar 2013 05:48:01 -0800

Hi

IMO, if one wants to scale Stanbol to multiple machines (or, better,
processing nodes), one would need to:


* hold the state (the ContentItem) in a central place (e.g. MongoDB)
* start multiple Stanbol instances, each providing only a few
EnhancementEngines and/or sub-chains (the latter in cases where you
want to ensure that some engines run on the same host)
* have those instances register themselves with a central registry,
probably the same DB used for ContentItems
* provide an EnhancementJobManager that executes registered
ContentItems and distributes calls to EnhancementEngines (and
sub-chains) to different hosts.
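The registration and dispatch steps above could be sketched roughly as follows. This is a hypothetical, in-memory illustration and not Stanbol's API: `ProcessingNode`, `NodeRegistry` and `DistributedJobManager` are invented names, and a plain map stands in for the central store (e.g. MongoDB) that would hold registrations in a real deployment. Each node registers the engines it provides; the job manager walks a chain and routes each engine call to a node that registered that engine, rotating between nodes when several provide it. Sub-chains and the actual remote calls are left out for brevity.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical processing node: one Stanbol instance exposing a few engines.
record ProcessingNode(String host, Set<String> engines) {}

// Hypothetical central registry. A real deployment would persist this in
// the same DB that stores ContentItems, not keep it in memory.
class NodeRegistry {
    private final Map<String, List<ProcessingNode>> nodesByEngine = new LinkedHashMap<>();

    void register(ProcessingNode node) {
        for (String engine : node.engines()) {
            nodesByEngine.computeIfAbsent(engine, e -> new ArrayList<>()).add(node);
        }
    }

    List<ProcessingNode> nodesFor(String engine) {
        return nodesByEngine.getOrDefault(engine, List.of());
    }
}

// Hypothetical job manager: only plans which host runs each engine.
class DistributedJobManager {
    private final NodeRegistry registry;
    private final Map<String, Integer> nextIndex = new LinkedHashMap<>();

    DistributedJobManager(NodeRegistry registry) {
        this.registry = registry;
    }

    // Returns engine -> host assignments for one ContentItem, rotating
    // round-robin between the nodes that provide the same engine.
    Map<String, String> plan(List<String> chain) {
        Map<String, String> assignment = new LinkedHashMap<>();
        for (String engine : chain) {
            List<ProcessingNode> candidates = registry.nodesFor(engine);
            if (candidates.isEmpty()) {
                throw new IllegalStateException("No node provides engine " + engine);
            }
            // merge() returns the updated counter, so the first call yields index 0
            int i = nextIndex.merge(engine, 1, Integer::sum) - 1;
            assignment.put(engine, candidates.get(i % candidates.size()).host());
        }
        return assignment;
    }
}
```

Round-robin is just the simplest possible routing policy; a real job manager might also weigh node load or keep engines that share intermediate results on the same host.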

This would also require that:

* all parts of the ContentItems are serializable. I think this is
already the case, but I would need to check.
* EnhancementEngines provide some more metadata so that, in most
cases, the EnhancementJobManager can already determine by itself
whether an Engine can enhance a ContentItem (the decision currently
made by the EnhancementEngine#canEnhance(..) method).
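To illustrate the metadata point: if engines published declarative metadata, the job manager could settle most of these decisions locally instead of asking a remote node. The sketch below assumes hypothetical metadata (supported MIME types and languages); `EngineMetadata`, `ItemSummary` and `PreCheck` are invented for illustration and are not part of the Stanbol API. The three-valued result models "in most cases": when the metadata is not sufficient, the manager would still have to ask the engine itself.

```java
import java.util.Set;

// Hypothetical result of a local pre-check done by the job manager.
enum PreCheck { CAN_ENHANCE, CANNOT_ENHANCE, ASK_ENGINE }

// Hypothetical summary of a ContentItem, as it might be stored in the central DB.
record ItemSummary(String mimeType, String language) {}

// Hypothetical declarative metadata an engine would publish on registration.
record EngineMetadata(Set<String> mimeTypes, Set<String> languages) {

    // Decides locally whether the engine can process the item. An empty
    // language set means "no language restriction"; a null item language
    // means it is still unknown (e.g. language detection has not run yet).
    PreCheck preCheck(ItemSummary item) {
        if (!mimeTypes.contains(item.mimeType())) {
            return PreCheck.CANNOT_ENHANCE;
        }
        if (languages.isEmpty()) {
            return PreCheck.CAN_ENHANCE;
        }
        if (item.language() == null) {
            return PreCheck.ASK_ENGINE; // metadata alone is not enough here
        }
        return languages.contains(item.language())
                ? PreCheck.CAN_ENHANCE
                : PreCheck.CANNOT_ENHANCE;
    }
}
```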

We should definitely keep such use cases in mind if we introduce the
Enhancement Task RESTful service and the EnhancementJob API (see [1]
and the following mails), because the EnhancementJob would most likely
be the element representing the state in the central DB, and the Task
API could be the one used to call EnhancementEngines on multiple
processing nodes.
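One way to picture a job record as "the element representing the state in the central DB" is a per-ContentItem map of each engine's execution status, which any processing node can update after running its part. The sketch below is hypothetical: `JobState` and `EngineStatus` are invented names and do not reflect the EnhancementJob API discussed in [1]. It assumes a strictly sequential chain, and it keeps state in memory with `synchronized` methods where a real deployment would persist the record (e.g. as a MongoDB document) and use atomic DB updates.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical execution status of one engine for one ContentItem.
enum EngineStatus { PENDING, RUNNING, COMPLETED, FAILED }

// Hypothetical job record shared by all processing nodes.
class JobState {
    private final String contentItemUri;
    private final Map<String, EngineStatus> statusByEngine = new LinkedHashMap<>();

    JobState(String contentItemUri, List<String> chain) {
        this.contentItemUri = contentItemUri;
        for (String engine : chain) {
            statusByEngine.put(engine, EngineStatus.PENDING);
        }
    }

    String contentItemUri() {
        return contentItemUri;
    }

    // A node claims an engine before calling it. The conditional replace
    // stands in for an atomic compare-and-set in the central DB, so two
    // nodes cannot claim the same task.
    synchronized boolean claim(String engine) {
        return statusByEngine.replace(engine, EngineStatus.PENDING, EngineStatus.RUNNING);
    }

    synchronized void finish(String engine, boolean success) {
        statusByEngine.put(engine, success ? EngineStatus.COMPLETED : EngineStatus.FAILED);
    }

    // Next engine to run in a sequential chain, or null if none is ready
    // (an engine is still RUNNING, has FAILED, or all are COMPLETED).
    synchronized String nextEngine() {
        for (Map.Entry<String, EngineStatus> e : statusByEngine.entrySet()) {
            EngineStatus s = e.getValue();
            if (s == EngineStatus.COMPLETED) {
                continue;
            }
            return s == EngineStatus.PENDING ? e.getKey() : null;
        }
        return null;
    }

    synchronized boolean isDone() {
        return statusByEngine.values().stream().allMatch(s -> s == EngineStatus.COMPLETED);
    }
}
```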

best
Rupert

[1] http://markmail.org/message/zqztwjhndwj74jqv

On Wed, Mar 6, 2013 at 7:16 PM, Som Satpathy <[email protected]> wrote:
> Hi Fabian,
>
> For now the data I've been working with is plain text, so it is easy to
> visualize how splitting the content and distributing the enhancement
> process can help us. But you are right: it becomes a challenge once
> context comes in. As you said, the splitting of the content then has to
> be designed carefully.
>
> Thanks,
> Som
>
> On Wed, Mar 6, 2013 at 12:08 AM, Fabian Christ <[email protected]> wrote:
>
>> 2013/3/5 Som Satpathy <[email protected]>:
>> > I am aiming to distribute the enhancement request for the posted
>> > content over a cluster of nodes.
>>
>> And should each node process a single enhancement engine or are you
>> trying to split the content?
>>
>> Splitting the content has to be designed carefully, since some
>> enhancement engines may rely on context information. Therefore, how
>> the content can be split into parts depends on the engines'
>> requirements and on the kind of content you are processing.
>>
>> Suppose, for example, that all your engines work at sentence level and
>> the content is plain text. In that case you could distribute each
>> sentence separately. But if your content is a website and your engines
>> take the structure of the website into account, it may not be possible
>> to split at sentence level.
>>
>> Best,
>>  - Fabian
>>
>> --
>> Fabian
>> http://twitter.com/fctwitt
>>
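For the plain-text case discussed in the thread, and under Fabian's assumption that every engine works at sentence level, splitting could look like the sketch below. It uses the JDK's `java.text.BreakIterator` as a stand-in sentence detector (the engines' own sentence detection may differ) and assigns the resulting sentences round-robin to a given number of nodes. `SentencePart`, `SentenceSplitter`, `split` and `partition` are hypothetical names. Each part keeps its character offsets so that enhancements produced on a node can be mapped back to positions in the original text; engines that need wider context would break this scheme, as noted above.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical unit of work: one sentence plus its [start, end) offsets in
// the original text, so results can be mapped back to the full content.
record SentencePart(int start, int end, String text) {}

class SentenceSplitter {

    // Splits plain text into sentences using the JDK's BreakIterator.
    static List<SentencePart> split(String content, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(content);
        List<SentencePart> parts = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            // Trim surrounding whitespace while keeping offsets aligned with the text.
            int from = start;
            int to = end;
            while (from < to && Character.isWhitespace(content.charAt(from))) from++;
            while (to > from && Character.isWhitespace(content.charAt(to - 1))) to--;
            if (from < to) {
                parts.add(new SentencePart(from, to, content.substring(from, to)));
            }
        }
        return parts;
    }

    // Assigns sentences round-robin to nodeCount processing nodes.
    static List<List<SentencePart>> partition(List<SentencePart> parts, int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        List<List<SentencePart>> buckets = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int i = 0; i < parts.size(); i++) {
            buckets.get(i % nodeCount).add(parts.get(i));
        }
        return buckets;
    }
}
```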



--
| Rupert Westenthaler             [email protected]
| Bodenlehenstraße 11                             ++43-699-11108907
| A-5500 Bischofshofen
