We could look into Behemoth (https://github.com/DigitalPebble/behemoth) for integration with Stanbol. It already provides the basic architecture for running document pipelines with MapReduce.
On Tue, Mar 5, 2013 at 2:27 PM, Bertrand Delacretaz <[email protected]> wrote:

> Hi,
>
> On Mon, Mar 4, 2013 at 6:57 PM, Som Satpathy <[email protected]> wrote:
> > ...I have been working on implementing a map-reduce job to run Stanbol
> > enhancement chains over hadoop. Is there work currently going on to address
> > the scalability aspect?...
>
> Note that you could scale Stanbol as is using http load balancing to
> address multiple Stanbol back-end instances which all have the same
> config, data files etc.
>
> As the content enhancer is stateless, this should be relatively simple
> to implement, though we might need to provide some replication/sync
> facilities for those configs and data files.
>
> Are you aiming for map-reducing a single enhancement request, by
> breaking up the submitted content in small parts and enhancing them
> independently?
>
> -Bertrand
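
For comparison, the load-balancing option Bertrand describes in the quoted message could look roughly like the client-side sketch below. It is only an illustration under assumptions: the backend host names are hypothetical, the helper names (`make_picker`, `build_request`, `enhance`) are mine, and every instance is assumed to expose the enhancer at `/enhancer` with identical config and data files. Because the enhancer is stateless, any instance can serve any request, so plain round-robin is enough.

```python
import itertools
import urllib.request

# Hypothetical backend URLs (assumption): each instance is assumed to run
# the same enhancement chain with the same config and data files.
BACKENDS = [
    "http://stanbol-1.example.org:8080/enhancer",
    "http://stanbol-2.example.org:8080/enhancer",
]


def make_picker(backends):
    """Return a function that hands out backends in round-robin order."""
    cycle = itertools.cycle(backends)
    return lambda: next(cycle)


def build_request(url, text):
    """Build a POST request that sends plain text to an enhancer endpoint."""
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),
        headers={"Content-Type": "text/plain", "Accept": "application/rdf+xml"},
        method="POST",
    )


def enhance(text, pick, timeout=30):
    """Send text to the next backend and return the raw response body."""
    request = build_request(pick(), text)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Usage would be along the lines of `pick = make_picker(BACKENDS)` followed by `enhance("Paris is the capital of France.", pick)` for each document. In practice a dedicated HTTP load balancer in front of the instances would do the same job without any client changes; the sketch just shows that nothing beyond request routing is needed.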
