distributed indexing

Viktor Gal Wed, 20 Nov 2013 09:03:06 -0800

Hi,

i started using stanbol about a week ago and i must say it's a great 
tool! kudos to all the developers!


i'm now trying to import and index the latest freebase data set, and it 
occurred to me that it would be great to add interfaces to stanbol for other 
indexing engines that can handle large corpora, such as http://terrier.org/

since terrier is mapreduce based (i.e. hadoop), it'd be great to have a 
mapreduce-based RDF storage as well. that way we could easily calculate, for 
example, real PageRank values on the freebase data set by using mahout's 
pagerank implementation.
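
to make the PageRank idea a bit more concrete, here is a minimal in-memory 
sketch of the computation, written as a "map" step and a "reduce" step per 
iteration. this is only an illustration under my own assumptions: it is not 
mahout's or hadoop's API, the function name `pagerank` and its parameters are 
hypothetical, and the graph is assumed to be a list of (subject, object) pairs 
extracted from the RDF triples.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=20):
    """Power-iteration PageRank over (source, target) pairs.

    Hypothetical helper for illustration only; a real job would run the
    two steps below as distributed map and reduce tasks.
    """
    out_links = defaultdict(set)
    nodes = set()
    for src, dst in edges:
        out_links[src].add(dst)
        nodes.update((src, dst))

    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # "map" step: each node emits rank / out-degree to every target.
        contributions = defaultdict(float)
        dangling = 0.0
        for node in nodes:
            targets = out_links.get(node)
            if targets:
                share = ranks[node] / len(targets)
                for target in targets:
                    contributions[target] += share
            else:
                # Nodes without outgoing links spread their rank evenly.
                dangling += ranks[node]

        # "reduce" step: sum the contributions per node and apply damping.
        ranks = {
            node: (1.0 - damping) / n
            + damping * (contributions[node] + dangling / n)
            for node in nodes
        }
    return ranks


if __name__ == "__main__":
    # Tiny made-up graph: both "a" and "b" link to "c".
    for node, rank in sorted(pagerank([("a", "c"), ("b", "c")]).items()):
        print(node, round(rank, 4))
```

on a data set the size of freebase the edge list would not fit in memory, 
which is exactly why running the two steps as mapreduce jobs is attractive.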

does anybody know of a good mapreduce-based RDF storage? i've seen some people 
talking about HBase...

of course this would require some work both in terrier and mahout, but then 
again for data sets like freebase this would make a lot of things faster/easier 
(if one has the cluster for it).

happy to see comments on this!

cheers,
viktor
