Hi, I started using Stanbol about a week ago and I must say it's a great tool! Kudos to all the developers!
I'm now trying to import and index the latest Freebase data set, and one thing came to mind: it might be great to add other indexing engine interfaces to Stanbol that can handle large corpora, like http://terrier.org/. Since Terrier supports MapReduce indexing (i.e. Hadoop), it would be great to have a MapReduce-based RDF store as well. That way we could easily calculate, for example, real PageRank values on the Freebase data set using Mahout's PageRank implementation.

Does anybody know a good MapReduce-based RDF store? I've seen some people talking about HBase...

Of course this would require some work in both Terrier and Mahout, but then again, for data sets like Freebase it would make a lot of things faster and easier (if one has the cluster for it).

Happy to see comments on this!

Cheers,
Viktor
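
P.S. To make the PageRank idea a bit more concrete, here is a minimal single-machine sketch of what would be computed over the entity links. It is plain power iteration in Python, not Mahout's or Terrier's code, and the function name `pagerank`, the `(source, target)` edge format, and the example entity names are all assumptions made up for illustration. A MapReduce version would distribute the inner loop (each node emitting its rank share to its targets) across the cluster.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over (source, target) entity links.

    Hypothetical helper for illustration only; not part of Stanbol,
    Terrier, or Mahout.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Start from a uniform distribution over all entities.
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by entities without outgoing links is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping + damping * dangling) / len(nodes)
        new_rank = {n: base for n in nodes}
        # "Map" step: each entity sends an equal share of its rank to
        # every entity it links to; "reduce" step: shares are summed.
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Made-up entity links, standing in for object-valued RDF triples.
    links = [
        ("ex:Vienna", "ex:Austria"),
        ("ex:Salzburg", "ex:Austria"),
        ("ex:Austria", "ex:Europe"),
        ("ex:Europe", "ex:Austria"),
    ]
    for entity, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
        print(f"{entity}\t{score:.4f}")
```

On something the size of Freebase the link graph won't fit comfortably in a dict like this, which is exactly why a MapReduce-based store would be interesting.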