[jira] [Commented] (STANBOL-1125) Create a lightweight EntityHub Indexing Tool for Freebase

A. Soroka (JIRA) Thu, 05 Jun 2014 07:19:31 -0700

    [ https://issues.apache.org/jira/browse/STANBOL-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018817#comment-14018817 ]


A. Soroka commented on STANBOL-1125:
------------------------------------

I'm not sure that requiring input sorted by subject is too much to ask, 
especially because with a line-based format like N-Triples it can be done with 
simple tools like POSIX "sort", but maybe that's just my bias: I do like to use 
simple tools early in a processing chain. In any event, while a Solr-specific 
indexer would doubtless be very useful (and I would gladly use it!), I would 
ideally like to be able to use a Clerezza Yard as well. Perhaps different 
strategies are appropriate for streaming into different indexing destinations… 
Is there any policy on this for the indexing tools? In other words, 
does Stanbol expect to support all Yard implementations as indexing 
destinations for all indexing tools, or just for the basic tool, with 
"special-purpose" tools supporting various Yard impls as feasible?

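To illustrate why sorted-by-subject input helps a streaming indexer, here is a 
minimal sketch in Python (not Stanbol code). It assumes the dump has already 
been sorted, for example with `LC_ALL=C sort dump.nt > sorted.nt` (both file 
names are hypothetical), so that all triples for one subject are adjacent and 
can be handed on one entity at a time without loading an RDF store. The 
function names are illustrative only.

```python
import itertools
from typing import Iterable, Iterator, List, Tuple


def subject_of(line: str) -> str:
    # In N-Triples the subject is the first whitespace-delimited term
    # (an IRI in angle brackets or a blank node label).
    return line.split(None, 1)[0]


def group_by_subject(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (subject, triples) for each run of adjacent lines sharing a subject.

    Assumes the input is already sorted by subject; only one entity's
    triples are held in memory at a time.
    """
    stripped = (line.strip() for line in lines)
    # Skip blank lines and N-Triples comment lines.
    statements = (s for s in stripped if s and not s.startswith("#"))
    for subject, run in itertools.groupby(statements, key=subject_of):
        yield subject, list(run)
```

Each yielded group could then be converted into whatever representation the 
chosen indexing destination expects, whether Solr or another Yard.
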
> Create a lightweight EntityHub Indexing Tool for Freebase
> ---------------------------------------------------------
>
>                 Key: STANBOL-1125
>                 URL: https://issues.apache.org/jira/browse/STANBOL-1125
>             Project: Stanbol
>          Issue Type: Improvement
>          Components: Entityhub
>            Reporter: Rafa Haro
>
> Due to the enormous size of the dumps, the current Freebase indexing tool in 
> Stanbol can barely work on machines without several gigabytes of RAM and/or 
> SSD disks. The Jena TDB importer has been identified as the bottleneck of the 
> indexing process. Using an RDF database is mandatory in order to, for 
> instance, use LDPath programs at indexing time.
> The idea is to develop a lightweight indexing tool that streams data from the 
> dumps and pushes it directly to Solr. Despite losing some functionality, this 
> would make it possible for any user to generate Freebase EntityHub indexes 
> from any dump.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
