[
https://issues.apache.org/jira/browse/STANBOL-1125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14030790#comment-14030790
]
A. Soroka commented on STANBOL-1125:
------------------------------------
I'm starting work on a quick tool for this purpose, using the "sorted
N-Triples" assumption and assuming a SolrYard destination. See here for
development:
https://github.com/ajs6f/streamingindexer
I'd like to eventually improve this and contribute it to Stanbol, but that will
require much more discussion first, so that I can develop an understanding of
the structure of the Solr indexes behind a SolrYard.
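
For illustration only, here is a minimal sketch of what the "sorted
N-Triples" assumption makes possible. It is not taken from the linked
repository. If all triples for a subject are adjacent in the dump, each entity
can be assembled from a single streaming pass without loading the data into an
RDF store. The function names (parse_line, entities) are hypothetical, the line
parser is deliberately naive rather than a full N-Triples parser, and the step
that would push each entity to Solr is left out because the SolrYard index
structure is not described here.

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_line(line: str) -> Tuple[str, str, str]:
    """Naively split one N-Triples line into (subject, predicate, object).

    Assumption: terms are whitespace-separated and the line ends with " .".
    The object keeps its raw N-Triples form (including quotes or brackets).
    """
    body = line.strip()
    if body.endswith("."):
        body = body[:-1].rstrip()
    subject, predicate, obj = body.split(None, 2)
    return subject, predicate, obj


def entities(lines: Iterable[str]) -> Iterator[Tuple[str, Dict[str, List[str]]]]:
    """Yield (subject, {predicate: [objects]}) for each run of equal subjects.

    Only correct if the input is sorted (or at least grouped) by subject;
    groupby merges adjacent lines only, so memory use is bounded by the
    largest single entity rather than by the size of the dump.
    """
    triples = (
        parse_line(line)
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    )
    for subject, group in groupby(triples, key=lambda t: t[0]):
        fields: Dict[str, List[str]] = {}
        for _, predicate, obj in group:
            fields.setdefault(predicate, []).append(obj)
        yield subject, fields
```

Because entities() accepts any iterable of lines, it can be fed directly from
an open (possibly decompressed) dump file, and each yielded entity could then
be converted into whatever document form the destination index expects.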
> Create a lightweight EntityHub Indexing Tool for Freebase
> ---------------------------------------------------------
>
> Key: STANBOL-1125
> URL: https://issues.apache.org/jira/browse/STANBOL-1125
> Project: Stanbol
> Issue Type: Improvement
> Components: Entityhub
> Reporter: Rafa Haro
>
> Due to the enormous size of the dumps, the current Freebase indexing tool in
> Stanbol can barely work on machines without several gigabytes of RAM and/or
> SSD disks. The JenaTDB importer has been identified as the bottleneck of the
> indexing process. Using an RDF database is mandatory in order to, for
> instance, use LDPath programs at indexing time.
> The idea is to develop a lightweight indexing tool that streams data from the
> dumps and pushes it directly to Solr. Despite losing some functionality, this
> would make it possible for any user to generate Freebase EntityHub indexes
> from any dump.