[
https://issues.apache.org/jira/browse/STANBOL-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rupert Westenthaler resolved STANBOL-1016.
------------------------------------------
Resolution: Fixed
Implemented with http://svn.apache.org/r1467865
> Add RDF Triple Filter support to the Jena TDB Indexing Source
> -------------------------------------------------------------
>
> Key: STANBOL-1016
> URL: https://issues.apache.org/jira/browse/STANBOL-1016
> Project: Stanbol
> Issue Type: Sub-task
> Components: Entityhub
> Reporter: Rupert Westenthaler
> Assignee: Rupert Westenthaler
>
> The freebase.com dump has ~1,200,000,000 triples. Loading those triples into
> Jena TDB takes ages if the RAM (available to the memory mapped files) is not
> large enough to hold the data. Once the imported data exceeds the available
> RAM, the import speed decreases to ~7k triples/sec on an SSD. To reach those
> 7k triples/sec the logs show 1.5k reads and 1k writes per second, so import
> speeds on normal hard discs should be much slower.
> As most of the triples contained in the freebase dump are not relevant for
> indexing, this issue will introduce a new feature to the Jena TDB Indexing
> Source that allows triples to be filtered out at a very low level.
> This filter will operate on the triples provided by the RIOT parser and
> define a single method:
> accept(Node subject, Node predicate, Node object) : boolean
> In addition the interface will extend IndexingComponent, which allows it to
> be configured via the configuration file of the
> org.apache.stanbol.entityhub.indexing.source.jenatdb.RdfIndexingSource
> The parameter used to configure the filter will be called "import-filter" and
> the value MUST be the class name of the implementation to use.
> The configuration of the jenatdb.RdfIndexingSource will be passed to the
> Import Filter's #setConfiguration(..) method. This means that users will need
> to add the configuration properties for the Import Filter to the
> configuration of the RdfIndexingSource.
> To keep things simple the RdfImportFilter interface will be specific to the
> Jena TDB Indexing Source.
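
The filter contract described above can be sketched roughly as follows. This
is only an illustration under stated assumptions, not the code committed in
r1467865: Node and IndexingComponent are reduced to minimal stand-ins so the
example compiles on its own, and PredicateImportFilter together with its
"import-filter.predicates" property is a hypothetical implementation invented
for this sketch.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Stand-in for the Jena Node class; only the URI is modelled here.
final class Node {
    private final String uri;
    Node(String uri) { this.uri = uri; }
    String getURI() { return uri; }
}

// Simplified stand-in for Stanbol's IndexingComponent (assumption:
// only the configuration hook is shown).
interface IndexingComponent {
    void setConfiguration(Map<String, Object> config);
}

// The filter interface as described in the issue: one accept(..) method,
// configured through the IndexingComponent contract.
interface RdfImportFilter extends IndexingComponent {
    boolean accept(Node subject, Node predicate, Node object);
}

// Hypothetical filter that keeps only triples whose predicate URI is listed
// in the (assumed) "import-filter.predicates" property.
class PredicateImportFilter implements RdfImportFilter {
    static final String PARAM_PREDICATES = "import-filter.predicates"; // assumed name
    private final Set<String> allowed = new HashSet<>();

    @Override
    public void setConfiguration(Map<String, Object> config) {
        // Receives the configuration of the RdfIndexingSource, so the
        // filter reads its own keys from the same map.
        allowed.clear();
        Object value = config.get(PARAM_PREDICATES);
        if (value != null) {
            for (String p : value.toString().split(",")) {
                String trimmed = p.trim();
                if (!trimmed.isEmpty()) {
                    allowed.add(trimmed);
                }
            }
        }
    }

    @Override
    public boolean accept(Node subject, Node predicate, Node object) {
        // Triples with any other predicate are dropped before import.
        return allowed.contains(predicate.getURI());
    }
}
```

With such a filter, the RdfIndexingSource configuration would carry both the
"import-filter" parameter naming the class and the filter's own properties,
and the source would hand that same configuration map to setConfiguration(..)
before asking accept(..) for each parsed triple.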