[ 
https://issues.apache.org/jira/browse/STANBOL-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rupert Westenthaler resolved STANBOL-1016.
------------------------------------------

    Resolution: Fixed

Implemented with http://svn.apache.org/r1467865
                
> Add RDF Triple Filter support to the Jena TDB Indexing Source
> -------------------------------------------------------------
>
>                 Key: STANBOL-1016
>                 URL: https://issues.apache.org/jira/browse/STANBOL-1016
>             Project: Stanbol
>          Issue Type: Sub-task
>          Components: Entityhub
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler
>
> The freebase.com dump has ~1.2 billion triples. Loading those triples into 
> Jena TDB takes ages if the RAM (available to the memory-mapped files) is not 
> large enough to hold the data. Once the number of imported triples exceeds the 
> available RAM, the import speed decreases to ~7k triples/sec on an SSD. To 
> reach those 7k triples/sec the logs show 1.5k reads and 1k writes per 
> second, so import speeds on normal hard disks should be much slower.
> As most of the triples contained in the freebase dump are not relevant for 
> indexing, this issue will introduce a new feature to the Jena TDB Indexing 
> Source that allows triples to be filtered out at a very low level.
> This filter will operate on the triples provided by the RIOT parser and 
> define a single method
>     accept(Node subject, Node predicate, Node object) : boolean
> In addition, the interface will extend IndexingComponent, which allows it to 
> be configured via the configuration file of the 
>     org.apache.stanbol.entityhub.indexing.source.jenatdb.RdfIndexingSource
> The parameter used to configure the filter will be called "import-filter" and 
> the value MUST be the class name of the implementation to use.
> The configuration of the jenatdb.RdfIndexingSource will be passed to the 
> Import Filter's #setConfiguration(..) method. This means that users will need 
> to add the configuration properties for the Import Filter to the configuration 
> of the RdfIndexingSource.
> To keep things simple the RdfImportFilter interface will be specific to the 
> Jena TDB Indexing Source.
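
The filter described above could look roughly like the following. This is a
minimal sketch, not the code committed in r1467865: to stay self-contained it
uses plain Strings where the real interface takes Jena Node objects, and the
names TripleFilterSketch, PredicateWhitelistFilter and the
"import-filter.predicates" property are hypothetical. The Map-based
setConfiguration(..) signature is also an assumption.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the RdfImportFilter interface described above.
// The real interface would use Jena Node arguments and extend IndexingComponent.
interface TripleFilterSketch {
    // Assumed to receive the whole configuration of the RdfIndexingSource.
    void setConfiguration(Map<String, Object> config);

    // Return true to import the triple, false to drop it.
    boolean accept(String subject, String predicate, String object);
}

// Example filter: only triples whose predicate is listed are imported.
class PredicateWhitelistFilter implements TripleFilterSketch {
    // Hypothetical property, read from the RdfIndexingSource configuration.
    static final String PARAM_PREDICATES = "import-filter.predicates";

    private final Set<String> allowed = new HashSet<>();

    @Override
    public void setConfiguration(Map<String, Object> config) {
        allowed.clear();
        Object value = config.get(PARAM_PREDICATES);
        if (value == null) {
            return; // nothing configured: accept everything
        }
        // The value is treated as a comma-separated list of predicate URIs.
        for (String predicate : value.toString().split(",")) {
            String trimmed = predicate.trim();
            if (!trimmed.isEmpty()) {
                allowed.add(trimmed);
            }
        }
    }

    @Override
    public boolean accept(String subject, String predicate, String object) {
        // An empty whitelist means no filtering at all.
        return allowed.isEmpty() || allowed.contains(predicate);
    }
}
```

With such a class, the RdfIndexingSource configuration would carry both
"import-filter" (set to the class name of the filter) and the filter's own
properties, since both end up in the same configuration map.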

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
