Hi Amindri,

Based on the code, the NPE could originate from a namespace prefix that is unknown to the namespace prefix service.
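
One rough way to test this guess is to list every prefix that appears in "incoming_links.txt" but has no entry in "indexing/config/namespaceprefix.mappings". The following is only a sketch under assumptions that are not confirmed by the Stanbol sources: the mappings file is assumed to hold one "prefix<whitespace>namespace" pair per line with "#" comment lines, and the links file is assumed to be tab-separated with the entity ID in one of its columns. The helper names `load_prefixes` and `unknown_prefixes` are made up for illustration.

```python
import re

# Matches "prefix:local" but not full URIs such as "http://..."
# (the negative lookahead rejects a "//" right after the colon).
PREFIXED = re.compile(r"^([A-Za-z][\w.-]*):(?!//)\S")


def load_prefixes(lines):
    """Collect prefixes from lines of the ASSUMED form '<prefix><whitespace><namespace>'."""
    prefixes = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) == 2:
            prefixes.add(parts[0])
    return prefixes


def unknown_prefixes(lines, known):
    """Return {prefix: first line number} for prefixes used in any
    tab-separated field of `lines` that are not contained in `known`."""
    missing = {}
    for number, line in enumerate(lines, start=1):
        for field in line.rstrip("\n").split("\t"):
            match = PREFIXED.match(field.strip())
            if match and match.group(1) not in known:
                missing.setdefault(match.group(1), number)
    return missing


if __name__ == "__main__":
    # Paths as mentioned in this thread, relative to the indexing working directory.
    with open("indexing/config/namespaceprefix.mappings", encoding="utf-8") as f:
        known = load_prefixes(f)
    # The file is streamed line by line because it can be large.
    with open("indexing/resources/incoming_links.txt", encoding="utf-8") as f:
        for prefix, line_no in sorted(unknown_prefixes(f, known).items()):
            print(f"unmapped prefix '{prefix}' first seen on line {line_no}")
```

A prefix reported by this script is not necessarily the cause of the NPE, because missing mappings may still be resolved through the fallback, but it narrows down which lines to look at first.
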
Can you please check the data of the "incoming_links.txt" file against the mappings defined in the "indexing/config/namespaceprefix.mappings" file? My guess is that "incoming_links.txt" uses a prefix that is not defined in the mappings file.

It is recommended to explicitly define namespace prefix mappings for all namespaces used by the indexing process (config data and RDF data). For missing mappings, http://prefix.cc/ is used as a fallback.

best
Rupert

On Mon, Feb 9, 2015 at 7:33 AM, Amindri Udugala <amindriudug...@gmail.com> wrote:
> Hi All,
>
> I need to create an index from a Freebase data dump. So I followed the instructions in the README file in entityhub\indexing\freebase.
>
> First I executed java -jar org.apache.stanbol.entityhub.indexing.freebase-1.0.0-SNAPSHOT.jar init, to generate the folder structure. The folder structure was successfully generated except for the following warnings
>
> 16:16:20,530 [main] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'nsogi' valid , namespace 'http://prefix.cc/nsogi:' invalid -> mapping ignored!
> 16:16:21,279 [main] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'category' valid , namespace 'http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
> 16:16:21,435 [main] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'chebi' valid , namespace 'http://bio2rdf.org/chebi:' invalid -> mapping ignored!
> 16:16:21,435 [main] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'hgnc' valid , namespace 'http://bio2rdf.org/hgnc:' invalid -> mapping ignored!
> 16:16:21,450 [main] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbptmpl' valid , namespace 'http://dbpedia.org/resource/Template:' invalid -> mapping ignored!
> 16:16:21,638 [main] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'pubmed' valid , namespace 'http://bio2rdf.org/pubmed_vocabulary:' invalid -> mapping ignored!
> 16:16:21,638 [main] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbc' valid , namespace 'http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
> 16:16:21,638 [main] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbt' valid , namespace 'http://dbpedia.org/resource/Template:' invalid -> mapping ignored!
> 16:16:21,638 [main] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbrc' valid , namespace 'http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
> 16:16:21,809 [main] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'call' valid , namespace 'http://webofcode.org/wfn/call:' invalid -> mapping ignored!
> 16:16:21,809 [main] WARN impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'affymetrix' valid , namespace 'http://bio2rdf.org/affymetrix_vocabulary:' invalid -> mapping ignored!
>
> Then I copied the Freebase dump (freebase-rdf-latest.gz) to the indexing/resources/rdfdata folder and the incoming_links.txt file, generated by fbrankings-uri.sh to indexing/resources folder and executed the indexing process. (I used all the default config files)
>
> While executing the index process I noticed the following log.
>
> 16:38:40,806 [main] INFO core.IndexerFactory - - EntityDataIterable: null
> 16:38:40,806 [main] INFO core.IndexerFactory - - EntityIterator: org.apache.stanbol.entityhub.indexing.core.source.LineBasedEntityIterator@1880249c
> 16:38:40,806 [main] INFO core.IndexerFactory - - EntityDataProvider: org.apache.stanbol.entityhub.indexing.source.jenatdb.RdfIndexingSource@4e38a55
> 16:38:40,806 [main] INFO core.IndexerFactory - - EntityScoreProvider: null
>
> Finally it threw a null pointer exception as follows
>
> 16:38:40,837 [Thread-3] INFO source.ResourceLoader - ... 1 files imported in 0 seconds
> 16:38:40,837 [Thread-3] INFO source.ResourceLoader - Loding 0 File ...
> 16:38:40,837 [Thread-3] INFO source.ResourceLoader - ... 0 files imported in 0 seconds
> 16:38:42,912 [Thread-0] INFO solryard.SolrYardIndexingDestination - ... create SolrYard
> 16:38:42,959 [main] INFO impl.IndexerImpl - ... delete existing IndexedEntityId file C:\cygwin64\home\User\code\stanbol_indexing\indexing\destination\indexed-entities-ids.zip
> 16:38:42,974 [main] INFO impl.IndexerImpl - Initialisation completed
> 16:38:42,974 [main] INFO impl.IndexerImpl - ... initialisation completed
> 16:38:42,974 [main] INFO impl.IndexerImpl - start indexing ...
> 16:38:42,974 [main] INFO impl.IndexerImpl - Indexing started ...
> Exception in thread "Indexing: Entity Source Reader Deamon" java.lang.NullPointerException
>     at java.lang.StringBuilder.<init>(Unknown Source)
>     at org.apache.stanbol.entityhub.indexing.core.source.LineBasedEntityIterator.parseEntityFormLine(LineBasedEntityIterator.java:435)
>     at org.apache.stanbol.entityhub.indexing.core.source.LineBasedEntityIterator.getNext(LineBasedEntityIterator.java:379)
>     at org.apache.stanbol.entityhub.indexing.core.source.LineBasedEntityIterator.hasNext(LineBasedEntityIterator.java:356)
>     at org.apache.stanbol.entityhub.indexing.core.impl.EntityIdBasedIndexingDaemon.run(EntityIdBasedIndexingDaemon.java:55)
>     at java.lang.Thread.run(Unknown Source)
>
> I'm not sure if this happens because I haven't configured an important property in a configuration file. I'm pretty new to Stanbol and any help would be much appreciated.
>
> Thanks in advance.
> --
> Regards
> Amindri Udugala

--
| Rupert Westenthaler rupert.westentha...@gmail.com
| Bodenlehenstraße 11 ++43-699-11108907
| A-5500 Bischofshofen
| REDLINK.CO ..........................................................................
| http://redlink.co/