Looks similar to something others have seen: https://issues.apache.org/jira/browse/STANBOL-1446
which doesn't help you much, but might be a place to centralize the answer to this question.

I wouldn't think that a WARN level message would tag a condition so severe that indexing doesn't take place. Perhaps it is something else. Can you use Jena's command-line tools to check and see how many entities have actually been loaded into TDB vs. how many you expect? That might give you a clue as to where indexing is hanging up (if it actually is).

---
A. Soroka
The University of Virginia Library

> On Apr 5, 2016, at 7:59 AM, Antero Duarte <a.fduar...@gmail.com> wrote:
>
> Hello there,
>
> I have been struggling with building indexes from generic rdf and even
> using default configuration for more popular sources like dbpedia.
>
> I found an indexing tool online configured to index yago, at
> https://github.com/ChalithaUdara/Stanbol-Yago-Site.
>
> Everything seemed to be going well until it got into this loop:
>
> 11:17:26,546 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl -
> Invalid Namespace Mapping: prefix 'affymetrix' valid , namespace '
> http://bio2rdf.org/affymetrix_vocabulary:' invalid -> mapping ignored!
> 11:17:26,546 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl -
> Invalid Namespace Mapping: prefix 'condition' valid , namespace '
> http://www.kinjal.com/condition:' invalid -> mapping ignored!
> 11:17:26,576 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl -
> Invalid Namespace Mapping: prefix 'wimpo' valid , namespace '
> http://rdfex.org/withImports?uri=' invalid -> mapping ignored!
> 12:17:26,856 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl -
> Invalid Namespace Mapping: prefix 'nsogi' valid , namespace '
> http://prefix.cc/nsogi:' invalid -> mapping ignored!
> 12:17:26,918 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl -
> Invalid Namespace Mapping: prefix 'dbc' valid , namespace '
> http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
> 12:17:26,949 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl -
> Invalid Namespace Mapping: prefix 'category' valid , namespace '
> http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
> 12:17:26,949 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl -
> Invalid Namespace Mapping: prefix 'hgnc' valid , namespace '
> http://bio2rdf.org/hgnc:' invalid -> mapping ignored!
> 12:17:26,950 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl -
> Invalid Namespace Mapping: prefix 'chebi' valid , namespace '
> http://bio2rdf.org/chebi:' invalid -> mapping ignored!
> 12:17:26,980 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl -
> Invalid Namespace Mapping: prefix 'dbt' valid , namespace '
> http://dbpedia.org/resource/Template:' invalid -> mapping ignored!
> 12:17:26,980 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl -
> Invalid Namespace Mapping: prefix 'pubmed' valid , namespace '
> http://bio2rdf.org/pubmed_vocabulary:' invalid -> mapping ignored!
> 12:17:26,980 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl -
> Invalid Namespace Mapping: prefix 'dbptmpl' valid , namespace '
> http://dbpedia.org/resource/Template:' invalid -> mapping ignored!
> 12:17:26,981 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl -
> Invalid Namespace Mapping: prefix 'dbrc' valid , namespace '
> http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
> 12:17:26,981 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl -
> Invalid Namespace Mapping: prefix 'call' valid , namespace '
> http://webofcode.org/wfn/call:' invalid -> mapping ignored!
> 12:17:27,011 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl -
> Invalid Namespace Mapping: prefix 'dbcat' valid , namespace '
> http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
> 12:17:27,011 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl -
> Invalid Namespace Mapping: prefix 'bgcat' valid , namespace '
> http://bg.dbpedia.org/resource/?????????:' invalid -> mapping ignored!
> 12:17:27,012 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl -
> Invalid Namespace Mapping: prefix 'affymetrix' valid , namespace '
> http://bio2rdf.org/affymetrix_vocabulary:' invalid -> mapping ignored!
> 12:17:27,012 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl -
> Invalid Namespace Mapping: prefix 'condition' valid , namespace '
> http://www.kinjal.com/condition:' invalid -> mapping ignored!
> 12:17:27,042 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl -
> Invalid Namespace Mapping: prefix 'wimpo' valid , namespace '
> http://rdfex.org/withImports?uri=' invalid -> mapping ignored!
>
> It happened to me before with the dbpedia index and at first I thought it
> was some problem with the rdf source, and since these messages are logged
> at WARN level, I simply ignored them. But after days, the indexing/tdb
> directory stayed the same size even though there are still files in the
> indexing/resources/rdfdata directory. Then I realised that these messages
> follow a pattern and they are logged every hour with precision to the
> second, which seems weird. Also, they are always the same messages. This
> led me to think that the indexing tool is stuck in a loop and that's why it
> is not moving any further. I think it is important to say that the one hour
> time span between messages is the same for the dbpedia index and for the
> yago index; the yago index is much bigger.
>
> I have been constantly running `watch du * -s` in the resources directory
> for days to check for size changes, and nothing is changing and hasn't
> changed for days.
>
> I don't know if this is some problem with the configuration, but since I
> didn't configure it myself, I assumed that what I got from github would be
> a working configuration for this specific index.
>
> I have a few questions related to this problem:
>
> 1) Is it safe to cancel the indexing tool and start again without changing
> what's in the rdfdata and imported directories? Could this help at all?
>
> 2) What can possibly be causing this problem?
>
> 3) Why is it looping and logging every hour (accurate to the second)?
>
> If there is any extra information I can provide that would help
> understanding what the problem is here, tell me what it is and I will
> provide it.
>
> Regards,
> Antero
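
One way to check the "same messages, every hour" observation in the quoted log is to measure the gaps directly rather than eyeball them. The following is only a minimal Python sketch. It assumes the log has been saved with each entry unwrapped onto a single line, in the `HH:MM:SS,mmm [thread] LEVEL logger - message` layout shown above. The function name `repeat_intervals`, the regular expression, and the `SAMPLE` list are illustrative and are not part of Stanbol or Jena.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed layout: "HH:MM:SS,mmm [thread] LEVEL logger - message"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}),(?P<ms>\d{3})\s+"
    r"\[[^\]]+\]\s+[A-Z]+\s+\S+\s+-\s+(?P<msg>.*)$"
)


def repeat_intervals(lines):
    """Map each repeated message to the gaps between its consecutive occurrences.

    The log carries no date, so a gap that crosses midnight comes out negative.
    """
    occurrences = defaultdict(list)
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a log entry
        stamp = datetime.strptime(match["ts"], "%H:%M:%S")
        stamp += timedelta(milliseconds=int(match["ms"]))
        occurrences[match["msg"]].append(stamp)
    return {
        msg: [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        for msg, stamps in occurrences.items()
        if len(stamps) > 1  # only messages seen more than once
    }


# Three entries taken from the quoted log, unwrapped onto single lines.
SAMPLE = [
    "11:17:26,546 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - "
    "Invalid Namespace Mapping: prefix 'affymetrix' valid , namespace "
    "'http://bio2rdf.org/affymetrix_vocabulary:' invalid -> mapping ignored!",
    "12:17:26,856 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - "
    "Invalid Namespace Mapping: prefix 'nsogi' valid , namespace "
    "'http://prefix.cc/nsogi:' invalid -> mapping ignored!",
    "12:17:27,012 [pool-1-thread-1] WARN impl.NamespacePrefixProviderImpl - "
    "Invalid Namespace Mapping: prefix 'affymetrix' valid , namespace "
    "'http://bio2rdf.org/affymetrix_vocabulary:' invalid -> mapping ignored!",
]

if __name__ == "__main__":
    for message, gaps in repeat_intervals(SAMPLE).items():
        print([str(gap) for gap in gaps], message[:60])
```

If every repeated message shows a gap of almost exactly one hour, that looks more like a periodic task re-reading the namespace mappings than a stalled import, though that is a guess to be checked against the TDB counts suggested above.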