Looks similar to something others have seen:

https://issues.apache.org/jira/browse/STANBOL-1446

which doesn't help you much, but might be a place to centralize the answer to 
this question. I wouldn't expect a WARN-level message to flag a condition so 
severe that indexing doesn't take place, so perhaps it is something else.

Can you use Jena's command-line tools to check and see how many entities have 
actually been loaded into TDB vs. how many you expect? That might give you a 
clue as to where indexing is hanging up (if it actually is).
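
For the "how many you expect" side, a rough count can be taken straight from 
the source files. Below is a minimal Python sketch, not part of Stanbol or 
Jena. It assumes the files in indexing/resources/rdfdata are N-Triples 
(optionally gzip-compressed) with one triple per line; dumps in other 
serializations such as Turtle would need a real parser. `count_subjects` is a 
hypothetical helper, and distinct subjects are only a proxy for "entities". 
The number to compare against would come from TDB itself, for example a 
`SELECT (COUNT(DISTINCT ?s) AS ?n) WHERE { ?s ?p ?o }` query run with Jena's 
`tdbquery --loc=<tdb directory>`.

```python
import gzip
from pathlib import Path


def open_text(path: Path):
    """Open a plain or gzip-compressed (.gz) N-Triples file as text."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def count_subjects(paths):
    """Return (triple_count, distinct_subject_count) over N-Triples files.

    Assumes one triple per line, so the subject is the first
    whitespace-separated token (an <IRI> or a _:blank node label).
    Blank lines and '#' comment lines are skipped. Blank-node labels
    are counted as written, so equal labels in different files merge.
    """
    subjects = set()
    triples = 0
    for path in paths:
        with open_text(Path(path)) as fh:
            for line in fh:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue
                triples += 1
                subjects.add(line.split(None, 1)[0])
    return triples, len(subjects)
```

Usage would be something like `count_subjects(["part1.nt.gz", "part2.nt"])`. 
The set holds every subject string in memory, so for something the size of 
YAGO it is better run on a sample file than on the whole dump.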

---
A. Soroka
The University of Virginia Library

> On Apr 5, 2016, at 7:59 AM, Antero Duarte <a.fduar...@gmail.com> wrote:
> 
> Hello there,
> 
> I have been struggling to build indexes from generic RDF, and even with
> the default configuration for more popular sources like DBpedia.
> 
> I found an indexing tool online configured to index YAGO, at
> https://github.com/ChalithaUdara/Stanbol-Yago-Site.
> 
> Everything seemed to be going well until it got into this loop:
> 
> 11:17:26,546 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'affymetrix' valid , namespace 'http://bio2rdf.org/affymetrix_vocabulary:' invalid -> mapping ignored!
> 11:17:26,546 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'condition' valid , namespace 'http://www.kinjal.com/condition:' invalid -> mapping ignored!
> 11:17:26,576 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'wimpo' valid , namespace 'http://rdfex.org/withImports?uri=' invalid -> mapping ignored!
> 12:17:26,856 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'nsogi' valid , namespace 'http://prefix.cc/nsogi:' invalid -> mapping ignored!
> 12:17:26,918 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbc' valid , namespace 'http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
> 12:17:26,949 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'category' valid , namespace 'http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
> 12:17:26,949 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'hgnc' valid , namespace 'http://bio2rdf.org/hgnc:' invalid -> mapping ignored!
> 12:17:26,950 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'chebi' valid , namespace 'http://bio2rdf.org/chebi:' invalid -> mapping ignored!
> 12:17:26,980 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbt' valid , namespace 'http://dbpedia.org/resource/Template:' invalid -> mapping ignored!
> 12:17:26,980 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'pubmed' valid , namespace 'http://bio2rdf.org/pubmed_vocabulary:' invalid -> mapping ignored!
> 12:17:26,980 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbptmpl' valid , namespace 'http://dbpedia.org/resource/Template:' invalid -> mapping ignored!
> 12:17:26,981 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbrc' valid , namespace 'http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
> 12:17:26,981 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'call' valid , namespace 'http://webofcode.org/wfn/call:' invalid -> mapping ignored!
> 12:17:27,011 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbcat' valid , namespace 'http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
> 12:17:27,011 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'bgcat' valid , namespace 'http://bg.dbpedia.org/resource/?????????:' invalid -> mapping ignored!
> 12:17:27,012 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'affymetrix' valid , namespace 'http://bio2rdf.org/affymetrix_vocabulary:' invalid -> mapping ignored!
> 12:17:27,012 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'condition' valid , namespace 'http://www.kinjal.com/condition:' invalid -> mapping ignored!
> 12:17:27,042 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'wimpo' valid , namespace 'http://rdfex.org/withImports?uri=' invalid -> mapping ignored!
> 
> It happened to me before with the DBpedia index. At first I thought it
> was some problem with the RDF source, and since these messages are logged
> at WARN level, I simply ignored them. But after days, the indexing/tdb
> directory stayed the same size even though there are still files in the
> indexing/resources/rdfdata directory. Then I realised that these messages
> follow a pattern: they are logged every hour, accurate to the second, which
> seems weird. Also, they are always the same messages. This led me to think
> that the indexing tool is stuck in a loop and that's why it is not moving
> any further. I think it is important to say that the one-hour interval
> between messages is the same for the DBpedia index and for the YAGO index,
> even though the YAGO index is much bigger.
> 
> I have been running `watch du * -s` in the resources directory for days
> to check for size changes, and nothing has changed in that time.
> 
> I don't know if this is some problem with the configuration, but since I
> didn't configure it myself, I assumed that what I got from GitHub would be
> a working configuration for this specific index.
> 
> I have a few questions related to this problem:
> 
> 1) Is it safe to cancel the indexing tool and start again without changing
> what's in the rdfdata and imported directories? Could this help at all?
> 
> 2) What can possibly be causing this problem?
> 
> 3) Why is it looping and logging every hour (accurate to the second)?
> 
> If there is any extra information I can provide that would help in
> understanding the problem, tell me what it is and I will provide it.
> 
> Regards,
> Antero
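
To check whether the warnings really repeat on a fixed period, rather than 
eyeballing timestamps, the log can be scanned mechanically. This is a minimal 
Python sketch, not part of Stanbol or the indexing tool. It assumes each log 
entry sits on a single line in the actual log file (the breaks in the quoted 
excerpt look like mail wrapping) and that the timestamp layout matches the 
excerpt. `repeat_intervals` is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Assumed single-line layout, e.g.
# "12:17:26,856 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl -
#  Invalid Namespace Mapping: prefix 'nsogi' valid , ..." (joined on one line)
LINE_RE = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}),(\d{3}) .*"
    r"Invalid Namespace Mapping: prefix '([^']*)'"
)

DAY_MS = 24 * 3600 * 1000


def repeat_intervals(lines):
    """Map each prefix to the gaps (in seconds) between its repeated warnings.

    The timestamps carry no date, so a negative gap is treated as a
    midnight rollover; gaps of a day or more cannot be detected.
    Lines that do not match the pattern are ignored.
    """
    seen = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        h, mi, s, ms = (int(x) for x in m.group(1, 2, 3, 4))
        seen[m.group(5)].append(((h * 60 + mi) * 60 + s) * 1000 + ms)
    gaps = {}
    for prefix, stamps in seen.items():
        diffs = []
        for a, b in zip(stamps, stamps[1:]):
            d = b - a
            if d < 0:
                d += DAY_MS  # assume the log crossed midnight
            diffs.append(d / 1000)
        gaps[prefix] = diffs
    return gaps
```

Passing an open log file (it iterates line by line) and printing the result 
per prefix would show whether every prefix repeats at roughly 3600 seconds. 
That would confirm the periodic pattern, but not what is re-running.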
