Hello there,

I have been struggling to build indexes from generic RDF, and even with the
default configuration for more popular sources such as DBpedia.

I found an indexing tool online that is configured to index YAGO, at
https://github.com/ChalithaUdara/Stanbol-Yago-Site.

Everything seemed to be going well until it got into this loop:

11:17:26,546 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'affymetrix' valid , namespace 'http://bio2rdf.org/affymetrix_vocabulary:' invalid -> mapping ignored!
11:17:26,546 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'condition' valid , namespace 'http://www.kinjal.com/condition:' invalid -> mapping ignored!
11:17:26,576 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'wimpo' valid , namespace 'http://rdfex.org/withImports?uri=' invalid -> mapping ignored!
12:17:26,856 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'nsogi' valid , namespace 'http://prefix.cc/nsogi:' invalid -> mapping ignored!
12:17:26,918 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbc' valid , namespace 'http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
12:17:26,949 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'category' valid , namespace 'http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
12:17:26,949 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'hgnc' valid , namespace 'http://bio2rdf.org/hgnc:' invalid -> mapping ignored!
12:17:26,950 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'chebi' valid , namespace 'http://bio2rdf.org/chebi:' invalid -> mapping ignored!
12:17:26,980 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbt' valid , namespace 'http://dbpedia.org/resource/Template:' invalid -> mapping ignored!
12:17:26,980 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'pubmed' valid , namespace 'http://bio2rdf.org/pubmed_vocabulary:' invalid -> mapping ignored!
12:17:26,980 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbptmpl' valid , namespace 'http://dbpedia.org/resource/Template:' invalid -> mapping ignored!
12:17:26,981 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbrc' valid , namespace 'http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
12:17:26,981 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'call' valid , namespace 'http://webofcode.org/wfn/call:' invalid -> mapping ignored!
12:17:27,011 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'dbcat' valid , namespace 'http://dbpedia.org/resource/Category:' invalid -> mapping ignored!
12:17:27,011 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'bgcat' valid , namespace 'http://bg.dbpedia.org/resource/?????????:' invalid -> mapping ignored!
12:17:27,012 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'affymetrix' valid , namespace 'http://bio2rdf.org/affymetrix_vocabulary:' invalid -> mapping ignored!
12:17:27,012 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'condition' valid , namespace 'http://www.kinjal.com/condition:' invalid -> mapping ignored!
12:17:27,042 [pool-1-thread-1] WARN  impl.NamespacePrefixProviderImpl - Invalid Namespace Mapping: prefix 'wimpo' valid , namespace 'http://rdfex.org/withImports?uri=' invalid -> mapping ignored!

The same thing happened to me before with the DBpedia index. At first I
thought it was a problem with the RDF source, and since these messages are
logged at WARN level, I simply ignored them. But after several days, the
indexing/tdb directory had stayed the same size even though there were
still files in the indexing/resources/rdfdata directory. Then I realised
that these messages follow a pattern: they are logged once every hour,
accurate to the second, which seems odd, and they are always the same
messages. This led me to think that the indexing tool is stuck in a loop,
which is why it is not making any progress. It may be important that the
one-hour interval between messages is the same for the DBpedia index and
for the YAGO index, even though the YAGO index is much bigger.

For days I have been running `watch du * -s` in the resources directory to
check for size changes, and nothing has changed in all that time.
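A rough Python equivalent of that check, which also flags when nothing has
changed between two polls, could look like the following. It is only a
sketch: `dir_size` and `watch_sizes` are made-up helper names, and it sums
apparent file sizes, so its numbers will not match `du` exactly (`du`
reports disk usage in blocks).

```python
import os
import time


def dir_size(path):
    """Sum of apparent sizes (bytes) of regular files under path; not identical to `du -s`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue  # do not follow symlinks
            try:
                total += os.path.getsize(full)
            except OSError:
                pass  # file removed or unreadable while walking
    return total


def watch_sizes(paths, interval_s=60.0, rounds=None):
    """Print the size of each path every interval_s seconds; rounds=None polls forever."""
    previous = None
    n = 0
    while rounds is None or n < rounds:
        current = {p: dir_size(p) for p in paths}
        unchanged = previous is not None and current == previous
        print(time.strftime("%H:%M:%S"), current, "(no change)" if unchanged else "")
        previous = current
        n += 1
        if rounds is None or n < rounds:
            time.sleep(interval_s)
```

For this case it would be called as
`watch_sizes(["indexing/tdb", "indexing/resources/rdfdata"], interval_s=600)`
from the directory that contains `indexing`.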

I don't know whether this is a configuration problem. Since I didn't write
the configuration myself, I assumed that what I got from GitHub would be a
working configuration for this specific index.

I have a few questions related to this problem:

1) Is it safe to stop the indexing tool and start it again without changing
what's in the rdfdata and imported directories? Could this help at all?

2) What could be causing this problem?

3) Why does it loop and log the same messages every hour (accurate to the
second)?

If there is any extra information I can provide that would help in
understanding the problem, tell me what it is and I will provide it.

Regards,
Antero
