[Dbp-spotlight-users] Issues while generating the data for Spotlight

Pajolma Rupi Wed, 13 May 2015 05:49:26 -0700

Dear all, 

I am an R&D engineer working at a French national institute (INRIA) on the 
Semanticpedia project ( http://www.semanticpedia.org/ ). 
The idea of the project is to provide the French version of DBpedia and related 
services (among them DBpedia Spotlight in French). For the moment I am focusing 
on DBpedia Spotlight, trying to configure a French instance of it.


While exploring the process of building a Spotlight service with my own data, I 
came across the following two issues, which I think might be useful to share: 

1- The internationalization page 
(https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Internationalization-(Lucene-backed-core))
 gives an example configuration for Spanish and a link to the 
index.properties file ( 
https://dl.dropboxusercontent.com/u/99877231/dbpedia/conf/indexing.es.properties
 ) with the settings needed for this language. The data file names in that 
file do not refer to the compressed versions, for example: 

[a] org.dbpedia.spotlight.data.wikipediaDump = 
/usr/local/spotlight/dbpedia_data/es/eswiki-latest-pages-articles.xml 
However, when running the index.sh script provided with the service, this 
uncompressed file seems to cause a problem: the script appears to expect the 
data in compressed (bz2) format. 
My suggestion would be either to change the index script delivered with the 
service so that it accepts both the compressed and the uncompressed format, or 
to update the Spanish example ( 
https://dl.dropboxusercontent.com/u/99877231/dbpedia/conf/indexing.es.properties
 ) so that it uses, for example, line [b] below instead of [a] (and likewise 
for all the data files declared in the indexing file): 

[b] org.dbpedia.spotlight.data.wikipediaDump = 
/usr/local/spotlight/dbpedia_data/es/eswiki-latest-pages-articles.xml.bz2 
I agree that this is not a major issue, but fixing it could save developers 
some time. 
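
For anyone who wants to check their own indexing file before running index.sh, 
here is a minimal sketch in Python. It is not part of Spotlight: the function 
names are hypothetical, and the only property key taken from the example above 
is org.dbpedia.spotlight.data.wikipediaDump. Which other keys need a .bz2 path 
is an assumption you would have to confirm for your own setup.

```python
from typing import Dict, Iterable, List

COMPRESSED_SUFFIX = ".bz2"

# Only this key comes from the example above; add others at your own discretion.
DUMP_KEYS = ("org.dbpedia.spotlight.data.wikipediaDump",)


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple single-line 'key = value' entries, skipping comments."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def uncompressed_dumps(props: Dict[str, str],
                       keys: Iterable[str] = DUMP_KEYS) -> List[str]:
    """Return the keys whose configured path does not end in .bz2."""
    return [k for k in keys
            if k in props and not props[k].endswith(COMPRESSED_SUFFIX)]


# Hypothetical excerpt mirroring line [a] above.
example = (
    "# indexing.es.properties (excerpt)\n"
    "org.dbpedia.spotlight.data.wikipediaDump = "
    "/usr/local/spotlight/dbpedia_data/es/eswiki-latest-pages-articles.xml\n"
)

for key in uncompressed_dumps(parse_properties(example)):
    print("Path is not a .bz2 file:", key)
```

With the path from [b] instead, the list returned by uncompressed_dumps is 
empty and nothing is reported.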

2- I encountered the problem described in this old post: 
https://github.com/dbpedia-spotlight/dbpedia-spotlight/issues/138 regarding the 
infinite loop generated by ExtractCandidateMap.scala. I would appreciate it if 
somebody could advise me on how to solve it, since the solutions proposed in 
that post are quite old and I am not sure they still apply. 

I hope my feedback is useful to you, and I hope to get an answer regarding the 
second issue. 

Best, 
Pajolma 


Pajolma RUPI 
Research and Development Engineer 
Service de l'e-Information Scientifique et Multimédia (SEISM) 
Research Centre INRIA Grenoble - Rhône-Alpes 
655 Avenue de l'Europe 
38330 Montbonnot-Saint-Martin 
France

_______________________________________________
Dbp-spotlight-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users
