Hi Pablo,

Thanks for your response. Of course, I don't mind changing it. Still, is it possible that the issue was caused by using the Spanish namespaces (http://es.dbpedia.org/resource/ and http://es.dbpedia.org/ontology/) instead of the default ones?

Thanks. Regards

On 29/11/12 12:04, Pablo N. Mendes wrote:

Hi Rafa,
It looks like NxParser (or our code based on it) is failing to parse the Unicode characters in your URIs. We now have Any23 in our dependencies, and we have thought about dropping NxParser for good. But I am not sure Any23 will handle Unicode either, see:
http://code.google.com/p/any23/source/browse/trunk/any23-core/src/main/java/org/deri/any23/parser/NQuadsParser.java?r=1305

But Any23 is now an Apache Incubator project with a growing community, so if it doesn't work right off the bat, we could try to get help there to fix their side of things.

Would you like to give this a shot? It would be a matter of changing the getTypesMap method to use Any23's iteration, rather than NxParser's. See:
https://github.com/dbpedia-spotlight/dbpedia-spotlight/blob/master/core/src/main/scala/org/dbpedia/spotlight/util/TypesLoader.scala#L82
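
To illustrate the failure: the "Illegal character in path" text in the warnings is the wording java.net.URI uses when it rejects a character, and a literal backslash is one such character. So the escapes are probably reaching the URI constructor undecoded. Below is a minimal sketch under that assumption. The class and method names (UriEscapeDemo, decodeUnicodeEscapes, parses) are made up for this example and are not part of NxParser, Any23 or Spotlight, and the sample URI is shortened.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UriEscapeDemo {
    // Matches a backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE =
            Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    /** Replaces each backslash-u escape with the character it denotes. */
    public static String decodeUnicodeEscapes(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps '$' and '\' from being treated specially.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Returns true if java.net.URI accepts the string as-is. */
    public static boolean parses(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Shortened sample: the backslashes here are literal characters.
        String raw = "http://es.dbpedia.org/resource/Tomie__\\u5BCC\\u6C5F__1";
        System.out.println(parses(raw));                       // escaped form
        System.out.println(parses(decodeUnicodeEscapes(raw))); // decoded form
    }
}
```

On a stock JDK this prints false for the escaped form and true for the decoded one. Decoding alone may not cover every case: the full URI in the log also contains the escape for U+3000 (ideographic space), which java.net.URI treats as a space character and would likely still reject, so percent-encoding might be needed as well.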

Cheers,
Pablo


On Wed, Nov 28, 2012 at 6:46 PM, Rafa Haro <[email protected]> wrote:

    Hi,

    I have finally generated the indexes for Spanish. Checking them
    with Luke, I realized that my index /index-withSF-withTypes/
    doesn't contain the Type field. The AddTypesToIndex launcher
    apparently ran without any errors, only these warnings:

    [INFO] launcher 'AddTypesToIndex' selected => org.dbpedia.spotlight.lucene.index.AddTypesToIndex
     INFO 2012-11-28 12:40:22,470 main [IndexingConfiguration] - Loading configuration file ../conf/indexing.properties
     INFO 2012-11-28 12:40:22,932 main [MergedOccurrencesContextSearcher] - Using index at: org.apache.lucene.store.MMapDirectory@/usr/local/spotlight/dbpedia_data/data/output/index-withSF lockFactory=org.apache.lucene.store.NativeFSLockFactory@7a06cf15
     INFO 2012-11-28 12:40:24,114 main [IndexEnricher] - Analyzer class: class org.apache.lucene.analysis.es.SpanishAnalyzer
     INFO 2012-11-28 12:40:24,219 main [TypesLoader$] - Loading types map...
    warning on line 1 # started 2012-06-04T14:02:57Z : cannot parse 0th element: # started 2012-06-04T14:02:57Z
    warning in NqParser.next on line 2364225 # <BAD URI: Illegal character in path at index 40: http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Film> . : cannot parse 0th element: # <BAD URI: Illegal character in path at index 40: http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Film> .
    warning in NqParser.next on line 2364226 # <BAD URI: Illegal character in path at index 40: http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Movie> . : cannot parse 0th element: # <BAD URI: Illegal character in path at index 40: http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Movie> .
    warning in NqParser.next on line 2364227 # <BAD URI: Illegal character in path at index 40: http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Work> . : cannot parse 0th element: # <BAD URI: Illegal character in path at index 40: http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Work> .
    warning in NqParser.next on line 2364228 # <BAD URI: Illegal character in path at index 40: http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/CreativeWork> . : cannot parse 0th element: # <BAD URI: Illegal character in path at index 40: http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/CreativeWork> .
    warning in NqParser.next on line 2364229 # <BAD URI: Illegal character in path at index 40: http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> . : cannot parse 0th element: # <BAD URI: Illegal character in path at index 40: http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .
    warning in NqParser.next on line 2725295 # completed 2012-06-04T14:31:53Z : cannot parse 0th element: # completed 2012-06-04T14:31:53Z
     INFO 2012-11-28 12:41:07,523 main [TypesLoader$] - Done. Loaded 2202361 types.
     INFO 2012-11-28 12:41:07,530 main [IndexEnricher] - Adding types to  index org.apache.lucene.store.MMapDirectory@/usr/local/spotlight/dbpedia_data/data/output/index-withSF-withTypes lockFactory=org.apache.lucene.store.NativeFSLockFactory@458d6f3d...
     INFO 2012-11-28 12:41:07,612 main [IndexEnricher] - processed 0 documents.
     INFO 2012-11-28 12:41:09,190 main [IndexEnricher] - processed 1000 documents.
    ..........................
    ............................

    The process continues until 870019 documents have been processed,
    but the field still doesn't exist in the resulting index.

    Does anyone know what might be happening?

    Thanks in advance

    This message should be regarded as confidential. If you have received this 
email in error please notify the sender and destroy it immediately. Statements 
of intent shall only become binding when confirmed in hard copy by an 
authorised signatory.

    Zaizi Ltd is registered in England and Wales with the registration number 
6440931. The Registered Office is 222 Westbourne Studios, 242 Acklam Road, 
London W10 5JJ, UK.


    _______________________________________________
    Dbp-spotlight-users mailing list
    [email protected]
    https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users




--

Pablo N. Mendes
http://pablomendes.com


