Hi Pablo,

I'm getting similar errors while parsing some Wikipedia articles. For instance:

INFO 2012-11-28 10:11:22,555 main [FileOccurrenceSource$] - saved 11200000 occurrences
nov 28, 2012 10:11:43 AM org.dbpedia.extraction.sources.WikipediaDumpParser readPage
Advertencia: Error processing page title=S/mileage;ns=0/Main/;language:wiki=es,locale=es
org.dbpedia.extraction.wikiparser.impl.simple.TooManyErrorsException: Too many errors at '|align="center"| 11 || {{nihongo|[[Suki yo, Junjou Hankouki.]] (??????????) || ' (line: 120)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseUntil(SimpleWikiParser.scala:111)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseProperty(SimpleWikiParser.scala:468)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseTemplate(SimpleWikiParser.scala:446)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.createNode(SimpleWikiParser.scala:264)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseUntil(SimpleWikiParser.scala:194)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseProperty(SimpleWikiParser.scala:468)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseTemplate(SimpleWikiParser.scala:446)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.createNode(SimpleWikiParser.scala:264)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseUntil(SimpleWikiParser.scala:194)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseProperty(SimpleWikiParser.scala:468)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseTemplate(SimpleWikiParser.scala:446)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.createNode(SimpleWikiParser.scala:264)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseUntil(SimpleWikiParser.scala:194)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseProperty(SimpleWikiParser.scala:468)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseTemplate(SimpleWikiParser.scala:446)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.createNode(SimpleWikiParser.scala:264)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseUntil(SimpleWikiParser.scala:194)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseTableCell(SimpleWikiParser.scala:575)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseTableRow(SimpleWikiParser.scala:557)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseTable(SimpleWikiParser.scala:536)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.createNode(SimpleWikiParser.scala:268)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseUntil(SimpleWikiParser.scala:194)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseProperty(SimpleWikiParser.scala:468)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseTemplate(SimpleWikiParser.scala:446)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.createNode(SimpleWikiParser.scala:264)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseUntil(SimpleWikiParser.scala:194)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseProperty(SimpleWikiParser.scala:468)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseTemplate(SimpleWikiParser.scala:446)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.createNode(SimpleWikiParser.scala:264)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseUntil(SimpleWikiParser.scala:194)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseProperty(SimpleWikiParser.scala:468)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseTemplate(SimpleWikiParser.scala:446)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.createNode(SimpleWikiParser.scala:264)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseUntil(SimpleWikiParser.scala:194)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseProperty(SimpleWikiParser.scala:468)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseTemplate(SimpleWikiParser.scala:446)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.createNode(SimpleWikiParser.scala:264)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseUntil(SimpleWikiParser.scala:194)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseTableCell(SimpleWikiParser.scala:575)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseTableRow(SimpleWikiParser.scala:557)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseTable(SimpleWikiParser.scala:536)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.createNode(SimpleWikiParser.scala:268)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.parseUntil(SimpleWikiParser.scala:194)
    at org.dbpedia.extraction.wikiparser.impl.simple.SimpleWikiParser.apply(SimpleWikiParser.scala:69)
    at org.dbpedia.spotlight.io.AllOccurrenceSource$AllOccurrenceSource$$anonfun$foreach$1.apply(AllOccurrenceSource.scala:82)
    at org.dbpedia.spotlight.io.AllOccurrenceSource$AllOccurrenceSource$$anonfun$foreach$1.apply(AllOccurrenceSource.scala:80)
    at org.dbpedia.extraction.sources.WikipediaDumpParser.readPage(WikipediaDumpParser.java:253)
    at org.dbpedia.extraction.sources.WikipediaDumpParser.readPages(WikipediaDumpParser.java:179)
    at org.dbpedia.extraction.sources.WikipediaDumpParser.readDump(WikipediaDumpParser.java:137)
    at org.dbpedia.extraction.sources.WikipediaDumpParser.run(WikipediaDumpParser.java:108)
    at org.dbpedia.extraction.sources.XMLReaderSource.foreach(XMLSource.scala:57)
    at org.dbpedia.spotlight.io.AllOccurrenceSource$AllOccurrenceSource.foreach(AllOccurrenceSource.scala:80)
    at org.dbpedia.spotlight.filter.Filter$FilteredOccs.foreach(Filter.scala:58)
    at org.dbpedia.spotlight.filter.Filter$FilteredOccs.foreach(Filter.scala:58)
    at org.dbpedia.spotlight.filter.Filter$FilteredOccs.foreach(Filter.scala:58)
    at org.dbpedia.spotlight.io.FileOccurrenceSource$.writeToFile(FileOccurrenceSource.scala:57)
    at org.dbpedia.spotlight.lucene.index.ExtractOccsFromWikipedia$.main(ExtractOccsFromWikipedia.scala:82)
    at org.dbpedia.spotlight.lucene.index.ExtractOccsFromWikipedia.main(ExtractOccsFromWikipedia.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at scala_maven_executions.MainHelper.runMain(MainHelper.java:164)
    at scala_maven_executions.MainWithArgsInFile.main(MainWithArgsInFile.java:26)

I suppose this is a completely different problem, but would it be possible to fix it too? Any clues?

Regards
On 29/11/12 13:25, Pablo N. Mendes wrote:

Hi Rafa,
I don't think so. The warning you got was very specific: "Illegal character in path at index 40", which is where the "\u5BCC" escape occurs. See:

warning in NqParser.next on line 2364225 # <BAD URI: Illegal character in path at index 40: http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1>
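
A minimal sketch of what handling these escapes could involve, written in Scala like Spotlight's TypesLoader. N-Triples lets an IRI carry non-ASCII characters as \uXXXX or \UXXXXXXXX escapes, and "Illegal character in path at index N" is the wording java.net.URI uses for a URISyntaxException, so the raw backslash apparently reached a URI check without being decoded first. UcharDecoder is a hypothetical name: this is not NxParser, Any23 or Spotlight code, and it assumes the types dump follows the N-Triples escaping rules.

```scala
// Hypothetical helper, not taken from NxParser, Any23 or DBpedia Spotlight.
// Decodes N-Triples UCHAR escapes (a backslash, then 'u' and 4 hex digits,
// or 'U' and 8 hex digits) into the characters they stand for, so that the
// string no longer contains a raw backslash when it is validated as a URI.
object UcharDecoder {

  // ASCII hex digits only; Character.digit would also accept non-ASCII digits.
  private def isHex(c: Char): Boolean =
    (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F')

  def decode(s: String): String = {
    val sb = new java.lang.StringBuilder(s.length)
    var i = 0
    while (i < s.length) {
      val c = s.charAt(i)
      // Number of hex digits expected after the backslash, or 0 if no escape starts here.
      val width =
        if (c != '\\' || i + 1 >= s.length) 0
        else s.charAt(i + 1) match {
          case 'u' => 4
          case 'U' => 8
          case _   => 0
        }
      val end = i + 2 + width
      val hex = if (width > 0 && end <= s.length) s.substring(i + 2, end) else ""
      val codePoint =
        if (hex.nonEmpty && hex.forall(isHex)) java.lang.Long.parseLong(hex, 16) else -1L
      if (codePoint >= 0 && codePoint <= Character.MAX_CODE_POINT) {
        // Well-formed escape: emit the code point (a surrogate pair if needed).
        sb.appendCodePoint(codePoint.toInt)
        i = end
      } else {
        // Not a well-formed escape: copy the character through unchanged.
        sb.append(c)
        i += 1
      }
    }
    sb.toString
  }
}
```

Applied to the subject in the warning, this yields Tomie__富江　最終章～禁断の果実～__1. Whether the decoded IRI then passes java.net.URI's own character rules is a separate question (the ideographic space U+3000 in this title, for instance, is a Unicode space character).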

Great. We'd love for you to send us a pull request with the fixes. Max has written a detailed guide on how to contribute that should clear all the roadblocks from your path:
https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Contributing

Cheers,
Pablo



On Thu, Nov 29, 2012 at 1:21 PM, Rafa Haro <[email protected]> wrote:

    Hi Pablo,

    Thanks for your response. Of course I don't mind changing it.
    Anyway, could the issue have been caused by using the Spanish
    namespaces (http://es.dbpedia.org/resource/ and
    http://es.dbpedia.org/ontology/) instead of the default ones?

    Thanks. Regards

    On 29/11/12 12:04, Pablo N. Mendes wrote:

    Hi Rafa,
    It looks like NxParser (or our code based on it) is failing to
    parse the Unicode characters in your URIs. We now have Any23 in
    our dependencies, and we have thought about dropping NxParser for
    good. But I am not sure Any23 will handle Unicode either; see:
    http://code.google.com/p/any23/source/browse/trunk/any23-core/src/main/java/org/deri/any23/parser/NQuadsParser.java?r=1305

    But Any23 is now an Apache Incubator project with a growing
    community, so if it doesn't work right off the bat, we could ask
    for help there to fix things on their side.

    Would you like to give this a shot? It would be a matter of
    changing the getTypesMap method to use Any23's iteration, rather
    than NxParser's. See:
    https://github.com/dbpedia-spotlight/dbpedia-spotlight/blob/master/core/src/main/scala/org/dbpedia/spotlight/util/TypesLoader.scala#L82
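
As a rough illustration of the loop a getTypesMap replacement needs, whichever parser ends up underneath, here is a plain line-based sketch. It does not use Any23's API; the object name TypesMapSketch, the loadTypesMap signature and the Vector-valued map are assumptions, and it assumes one triple per line in "<s> <p> <o> ." form, as in the dump lines quoted in this thread. Comment lines such as "# started ..." are skipped instead of producing warnings, and a decodeIri hook lets an escape decoder run on each IRI before it is used as a key.

```scala
import scala.collection.mutable

// Hypothetical sketch, not DBpedia Spotlight's TypesLoader and not Any23's API.
// Collects rdf:type statements from N-Triples lines into a subject -> types map,
// assuming one "<s> <p> <o> ." triple per line with whitespace between terms.
object TypesMapSketch {

  val RdfType = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

  // Returns the IRI inside <...>, or None for literals, blank nodes, etc.
  private def iri(term: String): Option[String] =
    if (term.length >= 2 && term.startsWith("<") && term.endsWith(">"))
      Some(term.substring(1, term.length - 1))
    else None

  // decodeIri is applied to every subject and type IRI before it is stored,
  // e.g. to decode N-Triples escapes; by default IRIs are kept as written.
  def loadTypesMap(lines: Iterator[String],
                   decodeIri: String => String = identity): Map[String, Vector[String]] = {
    val types = mutable.LinkedHashMap.empty[String, Vector[String]]
    for (raw <- lines) {
      val line = raw.trim
      // Blank lines and comments such as "# started ..." hold no triples.
      if (line.nonEmpty && !line.startsWith("#")) {
        line.split("\\s+") match {
          case Array(s, p, o, ".") =>
            (iri(s), iri(p), iri(o)) match {
              case (Some(subject), Some(RdfType), Some(tpe)) =>
                val key = decodeIri(subject)
                types(key) = types.getOrElse(key, Vector.empty[String]) :+ decodeIri(tpe)
              case _ => () // some other kind of triple: not a type statement
            }
          case _ => () // line not in the assumed shape: skipped instead of aborting
        }
      }
    }
    types.toMap
  }
}
```

Fed from the dump, a call might look like TypesMapSketch.loadTypesMap(scala.io.Source.fromFile(path, "UTF-8").getLines()), where path stands for the instance-types file; passing an escape decoder as the second argument would turn the escaped subjects from the warnings into ordinary keys instead of dropping them.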

    Cheers,
    Pablo


    On Wed, Nov 28, 2012 at 6:46 PM, Rafa Haro <[email protected]> wrote:

        Hi,

        I have finally generated the indexes for Spanish. Checking
        them with Luke, I realized that my index
        /index-withSF-withTypes/ doesn't contain the Type field.
        The AddTypesToIndex launcher apparently ran without any
        errors, just these warnings:

        [INFO] launcher 'AddTypesToIndex' selected => org.dbpedia.spotlight.lucene.index.AddTypesToIndex
         INFO 2012-11-28 12:40:22,470 main [IndexingConfiguration] - Loading configuration file ../conf/indexing.properties
         INFO 2012-11-28 12:40:22,932 main [MergedOccurrencesContextSearcher] - Using index at: org.apache.lucene.store.MMapDirectory@/usr/local/spotlight/dbpedia_data/data/output/index-withSF lockFactory=org.apache.lucene.store.NativeFSLockFactory@7a06cf15
         INFO 2012-11-28 12:40:24,114 main [IndexEnricher] - Analyzer class: class org.apache.lucene.analysis.es.SpanishAnalyzer
         INFO 2012-11-28 12:40:24,219 main [TypesLoader$] - Loading types map...
        warning on line 1 # started 2012-06-04T14:02:57Z : cannot parse 0th element: # started 2012-06-04T14:02:57Z
        warning in NqParser.next on line 2364225 # <BAD URI: Illegal character in path at index 40: http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Film> . : cannot parse 0th element: # <BAD URI: Illegal character in path at index 40: http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Film> .
        warning in NqParser.next on line 2364226 # <BAD URI: Illegal character in path at index 40: http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Movie> . : cannot parse 0th element: # <BAD URI: Illegal character in path at index 40: http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Movie> .
        warning in NqParser.next on line 2364227 # <BAD URI: Illegal character in path at index 40: http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Work> . : cannot parse 0th element: # <BAD URI: Illegal character in path at index 40: http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://dbpedia.org/ontology/Work> .
        warning in NqParser.next on line 2364228 # <BAD URI: Illegal character in path at index 40: http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/CreativeWork> . : cannot parse 0th element: # <BAD URI: Illegal character in path at index 40: http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/CreativeWork> .
        warning in NqParser.next on line 2364229 # <BAD URI: Illegal character in path at index 40: http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> . : cannot parse 0th element: # <BAD URI: Illegal character in path at index 40: http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2002/07/owl#Thing> .
        warning in NqParser.next on line 2725295 # completed 2012-06-04T14:31:53Z : cannot parse 0th element: # completed 2012-06-04T14:31:53Z
         INFO 2012-11-28 12:41:07,523 main [TypesLoader$] - Done. Loaded 2202361 types.
         INFO 2012-11-28 12:41:07,530 main [IndexEnricher] - Adding types to index org.apache.lucene.store.MMapDirectory@/usr/local/spotlight/dbpedia_data/data/output/index-withSF-withTypes lockFactory=org.apache.lucene.store.NativeFSLockFactory@458d6f3d...
         INFO 2012-11-28 12:41:07,612 main [IndexEnricher] - processed 0 documents.
         INFO 2012-11-28 12:41:09,190 main [IndexEnricher] - processed 1000 documents.
        ..........................
        ............................

        The process continues until it has processed 870019
        documents, but at the end the field still doesn't exist.

        Does anyone know what could be happening?

        Thanks in advance

        This message should be regarded as confidential. If you have received 
this email in error please notify the sender and destroy it immediately. 
Statements of intent shall only become binding when confirmed in hard copy by 
an authorised signatory.

        Zaizi Ltd is registered in England and Wales with the registration 
number 6440931. The Registered Office is 222 Westbourne Studios, 242 Acklam 
Road, London W10 5JJ, UK.


        
        _______________________________________________
        Dbp-spotlight-users mailing list
        [email protected]
        https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users




--
    Pablo N. Mendes
    http://pablomendes.com









