Hi Rafa,
I don't think the namespaces are the cause. The warning you got was very
specific: "Illegal character in path at index 40", which is where the
"\u5BCC" occurs. See:

warning in NqParser.next on line 2364225 # <BAD URI: Illegal character in
path at index 40:
http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1>
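
To make the failure concrete, here is a minimal sketch in plain Java. It
assumes the parser hands the raw string, with the backslash escapes still
undecoded, to java.net.URI; the wording of the warning suggests this, but I
have not confirmed it in the NxParser code. The class and helper names
(EscapedUriCheck, decodeEscapes, percentEncodeNonAscii, parses) are
hypothetical and belong to neither NxParser nor Any23.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not NxParser or Any23 code.
class EscapedUriCheck {

    // Replace each backslash-u escape (four hex digits) with the character it names.
    // Assumes well-formed escapes; malformed hex would throw NumberFormatException.
    static String decodeEscapes(String raw) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < raw.length()) {
            if (raw.startsWith("\\u", i) && i + 6 <= raw.length()) {
                out.append((char) Integer.parseInt(raw.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(raw.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    // Percent-encode every non-ASCII code point as UTF-8 bytes; ASCII is copied as-is.
    static String percentEncodeNonAscii(String s) {
        StringBuilder out = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.append((char) cp);
            } else {
                byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : utf8) {
                    out.append(String.format("%%%02X", b & 0xFF));
                }
            }
        });
        return out.toString();
    }

    // True if java.net.URI accepts the string.
    static boolean parses(String uri) {
        try {
            new URI(uri);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Shortened version of the URI from the warning, escapes left undecoded.
        String raw = "http://es.dbpedia.org/resource/Tomie__\\u5BCC\\u6C5F";
        String fixed = percentEncodeNonAscii(decodeEscapes(raw));
        System.out.println("raw parses:   " + parses(raw));
        System.out.println("fixed form:   " + fixed);
        System.out.println("fixed parses: " + parses(fixed));
    }
}
```

The sketch percent-encodes rather than keeping the decoded characters because
the full URI also contains \u3000 (an ideographic space), which java.net.URI
rejects even in decoded form. Whether percent-encoded URIs match the keys
used elsewhere in the index is a separate question that would need checking.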

Great. We'd love to have you send us a pull request with the fixes for
this. Max has written a pretty detailed guide on how to contribute that
should clear the roadblocks from your path:
https://github.com/dbpedia-spotlight/dbpedia-spotlight/wiki/Contributing

Cheers,
Pablo



On Thu, Nov 29, 2012 at 1:21 PM, Rafa Haro <[email protected]> wrote:

>  Hi Pablo,
>
> Thanks for your response. Of course I don't mind changing it. Anyway,
> could the issue have been caused by using the Spanish namespaces
> (http://es.dbpedia.org/resource/ and http://es.dbpedia.org/ontology/)
> instead of the default ones?
>
> Thanks. Regards
>
> On 29/11/12 12:04, Pablo N. Mendes wrote:
>
>
> Hi Rafa,
> It looks like NxParser (or our code based on it) is failing to parse the
> Unicode characters in your URIs. We now have Any23 in our dependencies, and
> we have thought about dropping NxParser for good. But I am not sure Any23
> will handle Unicode either, see:
>
> http://code.google.com/p/any23/source/browse/trunk/any23-core/src/main/java/org/deri/any23/parser/NQuadsParser.java?r=1305
>
>  But Any23 is now incubating at Apache and has a growing community, so if
> it doesn't work right off the bat, we could try to get help there to fix
> their side of things.
>
>  Would you like to give this a shot? It would be a matter of changing the
> getTypesMap method to use Any23's iteration, rather than NxParser's. See:
>
> https://github.com/dbpedia-spotlight/dbpedia-spotlight/blob/master/core/src/main/scala/org/dbpedia/spotlight/util/TypesLoader.scala#L82
>
>  Cheers,
> Pablo
>
>
> On Wed, Nov 28, 2012 at 6:46 PM, Rafa Haro <[email protected]> wrote:
>
>>  Hi,
>>
>> I have finally generated the indexes for Spanish. Checking them with
>> Luke, I realized that my index *index-withSF-withTypes* doesn't
>> contain the field Type. The AddTypesToIndex launcher apparently ran
>> without any errors, just these warnings:
>>
>> [INFO] launcher 'AddTypesToIndex' selected =>
>> org.dbpedia.spotlight.lucene.index.AddTypesToIndex
>>  INFO 2012-11-28 12:40:22,470 main [IndexingConfiguration] - Loading
>> configuration file ../conf/indexing.properties
>>  INFO 2012-11-28 12:40:22,932 main [MergedOccurrencesContextSearcher] -
>> Using index at: 
>> org.apache.lucene.store.MMapDirectory@/usr/local/spotlight/dbpedia_data/data/output/index-withSF
>> lockFactory=org.apache.lucene.store.NativeFSLockFactory@7a06cf15
>>  INFO 2012-11-28 12:40:24,114 main [IndexEnricher] - Analyzer class:
>> class org.apache.lucene.analysis.es.SpanishAnalyzer
>>  INFO 2012-11-28 12:40:24,219 main [TypesLoader$] - Loading types map...
>> warning on line 1 # started 2012-06-04T14:02:57Z : cannot parse 0th
>> element: # started 2012-06-04T14:02:57Z
>> warning in NqParser.next on line 2364225 # <BAD URI: Illegal character in
>> path at index 40:
>> http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1>
>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type><http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>> <http://dbpedia.org/ontology/Film> <http://dbpedia.org/ontology/Film> .
>> : cannot parse 0th element: # <BAD URI: Illegal character in path at index
>> 40:
>> http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1>
>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type><http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>> <http://dbpedia.org/ontology/Film> <http://dbpedia.org/ontology/Film> .
>> warning in NqParser.next on line 2364226 # <BAD URI: Illegal character in
>> path at index 40:
>> http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1>
>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type><http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>> <http://schema.org/Movie> <http://schema.org/Movie> . : cannot parse 0th
>> element: # <BAD URI: Illegal character in path at index 40:
>> http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1>
>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type><http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>> <http://schema.org/Movie> <http://schema.org/Movie> .
>> warning in NqParser.next on line 2364227 # <BAD URI: Illegal character in
>> path at index 40:
>> http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1>
>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type><http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>> <http://dbpedia.org/ontology/Work> <http://dbpedia.org/ontology/Work> .
>> : cannot parse 0th element: # <BAD URI: Illegal character in path at index
>> 40:
>> http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1>
>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type><http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>> <http://dbpedia.org/ontology/Work> <http://dbpedia.org/ontology/Work> .
>> warning in NqParser.next on line 2364228 # <BAD URI: Illegal character in
>> path at index 40:
>> http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1>
>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type><http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>> <http://schema.org/CreativeWork> <http://schema.org/CreativeWork> . :
>> cannot parse 0th element: # <BAD URI: Illegal character in path at index
>> 40:
>> http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1>
>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type><http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>> <http://schema.org/CreativeWork> <http://schema.org/CreativeWork> .
>> warning in NqParser.next on line 2364229 # <BAD URI: Illegal character in
>> path at index 40:
>> http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1>
>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type><http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>> <http://www.w3.org/2002/07/owl#Thing><http://www.w3.org/2002/07/owl#Thing>. 
>> : cannot parse 0th element: # <BAD URI: Illegal character in path at
>> index 40:
>> http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1>
>> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type><http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
>> <http://www.w3.org/2002/07/owl#Thing><http://www.w3.org/2002/07/owl#Thing>.
>> warning in NqParser.next on line 2725295 # completed 2012-06-04T14:31:53Z
>> : cannot parse 0th element: # completed 2012-06-04T14:31:53Z
>>  INFO 2012-11-28 12:41:07,523 main [TypesLoader$] - Done. Loaded 2202361
>> types.
>>  INFO 2012-11-28 12:41:07,530 main [IndexEnricher] - Adding types to
>> index 
>> org.apache.lucene.store.MMapDirectory@/usr/local/spotlight/dbpedia_data/data/output/index-withSF-withTypes
>> lockFactory=org.apache.lucene.store.NativeFSLockFactory@458d6f3d...
>>  INFO 2012-11-28 12:41:07,612 main [IndexEnricher] -   processed 0
>> documents.
>>  INFO 2012-11-28 12:41:09,190 main [IndexEnricher] -   processed 1000
>> documents.
>> ..........................
>> ............................
>>
>> The process continues until it has processed 870019 documents, but the
>> Type field still doesn't exist afterwards.
>>
>> Does anyone know what could be happening?
>>
>> Thanks in advance
>>
>> This message should be regarded as confidential. If you have received this 
>> email in error please notify the sender and destroy it immediately. 
>> Statements of intent shall only become binding when confirmed in hard copy 
>> by an authorised signatory.
>>
>> Zaizi Ltd is registered in England and Wales with the registration number 
>> 6440931. The Registered Office is 222 Westbourne Studios, 242 Acklam Road, 
>> London W10 5JJ, UK.
>>
>>
>>
>> ------------------------------------------------------------------------------
>> Keep yourself connected to Go Parallel:
>> INSIGHTS What's next for parallel hardware, programming and related areas?
>> Interviews and blogs by thought leaders keep you ahead of the curve.
>> http://goparallel.sourceforge.net
>> _______________________________________________
>> Dbp-spotlight-users mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users
>>
>>
>
>
>  --
>
>  Pablo N. Mendes
> http://pablomendes.com
>
>
>
>


-- 

Pablo N. Mendes
http://pablomendes.com
