Hi Pablo,
Thanks for your response. Of course, I don't mind changing it. Still,
could the issue have been caused by using the "Spanish"
namespaces (http://es.dbpedia.org/resource/ and
http://es.dbpedia.org/ontology/) instead of the default ones?
Thanks. Regards
On 29/11/12 12:04, Pablo N. Mendes wrote:
Hi Rafa,
It looks like NxParser (or our code based on it) is failing to parse
the Unicode characters in your URIs. We now have Any23 in our
dependencies, and we have thought about dropping NxParser for good. But I am
not sure Any23 will handle Unicode either, see:
http://code.google.com/p/any23/source/browse/trunk/any23-core/src/main/java/org/deri/any23/parser/NQuadsParser.java?r=1305
But Any23 is now in the Apache Incubator and has a growing community, so if
it doesn't work right off the bat, we could try to get help there to
fix their side of things.
Would you like to give this a shot? It would be a matter of changing
the getTypesMap method to use Any23's iteration, rather than
NxParser's. See:
https://github.com/dbpedia-spotlight/dbpedia-spotlight/blob/master/core/src/main/scala/org/dbpedia/spotlight/util/TypesLoader.scala#L82
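As a rough illustration of what a Unicode-aware replacement for the NxParser loop has to do, here is a minimal Python sketch. It is not Spotlight's or Any23's code: the names `unescape` and `load_types` are hypothetical, the sample URI is a shortened version of the one in the log, and it assumes the types file is N-Triples with `\uXXXX` escapes and `#` comment lines, as the warnings in this thread suggest. It only handles triples whose three terms are all IRIs.

```python
import re
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# One N-Triples statement whose subject, predicate and object are all IRIs.
TRIPLE = re.compile(r'^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$')

# \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits), as used in N-Triples.
ESCAPE = re.compile(r'\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})')


def unescape(iri):
    """Replace N-Triples Unicode escapes with the characters they stand for."""
    return ESCAPE.sub(lambda m: chr(int(m.group(1) or m.group(2), 16)), iri)


def load_types(lines):
    """Return {subject IRI: [type IRIs]} for every rdf:type statement."""
    types = defaultdict(list)
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # blank line or comment such as "# started ..."
        match = TRIPLE.match(line)
        if match is None:
            continue  # not an IRI-only triple (e.g. literal object); ignored here
        subject, predicate, obj = (unescape(part) for part in match.groups())
        if predicate == RDF_TYPE:
            types[subject].append(obj)
    return dict(types)


# Shortened, hypothetical sample modelled on the lines reported in the log.
sample = [
    "# started 2012-06-04T14:02:57Z",
    r"<http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F__1> "
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://dbpedia.org/ontology/Film> .",
]
types_map = load_types(sample)
```

The comment line is skipped quietly instead of producing a warning, and the escaped subject ends up in the map under its decoded form, so a lookup by the real resource name can find it. Whether Any23 or NxParser decodes escapes the same way would still need to be checked against the actual library.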
Cheers,
Pablo
On Wed, Nov 28, 2012 at 6:46 PM, Rafa Haro <[email protected]> wrote:
Hi,
I have finally generated the indexes for Spanish. Checking them
with Luke, I realized that my index /index-withSF-withTypes/
doesn't contain the field Type. The AddTypesToIndex launcher
apparently ran without any errors, just these warnings:
[INFO] launcher 'AddTypesToIndex' selected =>
org.dbpedia.spotlight.lucene.index.AddTypesToIndex
INFO 2012-11-28 12:40:22,470 main [IndexingConfiguration] -
Loading configuration file ../conf/indexing.properties
INFO 2012-11-28 12:40:22,932 main
[MergedOccurrencesContextSearcher] - Using index at:
org.apache.lucene.store.MMapDirectory@/usr/local/spotlight/dbpedia_data/data/output/index-withSF
lockFactory=org.apache.lucene.store.NativeFSLockFactory@7a06cf15
INFO 2012-11-28 12:40:24,114 main [IndexEnricher] - Analyzer
class: class org.apache.lucene.analysis.es.SpanishAnalyzer
INFO 2012-11-28 12:40:24,219 main [TypesLoader$] - Loading types
map...
warning on line 1 # started 2012-06-04T14:02:57Z : cannot parse
0th element: # started 2012-06-04T14:02:57Z
warning in NqParser.next on line 2364225 # <BAD URI: Illegal
character in path at index 40:
http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://dbpedia.org/ontology/Film> . : cannot parse 0th element: #
<BAD URI: Illegal character in path at index 40:
http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://dbpedia.org/ontology/Film> .
warning in NqParser.next on line 2364226 # <BAD URI: Illegal
character in path at index 40:
http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://schema.org/Movie> . : cannot parse 0th element: #
<BAD URI: Illegal character in path at index 40:
http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://schema.org/Movie> .
warning in NqParser.next on line 2364227 # <BAD URI: Illegal
character in path at index 40:
http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://dbpedia.org/ontology/Work> . : cannot parse 0th element: #
<BAD URI: Illegal character in path at index 40:
http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://dbpedia.org/ontology/Work> .
warning in NqParser.next on line 2364228 # <BAD URI: Illegal
character in path at index 40:
http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://schema.org/CreativeWork> . : cannot parse 0th element: #
<BAD URI: Illegal character in path at index 40:
http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://schema.org/CreativeWork> .
warning in NqParser.next on line 2364229 # <BAD URI: Illegal
character in path at index 40:
http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.w3.org/2002/07/owl#Thing> . : cannot parse 0th element: #
<BAD URI: Illegal character in path at index 40:
http://es.dbpedia.org/resource/Tomie__\u5BCC\u6C5F\u3000\u6700\u7D42\u7AE0\uFF5E\u7981\u65AD\u306E\u679C\u5B9F\uFF5E__1>
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.w3.org/2002/07/owl#Thing> .
warning in NqParser.next on line 2725295 # completed
2012-06-04T14:31:53Z : cannot parse 0th element: # completed
2012-06-04T14:31:53Z
INFO 2012-11-28 12:41:07,523 main [TypesLoader$] - Done. Loaded
2202361 types.
INFO 2012-11-28 12:41:07,530 main [IndexEnricher] - Adding types
to index
org.apache.lucene.store.MMapDirectory@/usr/local/spotlight/dbpedia_data/data/output/index-withSF-withTypes
lockFactory=org.apache.lucene.store.NativeFSLockFactory@458d6f3d...
INFO 2012-11-28 12:41:07,612 main [IndexEnricher] - processed 0
documents.
INFO 2012-11-28 12:41:09,190 main [IndexEnricher] - processed
1000 documents.
..........................
............................
The process continues until it has processed 870019 documents, but
afterwards the field still doesn't exist.
Does anyone know what might be happening?
Thanks in advance
This message should be regarded as confidential. If you have received this
email in error please notify the sender and destroy it immediately. Statements
of intent shall only become binding when confirmed in hard copy by an
authorised signatory.
Zaizi Ltd is registered in England and Wales with the registration number
6440931. The Registered Office is 222 Westbourne Studios, 242 Acklam Road,
London W10 5JJ, UK.
------------------------------------------------------------------------------
Keep yourself connected to Go Parallel:
INSIGHTS What's next for parallel hardware, programming and
related areas?
Interviews and blogs by thought leaders keep you ahead of the curve.
http://goparallel.sourceforge.net
_______________________________________________
Dbp-spotlight-users mailing list
[email protected]
<mailto:[email protected]>
https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users
--
Pablo N. Mendes
http://pablomendes.com