Re: [Dbp-spotlight-users] XML input format

Alex Olieman Mon, 01 Jun 2015 05:50:37 -0700

Hi Pajolma,

Yes, I have been in a similar situation. I'm not sure if there is a more convenient solution (from the Java/Scala code), but I ended up parsing, annotating, and rewriting the XML. If you already intend to make annotations a part of your XML schema, neatly annotating each element with correct offsets is quite trivial.
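A minimal sketch of the parse/annotate/rewrite idea in Python, under a few assumptions: the `annotate` callable stands in for whatever calls Spotlight and returns `(offset, surface_form, uri)` tuples for a plain string, and the `annotations`/`annotation` element and attribute names are made up for illustration (they are not the schema of the example file linked in this message). Offsets are stored relative to the text content of each annotated element.

```python
import xml.etree.ElementTree as ET


def annotate_elements(xml_string, annotate, tag="p"):
    """Annotate the text of every <tag> element and write the hits back into the XML.

    `annotate` is a hypothetical callable: text -> [(offset, surface_form, uri), ...].
    """
    root = ET.fromstring(xml_string)
    # Materialise the iterator first, because the tree is modified in the loop.
    for elem in list(root.iter(tag)):
        # Text content of the element with nested markup removed.
        text = "".join(elem.itertext())
        hits = annotate(text)
        if not hits:
            continue
        # The added elements carry no text of their own, so the offsets stay
        # valid against "".join(elem.itertext()) after rewriting.
        container = ET.SubElement(elem, "annotations")
        for offset, surface, uri in hits:
            ET.SubElement(container, "annotation", {
                "offset": str(offset),
                "length": str(len(surface)),
                "uri": uri,
                "source": "Spotlight",
            })
    return ET.tostring(root, encoding="unicode")
```

Because each offset is relative to one element's text, the annotations remain usable even if other parts of the document are edited later.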

See this XML <https://www.dropbox.com/s/4uxl7zw6ffxnp88/nl.proc.ob.d.h-tk-20042005-5970-5973.xml?dl=0> for an example of what my output looks like. It includes annotations from multiple systems, so to check out only those generated by DBp Spotlight, just search the file for "Spotlight". The original XML source (without annotations; for comparison) can be found here: http://resolver.politicalmashup.nl/nl.proc.ob.d.h-tk-20042005-5970-5973.xml

I'm currently cleaning the code I use to do this, and will release a (partly documented) version within two weeks. It's in Python, but may be useful as a reference implementation if you'd like to do the same in Java.

If this approach is too much work: have you tried just annotating your raw XML files, without removing any markup? I've done this before with HTML and XML and could get a pretty decent result by ignoring a few entities that correspond to common tag and attribute names.
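A rough sketch of that filtering step, assuming the raw XML was annotated as-is and the hits are available as `(offset, surface_form, uri)` tuples (a hypothetical shape, not Spotlight's actual output format). The ignore list is something you would fill in by hand for your own schema. The extra check that drops hits overlapping a tag is an addition for illustration, and the tag regex is naive: it does not cope with comments, CDATA sections, or `>` inside attribute values.

```python
import re

# Naive tag matcher; good enough for simple, well-behaved markup only.
TAG_RE = re.compile(r"<[^>]*>")


def filter_markup_hits(raw_xml, hits, ignored_surface_forms=()):
    """Drop annotations that are really markup rather than content.

    `hits` is a list of (offset, surface_form, uri) tuples with offsets into raw_xml.
    """
    ignored = {s.lower() for s in ignored_surface_forms}
    markup_spans = [m.span() for m in TAG_RE.finditer(raw_xml)]
    kept = []
    for offset, surface, uri in hits:
        end = offset + len(surface)
        # True if the annotated span overlaps any tag span.
        inside_markup = any(offset < stop and end > start
                            for start, stop in markup_spans)
        if surface.lower() in ignored or inside_markup:
            continue
        kept.append((offset, surface, uri))
    return kept
```

The surviving offsets point directly into the original XML file, so no offset translation is needed afterwards.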


Cheers,
Alex

On 28-5-2015 13:41, Pajolma Rupi wrote:

Dear all,
I am interested in running Spotlight with an XML input file format with the objective of enriching the content with semantic information. From what I've experienced until now it seems like such a format is not supported and that only a plain text format is supported. Am I correct? (I'm using the code here for processing text files: https://github.com/dbpedia-spotlight/dbpedia-spotlight/blob/master/eval/src/main/java/org/dbpedia/spotlight/evaluation/external/DBpediaSpotlightClient.java#L90 )
Has anybody run into such a problem already?
I can of course get the text content out of the XML file (say it will produce a new plain text file) and pass this text content to Spotlight, but then I would have that:
1- the offset I would get from running Spotlight won't be the same as the offset in the original XML file
2- the enriching process will get more complicated due to the different offsets (XML file vs plain text file)
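One possible way around the two offset problems above is to keep an index map while stripping the markup, so that any plain-text offset can be translated back to a position in the XML file. The following Python sketch assumes simple markup: tags are found with a naive regex, and entity references such as `&amp;` are left untouched rather than decoded. The function name is hypothetical.

```python
import re

# Naive tag matcher; does not handle comments, CDATA, or '>' in attribute values.
TAG_RE = re.compile(r"<[^>]*>")


def strip_tags_with_map(raw_xml):
    """Return (plain_text, index_map), where index_map[i] is the position
    in raw_xml of the character plain_text[i]."""
    chars, index_map = [], []
    pos = 0
    for match in TAG_RE.finditer(raw_xml):
        # Copy the text between the previous tag and this one.
        for i in range(pos, match.start()):
            chars.append(raw_xml[i])
            index_map.append(i)
        pos = match.end()
    # Copy whatever follows the last tag.
    for i in range(pos, len(raw_xml)):
        chars.append(raw_xml[i])
        index_map.append(i)
    return "".join(chars), index_map
```

An offset `o` reported for the plain text then corresponds to `index_map[o]` in the XML file. Note that a surface form spanning a tag boundary maps to a non-contiguous range in the XML, which needs separate handling.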
Thank you in advance,
Pajolma

*/Pajolma RUPI/*

Research and Development Engineer

Service de l'e-Information Scientifique et Multimédia (SEISM)
Research Centre INRIA Grenoble - Rhône-Alpes

/655 Avenue de l'Europe/

/38330 Montbonnot-Saint-Martin/

/France/



------------------------------------------------------------------------------


_______________________________________________
Dbp-spotlight-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users
