Re: [Dbp-spotlight-users] XML input format

Pajolma Rupi Wed, 17 Jun 2015 05:27:10 -0700

Dear all, 

I am reopening this thread because, after some investigation, I found that 
passing an XML file through the "url" parameter does make the service return a 
reasonable result (at least for the few examples I tested). 
Here is an example of my call (the only difference is that I use my local 
instance, i.e. http://localhost:2222/rest/...): 
http://spotlight.dbpedia.org/rest/annotate/?url=http://raweb.inria.fr/rapportsactivite/RA2014/wimmics/wimmics.xml&confidence=0.3&support=100
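
For reference, the same call can be assembled programmatically. This is only a minimal Python sketch, not part of Spotlight: the helper names `build_annotate_url` and `annotate` are made up for illustration, and it assumes the endpoint honours an `Accept: application/json` header. Unlike the hand-written URL, it percent-encodes the value of the "url" parameter.

```python
import json
import urllib.parse
import urllib.request


def build_annotate_url(base, doc_url, confidence=0.3, support=100):
    """Build an /annotate request URL that passes the document by reference."""
    query = urllib.parse.urlencode(
        {"url": doc_url, "confidence": confidence, "support": support}
    )
    return base.rstrip("/") + "/annotate?" + query


def annotate(base, doc_url, **params):
    """Send the request and return the decoded JSON reply (needs network access)."""
    request = urllib.request.Request(
        build_annotate_url(base, doc_url, **params),
        headers={"Accept": "application/json"},  # assumed to be supported
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Only builds the URL; call annotate(...) to actually query a running instance.
request_url = build_annotate_url(
    "http://localhost:2222/rest",
    "http://raweb.inria.fr/rapportsactivite/RA2014/wimmics/wimmics.xml",
)
print(request_url)
```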


Still, not all XML nodes containing text were taken into consideration: the 
text returned by the service does not include all the text content of the XML 
elements and attributes. From a first check, it seems that only tags that also 
exist in HTML are considered (<p> is processed but <firstName> is not), but I 
would like to be sure. 
Does anybody know the logic behind this processing? I would like to know what 
I should expect from Spotlight when it is run on an XML file. 
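
One way to check this hypothesis is to compare the direct text of every XML element with the text the service echoes back. The sketch below is only a rough diagnostic under stated assumptions: `returned_text` stands for whatever text Spotlight reports in its response, the function name `tags_by_coverage` is hypothetical, and the exact substring test will miss matches if the service normalises whitespace. Attributes and tail text are not inspected.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def tags_by_coverage(xml_string, returned_text):
    """Count, per tag name, elements whose text was or was not echoed back."""
    seen, missing = defaultdict(int), defaultdict(int)
    for element in ET.fromstring(xml_string).iter():
        text = (element.text or "").strip()
        if not text:
            continue  # element carries no direct text content
        bucket = seen if text in returned_text else missing
        bucket[element.tag] += 1
    return dict(seen), dict(missing)


# Toy input: pretend the service only returned the text of the <p> element.
sample = "<doc><p>Berlin is a city.</p><firstName>Ada</firstName></doc>"
seen, missing = tags_by_coverage(sample, "Berlin is a city.")
print("echoed back:", seen)
print("not echoed back:", missing)
```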

@Alex 
Please let me know if your Python script dealing with the XML input format is 
already available. 

Thank you in advance, 
Pajolma 
----- Original Message -----

> From: "Pajolma Rupi" <[email protected]>
> To: "Alex Olieman" <[email protected]>
> Cc: [email protected]
> Sent: Tuesday, June 2, 2015 11:13:44 AM
> Subject: Re: [Dbp-spotlight-users] XML input format

> Hi Alex,
> Thank you for sharing your experience.
> I did try to annotate raw XML files, but the number of entities annotated in
> the raw file differs considerably from the number found in the text-only
> version, so I may be interested in the approach you followed. I will have a
> look at your code when it is available. Thank you for mentioning its
> release.

> Best,
> Pajolma

> ----- Original Message -----

> > From: "Alex Olieman" <[email protected]>
> > To: [email protected]
> > Cc: "pajolma rupi" <[email protected]>
> > Sent: Monday, June 1, 2015 2:33:47 PM
> > Subject: Re: [Dbp-spotlight-users] XML input format

> > Hi Pajolma,
> 

> > Yes, I have been in a similar situation. I'm not sure if there is a more
> > convenient solution (from the Java/Scala code), but I ended up parsing,
> > annotating, and rewriting the XML. If you already intend to make
> > annotations a part of your XML schema, neatly annotating each element with
> > correct offsets is quite trivial.
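
The parse, annotate, and rewrite loop described here could look roughly like the following Python sketch. It is not the code referred to in this message: the `<annotation>` element and its attributes are a hypothetical schema, and `fake_annotator` is a stand-in for a real Spotlight call. Offsets are relative to each element's own text, and tail text is ignored for brevity.

```python
import xml.etree.ElementTree as ET


def annotate_elements(root, annotate_text):
    """Attach one <annotation> child per entity found in each element's own text.

    annotate_text(text) must return (offset, surface_form, uri) tuples, with
    offsets counted from the start of the text it was given.
    """
    for element in list(root.iter()):  # snapshot: the tree is modified below
        text = element.text or ""
        if not text.strip():
            continue
        for offset, surface, uri in annotate_text(text):
            note = ET.SubElement(element, "annotation")  # hypothetical schema
            note.set("offset", str(offset))
            note.set("length", str(len(surface)))
            note.set("uri", uri)
    return root


def fake_annotator(text):
    """Stand-in for a real Spotlight call: only recognises the word 'Berlin'."""
    start = text.find("Berlin")
    if start < 0:
        return []
    return [(start, "Berlin", "http://dbpedia.org/resource/Berlin")]


root = ET.fromstring("<doc><p>I live in Berlin.</p><p>No entity here.</p></doc>")
annotate_elements(root, fake_annotator)
print(ET.tostring(root, encoding="unicode"))
```

Because each annotation is appended after the element's existing content, the element's text is left untouched and the stored offsets stay valid.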
> 

> > See the attached XML for an example of what my output looks like. It
> > includes annotations from multiple systems, so to check out only those
> > generated by DBp Spotlight, just search the file for "Spotlight". The
> > original XML source (without annotations; for comparison) can be found
> > here:
> > http://resolver.politicalmashup.nl/nl.proc.ob.d.h-tk-20042005-5970-5973.xml
> 

> > I'm currently cleaning up the code I use to do this, and will release a
> > (partly documented) version within two weeks. It's in Python, but may be
> > useful as a reference implementation if you'd like to do the same in Java.
> 

> > If this approach is too much work: have you tried just annotating your raw
> > XML files, without removing any markup? I've done this before with HTML and
> > XML and could get a pretty decent result by ignoring a few entities that
> > correspond to common tag and attribute names.
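
A filter of this kind might be sketched in Python as follows. The function names and the annotation dictionary keys (`surfaceForm`, `offset`) are illustrative assumptions, not the service's documented response format. Note that matching on the surface form alone also discards genuine mentions in the running text that happen to coincide with a tag name, and that namespaced tags would need extra handling.

```python
import xml.etree.ElementTree as ET


def markup_names(xml_string):
    """Collect the lower-cased tag and attribute names used in the document."""
    names = set()
    for element in ET.fromstring(xml_string).iter():
        names.add(element.tag.lower())
        names.update(attr.lower() for attr in element.attrib)
    return names


def drop_markup_hits(annotations, xml_string):
    """Keep only annotations whose surface form is not a tag or attribute name."""
    blocked = markup_names(xml_string)
    return [a for a in annotations if a["surfaceForm"].lower() not in blocked]


raw = '<speech party="VVD"><title>Budget</title><p>Amsterdam voted.</p></speech>'
hits = [  # hand-written example annotations over the raw markup
    {"surfaceForm": "speech", "offset": 1},
    {"surfaceForm": "title", "offset": 21},
    {"surfaceForm": "Amsterdam", "offset": 44},
]
kept = drop_markup_hits(hits, raw)
print(kept)
```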
> 

> > Cheers,
> 
> > Alex
> 

> > On 28-5-2015 13:41, Pajolma Rupi wrote:
> 

> > > Dear all,
> > 
> 

> > > I am interested in running Spotlight on XML input files, with the
> > > objective of enriching the content with semantic information.
> > 
> 
> > > From what I have seen so far, this format does not seem to be supported;
> > > only plain text is. Am I correct? (I'm using the code here for
> > > processing text files:
> > > https://github.com/dbpedia-spotlight/dbpedia-spotlight/blob/master/eval/src/main/java/org/dbpedia/spotlight/evaluation/external/DBpediaSpotlightClient.java#L90
> > > )
> > 
> 
> > > Has anybody run into such a problem already?
> > 
> 

> > > I can of course extract the text content from the XML file (producing a
> > > new plain text file) and pass that to Spotlight, but then:
> > 
> 
> > > 1- the offsets returned by Spotlight won't match the offsets in the
> > > original XML file
> > 
> 
> > > 2- the enriching process gets more complicated because of the differing
> > > offsets (XML file vs plain text file)
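
The offset mismatch in points 1 and 2 can be worked around by recording, while extracting the text, where each character came from. The Python sketch below assumes a deliberately naive tag pattern: it does not handle comments, CDATA sections, or a `>` inside an attribute value, it leaves character entities undecoded, and it inserts no separator between the text of adjacent elements. The function name `strip_tags_with_map` is hypothetical.

```python
import re

TAG = re.compile(r"<[^>]*>")  # naive: no comments, CDATA, or '>' in attributes


def strip_tags_with_map(xml_string):
    """Return (plain_text, offsets); offsets[i] is where plain_text[i] sits in xml_string."""
    chars, offsets, position = [], [], 0
    for match in TAG.finditer(xml_string):
        for index in range(position, match.start()):
            chars.append(xml_string[index])
            offsets.append(index)
        position = match.end()  # skip over the tag itself
    for index in range(position, len(xml_string)):  # text after the last tag
        chars.append(xml_string[index])
        offsets.append(index)
    return "".join(chars), offsets


xml_string = "<doc><p>Berlin is big.</p><firstName>Ada</firstName></doc>"
plain_text, offsets = strip_tags_with_map(xml_string)
plain_offset = plain_text.index("Ada")  # offset as reported on the plain text
xml_offset = offsets[plain_offset]      # same character in the original XML
print(plain_offset, xml_offset)
```

With such a map, an offset returned for the plain text can be translated back to a position in the original XML before the enrichment step.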
> > 
> 

> > > Thank you in advance,
> > 
> 
> > > Pajolma
> > 
> 

> > > Pajolma RUPI
> > > Research and Development Engineer
> > > Service de l'e-Information Scientifique et Multimédia (SEISM)
> > > Research Centre INRIA Grenoble - Rhône-Alpes
> > > 655 Avenue de l'Europe
> > > 38330 Montbonnot-Saint-Martin
> > > France
> > 
> 

> > > ------------------------------------------------------------------------------
> > 
> 

> > > _______________________________________________
> > 
> 
> > > Dbp-spotlight-users mailing list
> > > [email protected]
> > > https://lists.sourceforge.net/lists/listinfo/dbp-spotlight-users
> > 
> 


