Re: [Kim-discussion] to populate KB

borislav popov Mon, 02 Feb 2009 04:53:29 -0800

Hi Valentina,

In your example the content already seems to be in a semi-structured form (XML or something similar), since these tags exist. In this case we can provide you with an example of how to transform these tags into semantic annotations with respect to your ontology. Here is a draft of the approach (details can be discussed in the kim discussion forum, in CC):

- ontology: it is best to build it as an extension of PROTON, our basic upper-level ontology; if you already have an ontology specifically for this domain, it won't be hard to map/align it with PROTON. In short, why? Because you get a lot of named entity classes and relationships already defined there, you'll be able to reuse others' work, and so forth. Generally you do not need to be terribly versed in ontology design to achieve such an extension.

Given that you have the ontology, you need to go through a semantic annotation process to align your tags with the ontology. If you have proper information extraction pipelines, you can do the same without the initial tags. Of course, wherever you have semi-structured content, it is wise to use it.


- semantic annotation of pre-existing tags:
-- in this case you do not seem to need the entire default IE pipeline;
-- using the population tool one can pass XML or HTML, and the tags will be identified as original markup annotations;
-- with simple pattern-matching grammar rules one can transform these annotations into the proper form needed, e.g. assign a specific class from your ontology wherever you have a singer tag, etc. Additionally you can identify groups of these annotations and define relationships between them, i.e. express not only that "Fix you" is a song, but also that it is by this particular artist. So you can model directly in the semantic repository that there is a singer, there is a song, and these two are related. However, I would suggest you do the entity part first and then the relationships.
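To make the tag-to-class idea concrete, here is a minimal sketch in plain Python. It does not use the KIM population tool or its grammar rules; it only illustrates the mapping logic on the sample from the original question. The namespace http://example.org/music#, the class names, and the rule "a song belongs to the nearest preceding singer" are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and class names; substitute those of your own ontology.
MUSIC = "http://example.org/music#"
TAG_TO_CLASS = {"singer": MUSIC + "Singer", "song": MUSIC + "Song"}

SAMPLE = """
<singer> Laura Pausini </singer>
<song> Invece no </song>
<singer> Coldplay </singer>
<song> Fix you </song>
"""


def annotate(markup):
    """Return (label, class URI) pairs for every tag that maps to a class."""
    # The fragment has no single root element, so wrap it before parsing.
    root = ET.fromstring("<doc>" + markup + "</doc>")
    annotations = []
    for element in root:
        class_uri = TAG_TO_CLASS.get(element.tag)
        if class_uri is not None:
            annotations.append(((element.text or "").strip(), class_uri))
    return annotations


def pair_songs_with_singers(annotations):
    """Group each song with the nearest preceding singer (an assumed rule)."""
    pairs = []
    current_singer = None
    for label, class_uri in annotations:
        if class_uri == TAG_TO_CLASS["singer"]:
            current_singer = label
        elif class_uri == TAG_TO_CLASS["song"] and current_singer is not None:
            pairs.append((label, current_singer))
    return pairs


annotations = annotate(SAMPLE)
pairs = pair_songs_with_singers(annotations)
```

The entity step (annotate) and the relationship step (pair_songs_with_singers) are kept separate on purpose, so the first can be validated before the second is attempted.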


- identification / instantiation of the entities

-- as mentions in the text are mere references to real-world entities, our approach is to have instances mirroring the real-world ones in the semantic database. They are actually instances of classes in your ontology and have an identifier URI, a label, and other descriptive properties.
-- it is important to decide whether there is a place to obtain all these instances and model them in the instance base automatically, or whether you'd like to extract and model them on the basis of the references in the text. If you choose the latter, you will need certain creation mechanisms which add the proper triples to the instance base. Like: Johny Rotten is a Singer. Psychopath's Path is a Song. PP is composed/performed by JR. And so forth, of course in a somewhat uglier and more formal expression. The alternative is to use our instance generation mechanisms included in the default pipeline, which, given that you provide at least a class URI for the annotation, will generate the instances automatically. It won't be that simple for the relationships, which you can add as properties in a post-processing step, after the entities have been instantiated, i.e. you can add a relationship between PP and JR only when you know the URIs of these two.
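As a rough illustration of the "uglier and more formal expression", the sketch below builds such triples in plain Python. It is not the KIM instance generation mechanism; the namespaces http://example.org/music# and http://example.org/instance/, the property name performedBy, and the URI-minting rule are hypothetical. Only the rdf:type and rdfs:label URIs are standard.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical namespaces and property; replace with those of your ontology.
MUSIC = "http://example.org/music#"
INSTANCES = "http://example.org/instance/"
PERFORMED_BY = MUSIC + "performedBy"


def mint_uri(label):
    """Derive an instance URI from a label (an assumed, simplistic scheme)."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", label.strip()).strip("_")
    return INSTANCES + slug


def instantiate(label, class_uri, triples):
    """Add the type and label triples for one entity and return its URI."""
    uri = mint_uri(label)
    triples.add((uri, RDF_TYPE, class_uri))
    triples.add((uri, RDFS_LABEL, label))
    return uri


def add_performed_by(song_uri, singer_uri, triples):
    """Relate two entities, but only once both have been instantiated."""
    known = {s for (s, p, _) in triples if p == RDF_TYPE}
    if song_uri not in known or singer_uri not in known:
        raise ValueError("both entities must be instantiated first")
    triples.add((song_uri, PERFORMED_BY, singer_uri))


triples = set()
singer = instantiate("Johny Rotten", MUSIC + "Singer", triples)
song = instantiate("Psychopath's Path", MUSIC + "Song", triples)
add_performed_by(song, singer, triples)
```

The check in add_performed_by mirrors the point above: the relationship is added as a post-processing step, once the URIs of both entities are known.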


hth,
borislav




On Feb 2, 2009, at 2:25 PM, Valentina De Vivo wrote:

Sorry,

is it possible in KIM to annotate words that are enclosed between tags?
For example (…I don't have a better idea) I would like to annotate the
following:
<singer> Laura Pausini </singer>
<song> Invece no </song>
<singer> Coldplay </singer>
<song> Fix you </song>

...so “Coldplay” is a singer and “Fix you” is a song.

Have you any advice about this, if in my ontology there are entities “singer”
and “song”?

Thanks again,
Valentina


_______________________________________________
Kim-discussion mailing list
[email protected]
http://ontotext.com/mailman/listinfo/kim-discussion
