Hi Valentina,
In your example it seems that the content is already in a semi-structured form, since these tags exist (XML or something similar). In this case we can provide you with an example of how to transform these tags into semantic annotations with respect to your ontology. Here is a draft of the approach (details can be discussed on the KIM discussion forum, in CC):

- ontology: it is best to build it as an extension of PROTON, our basic upper-level ontology. If you already have an ontology specifically for this domain, it won't be hard to map/align it with PROTON. The short answer to "why?": PROTON already has a lot of named entity classes and relationships, so you'll be able to reuse others' work, and so forth. Generally you do not need to be terribly well versed in ontology design to achieve such an extension.

Given that you have the ontology, you need to go through a semantic annotation process to align your tags with it. If you have proper information extraction pipelines, you can do the same without the initial tags. Of course, wherever you have semi-structured content, it is wise to use it.

- semantic annotation of pre-existing tags:
-- in this case you do not seem to need the entire default IE pipeline;
-- using the population tool one can pass XML or HTML, and the tags will be identified as original markup annotations;
-- with simple pattern-matching grammar rules one can transform these annotations into the proper form needed, e.g. assign a specific class from your ontology wherever you have a singer tag, etc. Additionally, you can identify groups of these annotations and define relationships between them, i.e. express not only that "Fix you" is a song, but also that it is by this particular artist. So you can model directly in the semantic repository that there is a singer, there is a song, and these two are related. However, I would suggest you do the entity part first and then the relationships.
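
To make the tag-to-class step a bit more concrete, here is a minimal sketch in plain Python. It is not KIM's population tool or its grammar rules, just an illustration of the idea: find the original markup, look up an ontology class for each tag, and keep the offsets of the annotated text. The namespace http://example.org/music#, the class names, and the helper names are all assumptions made up for the example.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical namespace for the domain ontology; replace with your own.
MUSIC_NS = "http://example.org/music#"

# Hypothetical mapping from source tag names to ontology class URIs.
TAG_TO_CLASS = {
    "singer": MUSIC_NS + "Singer",
    "song": MUSIC_NS + "Song",
}


@dataclass
class Annotation:
    text: str       # the annotated words, without surrounding whitespace
    class_uri: str  # ontology class assigned to this mention
    start: int      # character offset of the text in the input
    end: int        # offset one past the last character


# Matches <tag> ... </tag> pairs where the closing tag repeats the opening one.
TAG_PATTERN = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def annotate(markup: str) -> List[Annotation]:
    """Turn known tags into class-typed annotations; skip unknown tags."""
    annotations = []
    for match in TAG_PATTERN.finditer(markup):
        tag, raw = match.group(1), match.group(2)
        class_uri = TAG_TO_CLASS.get(tag)
        if class_uri is None:
            continue
        text = raw.strip()
        # Shift the start offset past any leading whitespace inside the tag.
        start = match.start(2) + (len(raw) - len(raw.lstrip()))
        annotations.append(Annotation(text, class_uri, start, start + len(text)))
    return annotations


SAMPLE = """<singer> Laura Pausini </singer>
<song> Invece no </song>
<singer> Coldplay </singer>
<song> Fix you </song>"""

if __name__ == "__main__":
    for ann in annotate(SAMPLE):
        print(ann.class_uri, repr(ann.text), ann.start, ann.end)
```

A regular expression is used here because the sample has no single root element, so a strict XML parser would reject it; for real XML or HTML input a proper parser is the safer choice.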

- identification / instantiation of the entities
-- as mentions in the text are mere references to real-world entities, our approach is to have instances mirroring the real-world ones in the semantic database. They are actually instances of classes in your ontology and have an identifier URI, a label, and other descriptive properties.
-- it is important to decide whether there is a place to obtain all these instances and model them in the instance base automatically, or whether you'd like to extract and model them on the basis of the references in the text. If you choose the latter, you will need certain creation mechanisms which add the proper triples to the instance base. Like: Johny Rotten is a Singer. Psychopath's Path is a Song. PP is composed/performed by JR. And so forth, of course in a somewhat uglier and more formal expression. The alternative is to use our instance generation mechanisms included in the default pipeline, which, given that you provide at least a class URI for the annotation, will generate the instances automatically. It won't be that simple for the relationships: you can add those properties as a post-processing step, after the entities have been instantiated, i.e. you can add a relationship between PP and JR only when you know the URIs of these two.
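
As a rough illustration of the "create the triples yourself" option, the sketch below does it in two passes in plain Python: first every mention gets a URI, a type, and a label, and only then are the relationships added, once both URIs are known. It does not use KIM's instance generation. The namespace, the performedBy property, the URI scheme, and the rule "a song belongs to the most recent singer seen before it" are all assumptions made for the example.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical namespace and property; replace with those of your ontology.
MUSIC_NS = "http://example.org/music#"
PERFORMED_BY = MUSIC_NS + "performedBy"

# Standard RDF / RDFS property URIs.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

Triple = Tuple[str, str, str]


def mint_uri(class_name: str, label: str) -> str:
    """Build an instance URI from the class name and label (assumed scheme)."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", label).strip("_")
    return f"{MUSIC_NS}{class_name}.{slug}"


def build_triples(mentions: List[Tuple[str, str]]) -> List[Triple]:
    """mentions: (class name, label) pairs in document order."""
    triples: List[Triple] = []
    uris: List[str] = []

    # Pass 1: instantiate the entities (type and label; the label is a literal).
    for class_name, label in mentions:
        uri = mint_uri(class_name, label)
        uris.append(uri)
        triples.append((uri, RDF_TYPE, MUSIC_NS + class_name))
        triples.append((uri, RDFS_LABEL, label))

    # Pass 2: add relationships, now that both URIs are known.
    # Assumption: each song is performed by the closest preceding singer.
    current_singer: Optional[str] = None
    for (class_name, _), uri in zip(mentions, uris):
        if class_name == "Singer":
            current_singer = uri
        elif class_name == "Song" and current_singer is not None:
            triples.append((uri, PERFORMED_BY, current_singer))
    return triples


if __name__ == "__main__":
    mentions = [
        ("Singer", "Laura Pausini"), ("Song", "Invece no"),
        ("Singer", "Coldplay"), ("Song", "Fix you"),
    ]
    for triple in build_triples(mentions):
        print(triple)
```

Keeping the relationship pass separate mirrors the suggestion to do the entity part first: the second pass never has to guess a URI that has not been created yet.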

hth,
borislav




On Feb 2, 2009, at 2:25 PM, Valentina De Vivo wrote:

Sorry,

is it possible in KIM to annotate words that are enclosed between tags?
For example (…I don't have a better idea) I would like to annotate the
following:
<singer> Laura Pausini </singer>
<song> Invece no </song>
<singer> Coldplay </singer>
<song> Fix you </song>

...so “Coldplay” is a singer and “Fix you” is a song.
Do you have any advice about this, given that my ontology contains the entities
“singer” and “song”?

Thanks again,
Valentina


_______________________________________________
Kim-discussion mailing list
[email protected]
http://ontotext.com/mailman/listinfo/kim-discussion
