[jira] [Updated] (STANBOL-739) Migrate the Celi Lemmatizer Engine to use the AnalyzedText contentPart

Alessio Bosca (JIRA) Mon, 08 Oct 2012 00:44:05 -0700

     [ 
https://issues.apache.org/jira/browse/STANBOL-739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Alessio Bosca updated STANBOL-739:
----------------------------------

    Attachment: myPatch.diff

The changes in this patch include:

Lemmatizer Engine behaviour: I replaced the generic
hasMorphologicalFeature property with specific ones (hasGender, hasNumber,
hasTense, etc.) taken from the Olia ontology.
Olia lacks a specific property for the part of speech (pos), and since
the other morphological properties in Olia (hasGender, hasNumber, etc.)
require a pos class as their domain, I decided to model the pos annotation
with an isA relation.
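
A minimal sketch of the modelling described above, written against plain Java collections rather than the Stanbol or RDF APIs. The class name MorphoTriplesSketch, the describe method, the token URI, and the Olia namespace and class names (Noun, Feminine, Singular) are assumptions for illustration only; the identifiers used in the patch may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class MorphoTriplesSketch {
    // Assumed namespace of the Olia reference model; check it against the ontology in use.
    public static final String OLIA = "http://purl.org/olia/olia.owl#";
    public static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";

    // Returns {subject, predicate, object} triples for one analysed token (hypothetical helper).
    public static List<String[]> describe(String token, String posClass,
                                          String gender, String number) {
        List<String[]> triples = new ArrayList<>();
        // The pos is expressed as class membership ("isA"), not as a property value.
        triples.add(new String[] {token, RDF_TYPE, OLIA + posClass});
        // Each morphological feature gets its own specific property.
        if (gender != null) {
            triples.add(new String[] {token, OLIA + "hasGender", OLIA + gender});
        }
        if (number != null) {
            triples.add(new String[] {token, OLIA + "hasNumber", OLIA + number});
        }
        return triples;
    }

    public static void main(String[] args) {
        // Example input: the Italian word "casa" (feminine singular noun).
        for (String[] t : describe("urn:example:token-casa", "Noun", "Feminine", "Singular")) {
            System.out.println(t[0] + " " + t[1] + " " + t[2]);
        }
    }
}
```

Because the token is typed with a pos class first, the feature properties can be attached to it without violating their declared domain.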

I changed the test of the full morphological analysis so that it checks
specific features (lemma, pos, gender, number) of a known input (the Italian
word "casa", meaning "house").

I couldn't find anything more standard for the lemma, so I kept the
custom hasLemma property used so far.

The changes in the code are:

Changes in nlp.pos

-LexicalCategory:
    -Added Numeral, Clitic, ProperNoun (from Olia)

Changes in nlp.morpho

-Case:
    -Corrected typo (nstrumentel -> Instrumental)
-Added enum for features: Person, VerbMood
-Renamed Number enum to NumberFeature
-Added Tag classes for morpho features enums (Gender, Tense, Person, ...)
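
As a rough illustration of how a feature enum and its Tag class can fit together, here is a self-contained sketch. The names MorphoTagSketch, FeatureTag and FeatureTagSet, and the enum constants, are hypothetical and are not taken from the patch; the real classes in nlp.morpho may be shaped differently.

```java
import java.util.HashMap;
import java.util.Map;

public class MorphoTagSketch {
    // Feature enums; the constants are illustrative, not the patch's actual values.
    public enum Person { First, Second, Third }
    public enum VerbMood { Indicative, Subjunctive, Imperative }

    // Pairs the raw string emitted by an analyser with the enum constant it denotes.
    public static final class FeatureTag<E extends Enum<E>> {
        private final String tag;
        private final E feature;

        public FeatureTag(String tag, E feature) {
            this.tag = tag;
            this.feature = feature;
        }

        public String getTag() { return tag; }
        public E getFeature() { return feature; }
    }

    // Looks up tags by their raw string for one feature type.
    public static final class FeatureTagSet<E extends Enum<E>> {
        private final Map<String, FeatureTag<E>> tags = new HashMap<>();

        public FeatureTagSet<E> add(String tag, E feature) {
            tags.put(tag, new FeatureTag<>(tag, feature));
            return this;
        }

        // Returns null when the analyser tag has no mapping.
        public FeatureTag<E> get(String tag) {
            return tags.get(tag);
        }
    }
}
```

Keeping the raw tag next to the enum constant lets an engine preserve the analyser's original label while still exposing a typed feature value.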

Changes in celi package

Test
-Modified validateMorphoFeatureProperty in the Lemmatizer test. Added a TERM
constant to use as input for the full morpho analysis test

Src
-Added CeliMorphoFeatures, which groups the morphological features managed by
the CELI engine; renamed and updated CeliTagsetRegistry

                
> Migrate the Celi Lemmatizer Engine to use the AnalyzedText contentPart
> ----------------------------------------------------------------------
>
>                 Key: STANBOL-739
>                 URL: https://issues.apache.org/jira/browse/STANBOL-739
>             Project: Stanbol
>          Issue Type: Sub-task
>            Reporter: Rupert Westenthaler
>         Attachments: myPatch.diff
>
>
> The CELI Lemmatizer enhancement engine currently writes its results directly 
> to the metadata of the ContentItem. As the new AnalyzedText content part is 
> much better suited to represent those data, this Engine should be adapted to 
> use the new content part.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
