[
https://issues.apache.org/jira/browse/STANBOL-739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Alessio Bosca updated STANBOL-739:
----------------------------------
Attachment: myPatch.diff
The changes in this patch include:
Lemmatizer Engine behaviour: I replaced the generic
hasMorphologicalFeature property with specific ones (hasGender, hasNumber,
hasTense, etc.) taken from the Olia ontology.
Olia lacks a specific property for the part of speech (pos). Since the other
morphological properties in Olia (hasGender, hasNumber, etc.) require a pos
class as their domain, I decided to model the pos annotation with an isA
relation.
I changed the test on the full morphological analysis so that it checks
specific features (lemma, pos, gender, number) of a known input (the Italian
word "casa", meaning "house").
I couldn't find anything more standard for the lemma, so I kept the custom
hasLemma property used so far.
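The modelling described above can be sketched roughly as follows. This is only an illustration, not the code in the patch: the class name, the describe helper, the example.org namespace for hasLemma, and the Olia namespace URI are all assumptions, and the values used for "casa" (Noun, Feminine, Singular) are illustrative rather than taken from the test.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MorphoTriplesSketch {
    static final String RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type";
    // Assumed Olia namespace; check it against the ontology version in use.
    static final String OLIA = "http://purl.org/olia/olia.owl#";
    // Hypothetical namespace standing in for the custom hasLemma property.
    static final String CUSTOM = "http://example.org/lemmatizer#";

    /** Returns predicate -> object pairs describing one analysed token. */
    static Map<String, String> describe(String lemma, String pos,
                                        String gender, String number) {
        Map<String, String> out = new LinkedHashMap<>();
        out.put(CUSTOM + "hasLemma", lemma);         // custom property is kept
        out.put(RDF_TYPE, OLIA + pos);               // pos modelled as isA
        out.put(OLIA + "hasGender", OLIA + gender);  // specific property
        out.put(OLIA + "hasNumber", OLIA + number);  // specific property
        return out;
    }

    public static void main(String[] args) {
        describe("casa", "Noun", "Feminine", "Singular")
                .forEach((p, o) -> System.out.println(p + " -> " + o));
    }
}
```

Because the token is typed with a pos class, the domain requirement of hasGender and hasNumber is met without needing a separate pos property.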
The changes in the code are:
Changes in nlp.pos
-LexicalCategory:
-Added Numeral, Clitic, ProperNoun (from Olia)
Changes in nlp.morpho
-Case:
-Corrected typo (nstrumentel -> Instrumental)
-Added enum for features: Person, VerbMood
-Renamed Number enum to NumberFeature
-Added Tag classes for morpho features enums (Gender, Tense, Person, ...)
Changes in celi package
Test
-modified validateMorphoFeatureProperty in Lemmatizer test. Added TERM
constant to use as input for the full morpho analysis test
Src
-added CeliMorphoFeatures, which groups the morphological features managed by
the CELI engine; renamed and updated CeliTagsetRegistry
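To give an idea of the feature enums and their Tag classes listed above, here is a minimal sketch. It does not reproduce the classes in the patch: FeatureTag, TagRegistry, sampleRegistry, the enum constants and the raw tag strings ("SING", "PLUR", "F", "M") are hypothetical names chosen for illustration. Note that an enum simply called Number would shadow java.lang.Number wherever it is imported, which may be one reason for the NumberFeature name.

```java
import java.util.HashMap;
import java.util.Map;

class MorphoTagSketch {
    // Hypothetical constants; the real enums may define more values.
    enum NumberFeature { Singular, Plural }
    enum Gender { Masculine, Feminine, Neuter }
    enum Person { First, Second, Third }

    /** Pairs a tagset-specific string with a typed feature value. */
    static final class FeatureTag<E extends Enum<E>> {
        final String tag;
        final E feature;

        FeatureTag(String tag, E feature) {
            this.tag = tag;
            this.feature = feature;
        }
    }

    /** Maps raw tags produced by an analyser to typed feature tags. */
    static final class TagRegistry {
        private final Map<String, FeatureTag<?>> tags = new HashMap<>();

        <E extends Enum<E>> void register(String tag, E feature) {
            tags.put(tag, new FeatureTag<>(tag, feature));
        }

        /** Returns the feature for a raw tag, or null if it is unknown. */
        Enum<?> lookup(String tag) {
            FeatureTag<?> t = tags.get(tag);
            return t == null ? null : t.feature;
        }
    }

    /** Builds a registry with a few made-up raw tags. */
    static TagRegistry sampleRegistry() {
        TagRegistry r = new TagRegistry();
        r.register("SING", NumberFeature.Singular);
        r.register("PLUR", NumberFeature.Plural);
        r.register("F", Gender.Feminine);
        r.register("M", Gender.Masculine);
        return r;
    }
}
```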
> Migrate the Celi Lemmatizer Engine to use the AnalyzedText contentPart
> ----------------------------------------------------------------------
>
> Key: STANBOL-739
> URL: https://issues.apache.org/jira/browse/STANBOL-739
> Project: Stanbol
> Issue Type: Sub-task
> Reporter: Rupert Westenthaler
> Attachments: myPatch.diff
>
>
> The CELI Lemmatizer enhancement engine currently writes its results directly
> to the metadata of the ContentItem. As the new AnalyzedText content part is
> much better suited to represent those data, this Engine should be adapted to
> use the new content part.