On 18.06.2013 14:05, Rukku wrote:
We are new to the UIMA framework.
We are studying UIMA to see if we can use it to parse and extract information
from travel-related emails (confirmations, cancellations). The information can be
passenger names, itinerary, flight details, etc., and the output should be XML.
We tried UIMA and ended up using just the regex components, for which we
think we could just as well have used plain Java libraries.
Any help in giving us some direction will be greatly appreciated.
A solution for this task depends (in my opinion) mainly on the
properties of the input and on whether labeled data is available. It is
not really a question of architecture.
Some (incomplete) thoughts about UIMA-based approaches:
- You could train a CRF or something similar with ClearTK [1] if you
have enough labeled data.
- For simple NER, there are some models provided by DKPro [2].
- If you want to define some rules or patterns, then there is UIMA Ruta
(Rule-based Text Annotation) [3].
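
To make the rule/pattern option a bit more concrete, here is a minimal
sketch in plain Java (java.util.regex only, so this is neither UIMA nor
Ruta syntax). The mail layout ("Passenger: ..." on its own line, a flight
number such as "LH 123"), the two patterns, the XML element names and the
class name TravelMailSketch are all assumptions made up for illustration;
real confirmation mails vary a lot between senders.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TravelMailSketch {
    // Hypothetical pattern: a line of the form "Passenger: <name>".
    private static final Pattern PASSENGER =
        Pattern.compile("(?m)^Passenger:\\s*(.+?)\\s*$");
    // Hypothetical pattern: two capital letters, optional space, 1-4 digits.
    private static final Pattern FLIGHT =
        Pattern.compile("\\b([A-Z]{2})\\s?(\\d{1,4})\\b");

    // Escape the characters that are not allowed in XML text content.
    public static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Collect all passenger and flight matches into a flat XML string.
    public static String toXml(String mail) {
        StringBuilder xml = new StringBuilder("<booking>");
        Matcher p = PASSENGER.matcher(mail);
        while (p.find()) {
            xml.append("<passenger>").append(escape(p.group(1))).append("</passenger>");
        }
        Matcher f = FLIGHT.matcher(mail);
        while (f.find()) {
            xml.append("<flight>").append(f.group(1)).append(f.group(2)).append("</flight>");
        }
        return xml.append("</booking>").toString();
    }

    public static void main(String[] args) {
        String mail = "Passenger: Jane Doe\nFlight LH 123 from FRA to JFK\n";
        System.out.println(toXml(mail));
    }
}
```

If a handful of such patterns covers your mails, plain Java may indeed be
enough. Once the rules start to depend on each other or on other
annotations (tokens, sentences, dictionary hits), a rule language or a
trained model is usually easier to maintain.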
Best,
Peter
[1] https://code.google.com/p/cleartk/
[2] https://docs.google.com/spreadsheet/pub?key=0ApGcdapz0xSYdGh2azY2ODMtZDRNczUySEZJUFpXM2c&single=true&gid=0&output=html
[3] http://uima.apache.org/ruta.html