Re: Using Apache UIMA for processing Malay texts

Richard Eckart de Castilho Sat, 12 Mar 2016 01:07:29 -0800

The UIMA project itself only offers a handful of annotators, and a number of 
them are language-agnostic, e.g. TikaAnnotator or ConceptMapper. UIMA Ruta is a 
rule-based processing engine which should allow you to write rules to extract 
information from Malay text. Some annotators, like the service-based ones 
(Alchemy, Calais), should support whatever languages the respective services 
support. The HMM Tagger comes with documentation on how to train it on your own 
data [1].
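
To give a rough idea of the kind of rule-based extraction meant above, here is 
a minimal sketch that finds dates such as "12 Mac 2016" in Malay text. It is 
plain Java regex only, not Ruta syntax and not a UIMA API. The class name 
MalayDateSketch, the method findDates and the sample sentence are made up for 
illustration, and the pattern only covers the "day month-name year" form with 
the standard Malay month names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration: not part of UIMA, Ruta or any component collection.
public class MalayDateSketch {

    // Assumption: dates are written as "<day> <Malay month name> <4-digit year>".
    private static final Pattern DATE = Pattern.compile(
        "\\b(\\d{1,2}) "
        + "(Januari|Februari|Mac|April|Mei|Jun|Julai|Ogos|"
        + "September|Oktober|November|Disember) "
        + "(\\d{4})\\b");

    // Returns every substring of the text that matches the date pattern.
    public static List<String> findDates(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample sentence: "The meeting was held on 12 March 2016 in Kuala Lumpur."
        String text = "Mesyuarat itu diadakan pada 12 Mac 2016 di Kuala Lumpur.";
        System.out.println(findDates(text)); // prints [12 Mac 2016]
    }
}
```

In an actual UIMA pipeline, the same idea would be expressed as a Ruta rule or 
wrapped in an annotator that adds an annotation for each match instead of 
returning strings.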

There are various third-party component collections for UIMA: ClearTK, DKPro 
Core, U-Compare, JCore - I am not aware that any of these has explicit support 
for Malay. But if you have e.g. trained your own OpenNLP or Stanford CoreNLP 
models for Malay or if you can find such models on the internet, you should be 
able to use them with the respective wrappers in the component collections 
mentioned above. Again, I am not aware of any freely available pre-trained 
models for Malay - but I never searched for them explicitly.

Best,

-- Richard

[1] https://uima.apache.org/d/uima-addons-current/Tagger/hmmTaggerUsersGuide.html

> On 09.03.2016, at 14:28, aliff faisal <[email protected]> wrote:
> 
> Hello!
> Sorry of my English - It's bad..
> I would like to use Apache UIMA Annotators and other UIMA Tools for 
> processing Malay language texts.. It's search of statistics term, dates, 
> regions in text documents.
> 
> So, I would like to ask - what Annotators supports Malay language? 
> or, Can u provide me some documentation or user guide for developing 
> annotator that can process other language texts beside English
> Thank You
> Your faithfully,
> Aliff
