We use sequence classifiers in the temporal project to extract temporal
expressions. One approach is BIO tagging, where each element in the sequence
is classified as Begin, Inside, or Outside of some span by a standard
classifier such as an SVM. Another is to use an explicit sequence model such
as an HMM or CRF, which also uses BIO labels but finds a globally optimal
tagging for the whole sequence. We use the ClearTK library for its feature
extraction and its interfaces to machine learning libraries. There are
examples of both kinds of model in ctakes-temporal.
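
To make the BIO scheme concrete, here is a minimal Python sketch (not ClearTK
or cTAKES code) that turns span annotations into per-token labels and back.
The function names, the example sentence, and the convention of spans as
(start, end) token indices with an exclusive end are assumptions made only
for illustration.

```python
from typing import List, Tuple


def spans_to_bio(num_tokens: int, spans: List[Tuple[int, int]]) -> List[str]:
    """Encode non-empty token spans (start inclusive, end exclusive) as BIO labels."""
    labels = ["O"] * num_tokens
    for start, end in spans:
        labels[start] = "B"              # first token of the span
        for i in range(start + 1, end):
            labels[i] = "I"              # remaining tokens of the span
    return labels


def bio_to_spans(labels: List[str]) -> List[Tuple[int, int]]:
    """Decode BIO labels back into (start, end) token spans."""
    spans = []
    start = None
    for i, label in enumerate(labels):
        if label == "B":
            if start is not None:        # a new span closes the previous one
                spans.append((start, i))
            start = i
        elif label == "I":
            if start is None:            # tolerate an I with no preceding B
                start = i
        else:                            # "O" closes any open span
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:                # span running to the end of the input
        spans.append((start, len(labels)))
    return spans


# Hypothetical example sentence with one temporal expression, "May 22 , 2015".
tokens = ["She", "was", "admitted", "on", "May", "22", ",", "2015", "."]
time_spans = [(4, 8)]
labels = spans_to_bio(len(tokens), time_spans)
for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

A classifier is then trained to predict one of these labels for each token,
and the predicted labels are decoded back into spans at annotation time.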

TimeAnnotator uses an SVM BIO tagger:
http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/TimeAnnotator.java

CRFTimeAnnotator uses a CRF BIO tagger:
http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/CRFTimeAnnotator.java
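
As a rough illustration of the difference between the two approaches, the
following Python sketch contrasts independent per-token decisions with a
Viterbi search for the globally best label sequence. It is a toy and not what
either annotator actually does: the scores are hand-picked rather than
learned from data, and the function and variable names are hypothetical.

```python
import math
from typing import Dict, List

LABELS = ["B", "I", "O"]
NEG_INF = -math.inf


def greedy_decode(emissions: List[Dict[str, float]]) -> List[str]:
    """Pick the best label for each token independently of its neighbours."""
    return [max(LABELS, key=lambda l: scores[l]) for scores in emissions]


def viterbi_decode(emissions: List[Dict[str, float]],
                   transitions: Dict[str, Dict[str, float]],
                   start: Dict[str, float]) -> List[str]:
    """Find the label sequence with the highest total (log-space) score."""
    # best[i][l] is the best score of any path ending in label l at token i.
    best = [{l: start[l] + emissions[0][l] for l in LABELS}]
    back = []
    for scores in emissions[1:]:
        prev = best[-1]
        cur, ptr = {}, {}
        for l in LABELS:
            p = max(LABELS, key=lambda q: prev[q] + transitions[q][l])
            cur[l] = prev[p] + transitions[p][l] + scores[l]
            ptr[l] = p
        best.append(cur)
        back.append(ptr)
    # Follow the back-pointers from the best final label.
    path = [max(LABELS, key=lambda l: best[-1][l])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Made-up per-token scores for the tokens "on", "May", "22".
emissions = [
    {"B": -2.0, "I": -3.0, "O": -0.1},
    {"B": -1.0, "I": -0.8, "O": -1.5},
    {"B": -2.0, "I": -0.3, "O": -1.5},
]
# All transitions are free except O -> I, which is not a valid BIO sequence.
transitions = {p: {l: 0.0 for l in LABELS} for p in LABELS}
transitions["O"]["I"] = NEG_INF
# A sequence may not start with I.
start = {"B": 0.0, "I": NEG_INF, "O": 0.0}

print(greedy_decode(emissions))                       # ['O', 'I', 'I']
print(viterbi_decode(emissions, transitions, start))  # ['O', 'B', 'I']
```

With these scores the per-token decisions produce an I directly after an O,
which is not a well-formed span, while the sequence search returns a
consistent tagging. In a real CRF the emission and transition scores come
from training.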

This is a very high-level description, and there are a lot of details
involved in doing this right. Foremost is the importance of gold-standard
labeled training data. The classes above are trained on THYME data.

Tim

On 05/22/2015 02:11 AM, Soumya Shree wrote:
Hi folks,

I am new to cTAKES and to NLP concepts. I need to train my application so
that it can make predictions for a sequence of words. Is there an API that
helps with that, or a concept we can leverage for it? I also need to create a
training bin file, so I need to know the structure of the training text so
that I can validate it and convert it to a bin file successfully.

Thanks & Regards,
Soumya Shree

