We use sequence classifiers in the temporal project to extract temporal expressions. One approach is BIO tagging, where each token in the sequence is classified as the Begin, Inside, or Outside of some span by a standard classifier such as an SVM. Another approach is to use an explicit sequence model such as an HMM or CRF, which also uses BIO labels but finds a globally optimal tagging for the whole sequence. We use the ClearTK library for its feature extraction and its interfaces to machine learning libraries. There are examples of both kinds of model in ctakes-temporal.
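To make the BIO scheme concrete, here is a minimal sketch of how gold span annotations map to per-token labels and back. It is written in plain Python for illustration only and is not the ClearTK or cTAKES API; the function names, the example sentence, and the token-index span format are all assumptions made for this example.

```python
# Illustrative only: not ClearTK/cTAKES code. Spans are (start, end) token
# indices with an exclusive end; that format is an assumption of this sketch.

def spans_to_bio(tokens, spans):
    """Turn gold spans into one B/I/O label per token (training targets)."""
    labels = ["O"] * len(tokens)
    for start, end in spans:
        labels[start] = "B"              # first token of the span
        for i in range(start + 1, end):
            labels[i] = "I"              # remaining tokens of the span
    return labels


def bio_to_spans(labels):
    """Turn predicted B/I/O labels back into (start, end) spans."""
    spans = []
    start = None
    for i, label in enumerate(labels):
        if label == "B":
            if start is not None:        # a new span closes the open one
                spans.append((start, i))
            start = i
        elif label == "I":
            if start is None:            # tolerate an I with no preceding B
                start = i
        else:                            # "O" closes any open span
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:                # span running to the end of the text
        spans.append((start, len(labels)))
    return spans


tokens = ["She", "was", "admitted", "on", "May", "22", ",", "2015",
          "for", "two", "days", "."]
gold_spans = [(4, 8), (9, 11)]           # "May 22 , 2015" and "two days"
labels = spans_to_bio(tokens, gold_spans)
```

With a per-token classifier such as an SVM, each label is predicted independently from features of the token and its context, so the decoder has to tolerate inconsistent sequences such as an I with no preceding B. A CRF scores the label sequence as a whole, which makes such sequences less likely.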
TimeAnnotator uses an SVM BIO tagger:
http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/TimeAnnotator.java

CRFTimeAnnotator uses a CRF BIO tagger:
http://svn.apache.org/repos/asf/ctakes/trunk/ctakes-temporal/src/main/java/org/apache/ctakes/temporal/ae/CRFTimeAnnotator.java

This is a very high-level overview, and there are many details to get right. Foremost is the need for gold-standard labeled training data. The classes above are trained on THYME data.

Tim

On 05/22/2015 02:11 AM, Soumya Shree wrote:

Hi folks,

I am new to cTAKES and NLP concepts. I need to train my application so that it can make predictions for a sequence of words. Is there an API that helps with this, or a concept we can leverage for it?

I also need to create a training bin file, so I need to know the structure of the training text so that I can validate it and convert it to a bin file successfully.

Thanks & Regards,
Soumya Shree
