Hi Pei,

Thank you very much for your answer.

I am looking for good corpora, and my group is considering building a new one to train the ML-based models. I will also look into the hard-coded rules in order to change them. AFAIK, UMLS has a subset of its terms translated into Spanish, and these correspond to the ones in the Spanish version of SNOMED CT.

I will share my questions and my progress here in order to get cTAKES working in Spanish and, hopefully, other languages.

Cheers,

--
Roberto Costumero Moreno
Laboratorio de Minería de Datos y Simulación (MIDAS)
Centro de Tecnología Biomédica
Universidad Politecnica de Madrid
[email protected]
Tel: +34 91 336 4664

On 15/11/2013, at 14:49, Chen, Pei <[email protected]> wrote:

> Hi Roberto,
> Welcome!
>
> In theory, in order to have cTAKES work in a different language, we would
> just need to:
> -Retrain the existing ML-based models for the language and code should just
> work as is for
> -Update any hard-coded rules
> -Use the Spanish dictionary for concepts (I believe UMLS already has a
> Spanish translation for some of their thesauruses).
> I think it would be awesome to have cTAKES work with multiple languages
> including Spanish!
> Actually, a lot of folks have been asking about cTAKES models in different
> languages.
> The challenging thing with the supervised machine learning methods is that
> we'll have to rely on local domain experts to create the gold standard for
> training.
> There is a group that may be contributing retrained models for cTAKES to work
> in French.
> Others can feel free to chime in...
>
> --Pei
>
>> -----Original Message-----
>> From: Roberto Costumero Moreno [mailto:[email protected]]
>> Sent: Thursday, November 14, 2013 5:43 AM
>> To: [email protected]
>> Subject: cTAKES Translation
>>
>> Hello everyone,
>>
>> My name is Roberto Costumero. I work at the Technical University of Madrid
>> in Spain, where I am doing my Ph.D. studies. I am new to this list, so I am
>> introducing myself and posting some questions I have.
>>
>> We are currently involved in a project together with several hospitals, and
>> we are working closely with them to understand their needs in order to build
>> an application that lets them use the knowledge in their clinical notes and
>> imaging, among other things.
>>
>> We have been looking at different projects to see which one fits our needs
>> and, of course, which one we will share our research with. Among the
>> different projects we have seen in the field of clinical text analysis, we
>> think that cTAKES is the best one out there, and it is very well structured
>> and organized. The main problem we are facing is that every clinical
>> text-based NLP project is developed for English, and we will be working with
>> Spanish texts.
>>
>> We have already done some work testing different algorithms for detecting
>> negation and context dependency, translating them to Spanish, but we would
>> like to use a well-tested, complete framework, so we thought of cTAKES.
>> I have a couple of questions for you:
>>
>> - Does anyone know if someone is already working on translating cTAKES
>> modules to work with other languages (Spanish in particular)?
>> - Do you think it would be very difficult to do because of any
>> architectural design I am not currently aware of?
>> - Do you think it would be a good line of development (for the cTAKES
>> project) to work together on extending cTAKES to Spanish in this case?
>>
>> Thank you very much in advance for your help.
>>
>> Sincerely,
>>
>> --
>> Roberto Costumero Moreno
>> Laboratorio de Minería de Datos y Simulación (MIDAS)
>> Centro de Tecnología Biomédica
>> Universidad Politecnica de Madrid
>> [email protected]
>> Tel: +34 91 336 4664
