Hi,

Is there a cTAKES project that may serve as an example of how the cTAKES community develops, or of what a project should look like? I have learned that different people set up UIMA projects in quite different ways, and I do not want to be inspired by some sort of outdated approach in the cTAKES repo.
Are there restrictions or preferences regarding the preprocessing components that should be used and the kind of "output" of the project?

Components: on which components may the new components rely: tokenizer, ... parser, ... dict lookup?
"Output": should the project provide a pipeline or a single AE?

More comments below.

On 03.11.2015 at 16:54, Azad Dehghan wrote:
>> Who else plans to provide patches for it? Just to avoid duplicate work
>> and to coordinate the efforts ...
>>
> I would like to help with translating JAPE to RUTA.

You can already go ahead with the UIMA Ruta Workbench if you want, or wait until I set up the project with Ruta integration. If any questions arise, just ask :-)

>> Is there a development dataset which was utilized for the initial
>> development, and if yes, is it possible to contribute it too?
>>
> The data set is unfortunately not publicly available; i2b2
> <https://www.i2b2.org/NLP/DataSets/Main.php> typically releases the data
> sets 12 months after a given challenge; this is done on an individual basis
> and involves a Data Use Agreement.
>
> However, I will be able to conduct and coordinate the validation.

OK, I'll check whether we already have access to the dataset here.

>> My first step would be:
>> - set up a maven project
>> - set up a development pipeline in a test (with cTAKES components
>>   replacing the previous ANNIE preprocessing)
>>
>
>> But one item that we need to review is the 3rd party libs jars that
>> were included to ensure compatibility. I'll be sure to take a look at
>> that over the next few weeks.
>>
>> --Pei
>>
>
> @Pei - once the ANNIE components are replaced, there should not be a need to
> worry about the 3rd party libs.
>
> Also, just a thought: we may want to create an independent component for
> the Two Pass recognition (TwoPass.java), as this method has proven useful
> for general NER on longitudinal data and is surely useful independent of the
> deid component.
>
> Cheers,
> Azad