Hey Chal,

First of all, thank you very much for the contribution! I have some observations:
*Model downloading*

Looking at the way you provide the models to the user, I see a shell script that downloads a very specific set of English models. It would be great to be able to configure which models to use in the connector configuration UI. I see two possibilities:

1) provide a select list for each required model, then download and install the chosen model automatically;
2) let the user upload the model he/she wants to use (more flexible, but the user will need to download a model on his/her own).

In my opinion, it is really important to keep the transformation connector flexible, so that it can work with different languages and models.

*Text enrichment*

Looking at the code, I see a really strong assumption here:

String textContent = new String(bytes);

This means you assume the only possible input is plain text. As we know, what we actually have there is the binary content, not necessarily a plain string. I think we need to make the Tika Transformer a requirement for this connector.

Furthermore, I would suggest letting the user select the list of input fields to use as the source of the extraction, e.g. I could configure my extraction to run on title, text and description. Of course, this requires a Transformer Connector to run before the OpenNLP one to provide those fields.

These are quick considerations after a first look at the code; happy to discuss and help further :)

Cheers

On 18 November 2015 at 13:47, Karl Wright <[email protected]> wrote:

> Thanks, Chalitha, for contributing this!
>
> I hope to have a look at the code also, but it won't happen until next week
> I'm afraid.
>
> Karl
>
>
> On Wed, Nov 18, 2015 at 7:44 AM, Rafa Haro <[email protected]> wrote:
>
> > Hi Chalitha!
> >
> > Awesome!. I will take a look to this as soon as possible.
> >
> > Cheers,
> > Rafa
> >
> > On Wed, Nov 18, 2015 at 1:22 PM, chalitha udara Perera
> > <[email protected]> wrote:
> >
> > > Hi All,
> > > I have worked on a OpenNLP based transformation connector for some
> > > requirement. Given a document it extracts named entities such as people,
> > > locations and organisations and add those as metadata to repository
> > > document.
> > > If you think this will be useful for the community, I would like to
> > > contribute it to manifoldcf.
> > > Connector code is available here [1].
> > > [1] https://github.com/ChalithaUdara/OpenNLP-Manifold-Connector
> > > Thanks,
> > > Chalitha
> > > --
> > > J.M Chalitha Udara Perera
> > > *Department of Computer Science and Engineering,*
> > > *University of Moratuwa,*
> > > *Sri Lanka*

--
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England
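A minimal Java sketch of the configuration suggested in this thread, assuming a hypothetical `OpenNlpConnectorSpec` class (none of these names exist in the connector or in ManifoldCF). It keeps a model path per entity type, whether picked from a select list or uploaded by the user, and an explicit list of source fields such as title, text and description, assumed to be populated by an upstream transformer like Tika. Where raw bytes really must be decoded, it takes an explicit `Charset`, since `new String(bytes)` decodes with the platform's default charset.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical connector specification illustrating the suggestions above.
 * None of these names exist in the connector or in ManifoldCF.
 */
class OpenNlpConnectorSpec {

    // Entity type (e.g. "person") -> model file path, chosen per language
    // from a select list or uploaded by the user.
    private final Map<String, String> modelPaths = new LinkedHashMap<>();

    // Document fields to extract from, in the order they are concatenated.
    private final List<String> sourceFields = new ArrayList<>();

    void setModelPath(String entityType, String path) {
        modelPaths.put(entityType, path);
    }

    /** Fails loudly instead of silently falling back to a bundled English model. */
    String getModelPath(String entityType) {
        String path = modelPaths.get(entityType);
        if (path == null) {
            throw new IllegalStateException("No model configured for entity type: " + entityType);
        }
        return path;
    }

    /** Parses a comma-separated list such as "title,text,description"; blank entries are ignored. */
    void setSourceFields(String commaSeparated) {
        sourceFields.clear();
        for (String field : commaSeparated.split(",")) {
            String trimmed = field.trim();
            if (!trimmed.isEmpty()) {
                sourceFields.add(trimmed);
            }
        }
    }

    List<String> getSourceFields() {
        return Collections.unmodifiableList(sourceFields);
    }

    /**
     * Joins the configured fields with newlines, skipping missing or empty ones.
     * Assumes an upstream transformer (e.g. Tika) has already produced these
     * text fields, so no binary content is interpreted here.
     */
    String buildInputText(Map<String, String> fields) {
        StringBuilder text = new StringBuilder();
        for (String name : sourceFields) {
            String value = fields.get(name);
            if (value == null || value.isEmpty()) {
                continue;
            }
            if (text.length() > 0) {
                text.append('\n');
            }
            text.append(value);
        }
        return text.toString();
    }

    /** Decodes bytes with an explicit charset instead of the platform default used by new String(bytes). */
    static String decodeText(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }
}
```

Each configured path could then be opened as an `InputStream` and passed to OpenNLP's `TokenNameFinderModel` constructor. With that approach, supporting another language would mean configuring different model files rather than changing the download script.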
