You might also be interested in Apache UIMA, which is a popular text mining platform.
Mark

On Dec 7, 2013 1:49 AM, "Valentin Waeselynck" <valentinwaesely...@yahoo.fr> wrote:

> Thanks to all for your interest!
>
> The code examples are on their way; I'm trying to make them as diverse as
> possible. I'll let you know as soon as they're ready.
>
> Thanks for telling me about Tika, Oliver, it's very interesting! An
> algorithm that tries to extract the meaning of a variety of documents could
> typically be a combination of Tika and the Laboratory Toolkit.
>
> However, the Laboratory Toolkit is less specialized (in fact, it's not
> specialized at all) and less concrete. It is similar in its genericity and
> in the nature of its benefits to, for example, the Executor API in
> java.util.concurrent. Just as the Executor API lets you think about and
> design concurrent algorithms in terms of tasks and executors, the
> Laboratory Toolkit lets you think about and design some other (I haven't
> found a satisfying description yet) algorithms in terms of analyses and
> laboratories.
>
> Best,
>
> Valentin WAESELYNCK
> Third-year student at École Polytechnique
> valentin.waesely...@polytechnique.edu
> +33 6 80 84 99 54
>
> On Friday, December 6, 2013 at 21:30, Oliver Heger
> <oliver.he...@oliver-heger.de> wrote:
>
> On 05.12.2013 at 13:44, Valentin Waeselynck wrote:
> > Hello, and pleased to meet you,
> >
> > Thank you for your answer.
> >
> > I just asked for confirmation, and I do hold the full intellectual
> > property rights to this software.
> >
> > About the use cases: no problem, I'll include some code samples. As a
> > foreword, let's say it provides a convenient API for creating all sorts
> > of custom "information extraction" algorithms.
> If the library is about information extraction, you may also want to
> have a look at the Apache Tika project [1].
>
> Oliver
>
> [1] http://tika.apache.org/
>
> > As for the group of persons willing to maintain this: well, for the
> > moment, there is me.
> > As this is quite a small toolkit, I think that's
> > sufficient, at least for a start.
> >
> > I'll start working towards the other requirements (Maven + test
> > coverage) right away and let you know as soon as it's ready.
> >
> > Should I keep replying to the whole ML about this, or only to you?
> >
> > Best regards,
> >
> > Valentin WAESELYNCK
> > Third-year student at École Polytechnique
> > valentin.waesely...@polytechnique.edu
> > +33 6 80 84 99 54
> >
> > On Thursday, December 5, 2013 at 8:53, Benedikt Ritter
> > <brit...@apache.org> wrote:
> >
> > Bonjour Valentin,
> >
> > welcome to the ML. Good to hear that you've decided to join the open
> > source movement.
> >
> > First of all, it would really help if you could elaborate on some use
> > cases for your library. You're talking about building algorithms. What
> > kind of algorithms can be built with the Laboratory Toolkit? Can you give
> > some code examples (just create some gists on GitHub that show the use of
> > the Laboratory Toolkit)?
> >
> > There is an important requirement for any code to be incorporated into
> > the Apache code base:
> > - the intellectual property (IP) of the code has to be owned completely
> > by the contributor. You said that you've built the Laboratory Toolkit
> > for a research project. Are you sure that you own the code? Or is it the
> > result of your work and thus owned by your employer?
> >
> > At Commons we have some additional requirements:
> > - There should be a group of people who are willing to maintain the code
> > - Commons components should in general not depend on any other libraries
> > - Commons uses Maven as the main build tool, so there should be a Maven
> > build available
> > - The code should have good test coverage
> >
> > You have to figure the IP issue out on your own first.
> > After that, if the community decides to accept this contribution, we can
> > work on the Commons requirements.
> >
> > Best regards and thank you,
> > Benedikt
> >
> > 2013/12/4 Valentin Waeselynck <valentinwaesely...@yahoo.fr>
> >
> >> Hello to all,
> >>
> >> As part of a small research project (which combined techniques of
> >> text mining, machine learning and natural language generation, not that
> >> it's really relevant) I have come to design a small Java SE library,
> >> which I'm for the moment calling the Laboratory Toolkit, for developing
> >> our algorithms in a comfortable and flexible manner.
> >>
> >> I have found it to be quite generic and reusable, not tied to any
> >> application domain, while still being rather accessible, and small
> >> enough to comprehend easily. Therefore, I would like to propose it as a
> >> new Apache Commons component. I would be very grateful if one of you
> >> could tell me what steps I should follow for that purpose.
> >>
> >> I have uploaded it to GitHub:
> >> https://github.com/vvvvalvalval/Laboratory-Toolkit.git. There you may
> >> find the sources, the Javadoc, and a small guide I have started to write
> >> for it (also attached to this mail).
> >>
> >> Of course, I am very open to feedback and criticism on your part. The
> >> last thing I want is to publish an immature or useless component; nor
> >> do I take a positive answer from you for granted.
> >>
> >> If I have failed to follow the proper procedure to propose a new
> >> candidate component, it is not on purpose, and I apologize in advance.
> >>
> >> Whatever your reply, and since I have the chance, I would also like to
> >> congratulate you on all your work. The Apache Commons components have
> >> really been lifesavers for me on many occasions.
> >>
> >> With best wishes,
> >>
> >> Valentin WAESELYNCK
> >> Third-year student at École Polytechnique
> >> valentin.waesely...@polytechnique.edu
> >> +33 6 80 84 99 54
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org
> >> For additional commands, e-mail: dev-h...@commons.apache.org