Farid:
On Thursday 03 April 2008, Farid Bourennani wrote:
> > 3) Are any additional tools (such as a GUI) required to be developed to
> > prove my implementation?
>
> By GUI, I meant plotting tools to visualize every iteration of the
> implemented machine learning algorithm and validate the final results
> graphically (e.g. Gaussian vs. random data).

Isabel Drost:
There should be some automated means of validating your results that does not need human intervention. Where possible, your algorithms should come with unit tests to prove that they work.

Farid:
> 5) It was also mentioned on the project that "Students are also encouraged
> to work on projects related to their own machine learning research". Does
> that mean that all the algorithms used have to be posted right away?

Isabel Drost:
Well, I think you should make available all code and libraries that you use in a way that is compatible with both the Apache Software License, under which the code you develop during your project will be licensed, and the license of the libraries you want to use. That said, you need to make available everything that is necessary for your code to work correctly. It does not make a lot of sense to me to include some Java module that one can only use if one owns a Matlab license. Or worse, one that only works with a library that is only available to your research lab. But I guess that was clear to you already ;)

Farid (NEW QUESTION):
I understand that the complete code must be published; no doubt about it! The Lucene-Mahout project is very close to my research thesis, so I am aiming for a possible publication on some hybrid learning algorithms. Please correct me if I am wrong: my understanding is that the implemented algorithm is entirely the property of Apache, and I would be very happy to contribute to the community. That being said, are the publications related to the hybrid machine learning algorithms still the property of the university? I am asking only about the publications here, not about the code. The reason for my question is that I am new to the open-source world as well as to the publication world: it's very exciting! I only wanted to clarify everything before, very hopefully, starting.

Farid:
> 6) I assume that we will be using Lucene?
> Even though the learning algorithms can be used for different
> applications (images, speech recognition ...), I am more interested in
> text algorithms, especially since Lucene offers stemming, stop word
> filtering, text normalization and even synonym expansion functionality.

Isabel Drost:
I think it should be fine to use Lucene for the preprocessing steps and for feature extraction. It would be nice if the algorithm were designed and implemented generally enough to allow others to use it for processing images, speech or whatever they like - if that is possible and makes sense for your algorithm.

Farid (NEW QUESTION):
That's not an issue; the algorithms usually all use the VSM. I have already implemented some learning algorithms in the past in such a way that the machine learning algorithm could be applied to any type of data (image, speech...). However, I only wanted to know whether the use of Lucene is required, suggested, or neither.

Regards,
Farid
