Hi Robert, thanks for the feedback. I read your comment last year and I agreed completely. So I started step by step, with the refactor first.
The first contribution isolates one part of the refactor, so there is no functional change to the algorithms and no complete refactor in place. I basically tried to decompose the refactor into unitary pull requests, as small as possible. This one focuses only on the MLT parameters, to reduce the size of the original MoreLikeThis class (relegating the parameter modelling responsibility to a separate class):

https://issues.apache.org/jira/browse/SOLR-12299

The reason I used SOLR is that the refactor affects some Solr components using the MLT. But I agree with you, it can (should) be moved to LUCENE (I tried via JIRA but I don't think I have the right permissions). Should I just create a completely new JIRA issue (closing the SOLR one), or can a JIRA admin move the issue directly to the LUCENE project?

Thank you again for your support,
Regards

--------------------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
www.sease.io

On Tue, May 22, 2018 at 10:18 AM, Robert Muir <[email protected]> wrote:

> For proposed api, behavior changes or refactoring to these classes, I
> really recommend using LUCENE issues for those instead of SOLR ones.
> Otherwise they can get missed.
>
> As far as feedback, personally I tried to give it on LUCENE-7498 a year
> ago but wasn't sure what happened as further comments dropped off. As I
> mentioned there, I definitely think changing the algorithm to MoreLikeThis
> is a big deal and really shouldn't be mixed in with refactorings or api
> changes: it makes for too much to worry about at once. Just changing the
> algorithm is a big deal: since this class supports blind relevance feedback
> it means we can do some rough measurements with relevance tests before
> doing that. As I have personally not seen the BM25 algorithm used for these
> purposes anywhere, that's why I was concerned/curious about performance.
>
> On Mon, May 21, 2018 at 7:23 AM, Alessandro Benedetti <
> [email protected]> wrote:
>
>> Hi gents,
>> I have spent some time in the last year or so working on the Lucene More
>> Like This ( and related Solr components ) .
>>
>> Initially I just wanted to improve it, adding BM25[1] but then I noted a
>> lot of areas of possible improvements.
>>
>> I started then with a refactor of the functionality with these objectives
>> in mind :
>>
>> 1) make the MLT more readable
>> 2) make the MLT more modular and easy to extend
>> 3) make the MLT more tested
>>
>> *This is just a start, I want to invest significant time with my company
>> to work on the functionality and contribute it back.*
>>
>> I split my effort in small Pull Requests to make it easy a review and
>> possible contribution.
>>
>> Unfortunately I didn't get much feedback so far.
>> The More Like This functionality seems mostly abandoned.
>> I tried also to contact one of the last committers that apparently got
>> involved in the developments ( Mark Harwood [email protected] ), but I
>> had no luck.
>>
>> This is the current Jira Issue that start with a first small refactor +
>> tests :
>>
>> https://issues.apache.org/jira/browse/SOLR-12299
>>
>> I would love to contribute it and much more, but I need some feedback and
>> review ( unfortunately I am not a committer yet).
>>
>> Let me know what can I do to speed up the process from my side.
>>
>> Regards
>>
>> [1] https://issues.apache.org/jira/browse/LUCENE-7498
>>
>> --------------------------
>> Alessandro Benedetti
>> Search Consultant, R&D Software Engineer, Director
>> www.sease.io
