Any update on this from the community? I would like to move forward with the contribution. The first part is involving just the More Like This parameters, but the rest will involve the interesting terms retrieval and quite a big piece of testing. No hard position on this, I was thinking that extracting the parameters was a good idea to improve the readability ( I noticed a similar approach in Solr for the edismax for example : DisMaxParams, ExtendedDismaxConfiguration ... ) but if the community prefers to keep them there, it's ok and I can move to the next steps.
I attach here my considerations in response to Robert ones : "First of all thanks for your feedback, much appreciated. My initial goal was to make More Like This original 1000 lines class: - More Readable - Easier to Maintain and Extend - More Testable So I started identifying the different responsibilities in the More Like This class, to separate them, in the way that if I just need to change the scoring algorithm for the interesting terms I just touch the TermScorer ect ect : You see the overall idea in these slides : https://www.slideshare.net/AlessandroBenedetti/advanced-document-similarity-with-apache-lucene I tried to modify as less as possible the logic and tests at this stage. So let me wrap my considerations under different topics : *Parameters Abstraction* I see your point for just additional indirection/abstraction ( it is exactly just that with readability in mind). My scope for this was : "We have 600 lines of code of default and parameters for the MLT. How to make them isolated, more readable and extendable ?" And I initially thought to just put them in a separate class to remove that responsibility from the original MLT . So the focus was exclusively better readability and easy maintenance at this stage. Can you elaborate why you think this is undesired ? I don't have any strong feeling regarding this bit, so I am open to suggestions with the forementioned objective in mind. *MLT Immutable* I didn't consider it , but I am completely open to do that. In such case it could be worth to add a MoreLikeThis factory that manages the parameters and create the immutable MLT object ? *MoreLikeThisQuery* It was not in the scope of this refactor but I am absolutely happy to tackle that immediately as a next step, it could give it a try to see if there is space for improvement there. *Solr Tests* I completely agree, indeed as part of my additional tests which I have ready for the sequent refactors I introducedmuch more tests Lucene side than Solr side. We can elaborate this further at the right moment " Regards -------------------------- Alessandro Benedetti Search Consultant, R&D Software Engineer, Director www.sease.io On Tue, May 22, 2018 at 1:32 PM, Alessandro Benedetti <[email protected]> wrote: > Thank you very much! > i will update the patch name and open a Pull Request with the correct Jira > issue key! > Thank you again, this is definitely a start! > > Cheers > > -------------------------- > Alessandro Benedetti > Search Consultant, R&D Software Engineer, Director > www.sease.io > > On Tue, May 22, 2018 at 1:29 PM, Robert Muir <[email protected]> wrote: > >> >> >> On Tue, May 22, 2018 at 7:03 AM, Alessandro Benedetti < >> [email protected]> wrote: >> >>> Hi Robert, >>> thanks for the feedback. >>> I read your comment last year and you I agreed completely. >>> So I started step by step, with the refactor first. >>> >>> The first contribution is isolating a part of the refactor, so no >>> functional change in the algorithms nor a complete refactor in place. >>> I basically tried to decompose the refactor is unitary pull requests as >>> small as possible. >>> It just focused on the MLT parameters first to reduce the size of the >>> original MoreLikeThis class ( and relegating the parameter modelling >>> responsibility to a separate class) >>> >>> https://issues.apache.org/jira/browse/SOLR-12299 >>> >>> The reason I used SOLR is because the refactor affects some Solr >>> components using the MLT. >>> But I agree with you, it can (should) be moved to LUCENE ( I tried via >>> JIRA but I don't think I have the right permissions). >>> >>> Should I just create a new JIRA issue completely ( closing the SOLR one) >>> or some JIRA admin can directly move the Jira to the LUCENE project ? >>> >>> >> I moved it: https://issues.apache.org/jira/browse/LUCENE-8326 >> >> >
