Re: [More Like This] I would like to contribute

Alessandro Benedetti Thu, 31 May 2018 09:01:35 -0700

Any update on this from the community?
I would like to move forward with the contribution.
The first part is involving just the More Like This parameters, but the
rest will involve the interesting terms retrieval and quite a big piece of
testing.
No hard position on this, I was thinking that extracting the parameters was
a good idea to improve the readability ( I noticed a similar approach in
Solr for the edismax for example :
DisMaxParams, ExtendedDismaxConfiguration ... ) but if the community
prefers to keep them there, it's ok and I can move to the next steps.

I attach here my considerations in response to Robert ones :
"First of all thanks for your feedback, much appreciated.

My initial goal was to make More Like This original 1000 lines class:

   - More Readable
   - Easier to Maintain and Extend
   - More Testable

So I started identifying the different responsibilities in the More Like
This class, to separate them, in the way that if I just need to change the
scoring algorithm for the interesting terms I just touch the TermScorer ect
ect :
You see the overall idea in these slides :
https://www.slideshare.net/AlessandroBenedetti/advanced-document-similarity-with-apache-lucene

I tried to modify as less as possible the logic and tests at this stage.

So let me wrap my considerations under different topics :

*Parameters Abstraction*
I see your point for just additional indirection/abstraction ( it is
exactly just that with readability in mind).
My scope for this was :
"We have 600 lines of code of default and parameters for the MLT. How to
make them isolated, more readable and extendable ?"
And I initially thought to just put them in a separate class to remove that
responsibility from the original MLT .
So the focus was exclusively better readability and easy maintenance at
this stage.
Can you elaborate why you think this is undesired ?
I don't have any strong feeling regarding this bit, so I am open to
suggestions with the forementioned objective in mind.

*MLT Immutable*
I didn't consider it , but I am completely open to do that.
In such case it could be worth to add a MoreLikeThis factory that manages
the parameters and create the immutable MLT object ?

*MoreLikeThisQuery*
It was not in the scope of this refactor but I am absolutely happy to
tackle that immediately as a next step, it could give it a try to see if
there is space for improvement there.

*Solr Tests*

I completely agree, indeed as part of my additional tests which I have
ready for the sequent refactors I introducedmuch more tests Lucene side
than Solr side.
We can elaborate this further at the right moment "

Regards

--------------------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
www.sease.io

On Tue, May 22, 2018 at 1:32 PM, Alessandro Benedetti <[email protected]>
wrote:

> Thank you very much!
> i will update the patch name and open a Pull Request with the correct Jira
> issue key!
> Thank you again, this is definitely a start!
>
> Cheers
>
> --------------------------
> Alessandro Benedetti
> Search Consultant, R&D Software Engineer, Director
> www.sease.io
>
> On Tue, May 22, 2018 at 1:29 PM, Robert Muir <[email protected]> wrote:
>
>>
>>
>> On Tue, May 22, 2018 at 7:03 AM, Alessandro Benedetti <
>> [email protected]> wrote:
>>
>>> Hi Robert,
>>> thanks for the feedback.
>>> I read your comment last year and you I agreed completely.
>>> So I started step by step, with the refactor first.
>>>
>>> The first contribution is isolating a part of the refactor, so no
>>> functional change in the algorithms nor a complete refactor in place.
>>> I basically tried to decompose the refactor is unitary pull requests as
>>> small as possible.
>>> It just focused on the MLT parameters first to reduce the size of the
>>> original MoreLikeThis class ( and relegating the parameter modelling
>>> responsibility to a separate class)
>>>
>>> https://issues.apache.org/jira/browse/SOLR-12299
>>>
>>> The reason I used SOLR is because the refactor affects some Solr
>>> components using the MLT.
>>> But I agree with you, it can (should) be moved to LUCENE ( I tried via
>>> JIRA but I don't think I have the right permissions).
>>>
>>> Should I just create a new JIRA issue completely ( closing the SOLR one)
>>> or some JIRA admin can directly move the Jira to the LUCENE project ?
>>>
>>>
>> I moved it: https://issues.apache.org/jira/browse/LUCENE-8326
>>
>>
>

Re: [More Like This] I would like to contribute

Reply via email to