Re: SOLR ranking

Ere Maijala Fri, 19 Feb 2016 01:25:13 -0800

If he needs faceting or something (I didn't see that specified), doing two queries won't do, of course.


--Ere


19.2.2016, 2.22, Binoy Dalal wrote:

Hi Alessandro,
Don't get me wrong. Using mm, ps and pf can and absolutely will solve his
problem.

Like I said above, my solution is meant to be a quick and dirty fix. It's
really not that complex and shouldn't take more than an hour to set up at
the app level. Moreover, I suggested it because he said it was urgent for
him, and setting up a proper config with mm, pf and ps might take him much
longer.

Hope this clears things up :)

On Fri, 19 Feb 2016, 05:31 Alessandro Benedetti <abenede...@apache.org>
wrote:

Hey Binoy,
To be honest, I can't understand why such complexity is needed :/
Can you explain why playing with:

edismax
mm (the percentage of query terms you want to be present in the results)
pf (the fields you want boosted when the phrase matches)
ps (the phrase slop to allow)

would not solve the problem instead of the two-phase query?
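
For reference, the parameters listed above could be assembled roughly like this. It is only a sketch: the qf and pf values are borrowed from Nitin's message further down the thread, mm="100%" and ps=2 are illustrative guesses rather than tuned values, and build_edismax_params is a made-up helper name.

```python
from urllib.parse import urlencode


def build_edismax_params(user_query, qf, pf, mm="100%", ps=2):
    """Return request parameters for an edismax query.

    The mm and ps defaults are illustrative assumptions, not recommendations.
    """
    return {
        "q": user_query,
        "defType": "edismax",
        "qf": qf,       # fields searched for the individual terms
        "pf": pf,       # fields boosted when the whole query matches as a phrase
        "mm": mm,       # minimum share of query terms that must match
        "ps": str(ps),  # slop allowed for the pf phrase match
    }


params = build_edismax_params(
    "rheumatoid arthritis",
    qf="topic_title^100 subtopic_title^40",
    pf="topTitle^200 subTopTitle^80",
)

# urlencode percent-encodes the spaces, "^" and "%" so the values survive in a URL.
query_string = urlencode(params)
```

The resulting query_string can be appended to the /select URL of the core.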

Cheers

On 18 February 2016 at 18:09, Binoy Dalal <binoydala...@gmail.com> wrote:

Here's an alternative solution that may be of some help.
Here I'm assuming that you are not directly outputting the search results
to the user and have some sort of layer between the results from Solr and
the presentation to the user where some additional processing can be
performed.


1) You already know that you want phrase matches to show up higher than
single matches. In this case, why not do an explicit phrase match first,
with some slop or as is, based on how close you want the phrase terms to be
to each other.
2) Once you have the results from the first query, fire an OR query with
your terms and get those results.
3) Put the results from (2) after (1) and present them to the user. This
happens in the app layer.

This is essentially the same as running a query such as: "Rheumatoid
Arthritis"~slop OR (Rheumatoid AND Arthritis), but you don't need to worry
about the ordering because you're ordering the results yourself.
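
The three steps above could be sketched in the app layer roughly as follows. This is an illustration under assumptions, not a finished implementation: it assumes each result is a dict carrying a unique "id" key (the uniqueKey of the schema is not shown in this thread), it leaves out the actual HTTP calls to Solr, and the function names are made up.

```python
def phrase_query(terms, slop=0):
    """Step 1: an explicit phrase query, optionally with slop."""
    phrase = '"' + " ".join(terms) + '"'
    return f"{phrase}~{slop}" if slop > 0 else phrase


def or_query(terms):
    """Step 2: a plain OR query over the individual terms."""
    return " OR ".join(terms)


def merge_phrase_first(phrase_docs, or_docs, id_field="id"):
    """Step 3: phrase hits first, then the OR hits that were not already listed."""
    seen = set()
    merged = []
    for doc in list(phrase_docs) + list(or_docs):
        doc_id = doc[id_field]
        if doc_id not in seen:  # a phrase hit also matches the OR query
            seen.add(doc_id)
            merged.append(doc)
    return merged


# Hand-written stand-ins for the two result lists; not real Solr output.
phrase_hits = [{"id": "doc2"}, {"id": "doc5"}]
or_hits = [{"id": "doc1"}, {"id": "doc2"}, {"id": "doc7"}]
merged = merge_phrase_first(phrase_hits, or_hits)
```

Within each of the two groups the documents keep the order Solr returned them in; only the grouping is imposed by the app.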

Now, this will obviously take more time since you're querying twice and
then doing the additional processing in the app layer, but provided your
architecture is balanced enough and can cope with a little extra load, I
do not think that your performance will take that bad a hit. Moreover, since
you're in a hurry, you could implement this as a quick and dirty solution
to meet the project goals, provided it fits the acceptance parameters, and
then later play around with the scoring/sorting and figure out the best
possible setup to suit your needs.

On Thu, Feb 18, 2016 at 4:22 PM Emir Arnautovic <
emir.arnauto...@sematext.com> wrote:

Hi Nitin,
Can you send us what your parsed query looks like (from the debug output)?

Thanks,
Emir
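
If the request is made with wt=json and debugQuery=true (an assumption; the request in this thread uses wt=xml), the parsed query can be pulled out of the response along these lines. The helper name is made up, and the sample body is a hand-written placeholder, not real Solr output.

```python
import json


def parsed_query(response_text):
    """Return debug.parsedquery from a JSON Solr response, or None if absent."""
    body = json.loads(response_text)
    return body.get("debug", {}).get("parsedquery")


# Placeholder response body, for illustration only.
sample = json.dumps({
    "responseHeader": {"status": 0},
    "response": {"numFound": 0, "docs": []},
    "debug": {"parsedquery": "<parsed query text goes here>"},
})

print(parsed_query(sample))
```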

On 17.02.2016 08:38, Nitin.K wrote:

Hi Binoy,

We are searching for both phrases and individual words,
but we want the documents that contain the phrase to come first in the
order, and then those matching the individual words.

termPositions = true is also not working in my case.

I have also removed the string type from the copy fields. Kindly look into
the changed configuration below:

Hi Emir,

I have changed the configuration as per your suggestion and added pf2 /
pf3. Yes, I saw the difference, but the ranking is still not being followed
correctly in the case of phrases.

Changed configuration:

<field name="topic_title" type="text_general" indexed="true" stored="true"/>
<field name="topTitle" type="text_phrase" indexed="true" stored="false"/>

<field name="subtopic_title" type="text_general" indexed="true" stored="true"/>
<field name="subTopTitle" type="text_phrase" indexed="true" stored="false"/>

<field name="index_term" type="text_ws" indexed="true" stored="true" multiValued="true"/>
<field name="indTerm" type="text_phrase" indexed="true" stored="false" multiValued="true"/>

<field name="drug" type="text_ws" indexed="true" stored="true" multiValued="true"/>
<field name="drugString" type="text_phrase" indexed="true" stored="false" multiValued="true"/>

<field name="tglData" type="text_phrase" indexed="true" stored="false"/>


Copy fields again for reference:

<copyField source="topic_title" dest="topTitle"/>
<copyField source="subtopic_title" dest="subTopTitle"/>
<copyField source="index_term" dest="indTerm"/>
<copyField source="drug" dest="drugString"/>
<copyField source="content" dest="tglData"/>

Added following field type:

<fieldType name="text_phrase" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
       <analyzer>
               <tokenizer class="solr.WhitespaceTokenizerFactory"/>
               <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
               <filter class="solr.LowerCaseFilterFactory"/>
       </analyzer>
</fieldType>

Removed the string type from the copy fields.

Changed Query :

http://localhost:8983/solr/tgl/select?q=rheumatoid%20arthritis&wt=xml&tie=1.0&rows=200&q.op=AND&indent=true&defType=edismax&stopwords=true&lowercaseOperators=true&debugQuery=true&

pf=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
pf2=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
pf3=topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6&
qf=topic_title^100 subtopic_title^40 index_term^20 drug^15 content^3
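
One thing that can be checked mechanically is whether every field named in pf, pf2, pf3 and qf is spelled as in the schema, since Solr field names are case-sensitive. The sketch below compares the boost lists from the query above with the field names shown in the configuration. "content" is included because it appears as a copyField source, although its field definition is not shown in this message. The helper names are made up for illustration, and whether the mismatch it reports has anything to do with the ranking problem is not something this thread establishes.

```python
def boosted_field_names(boost_spec):
    """Split 'field^boost field^boost ...' into the bare field names."""
    return [entry.split("^", 1)[0] for entry in boost_spec.split()]


def undeclared_fields(boost_spec, declared):
    """Return field names used in boost_spec that the schema does not declare."""
    return [name for name in boosted_field_names(boost_spec) if name not in declared]


# Field names copied from the configuration in this message.
DECLARED = {
    "topic_title", "topTitle",
    "subtopic_title", "subTopTitle",
    "index_term", "indTerm",
    "drug", "drugString",
    "tglData",
    "content",  # seen only as a copyField source; assumed to be declared
}

# Boost lists copied verbatim from the query in this message.
PF = "topTitle^200 subtopTitle^80 indTerm^40 drugString^30 tglData^6"
QF = "topic_title^100 subtopic_title^40 index_term^20 drug^15 content^3"

print(undeclared_fields(PF, DECLARED))  # pf, pf2 and pf3 use the same list
print(undeclared_fields(QF, DECLARED))
```

With the values as posted, the pf list names subtopTitle while the schema declares subTopTitle.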

After making these changes, I am able to get my search results correctly
for a single term, but in the case of a phrase search, I am still not able
to get the results in the correct order.

Hi Modassar,

I tried using mm=100, but the order is still the same.

Hi Alessandro,

I have not yet tried the slop parameter. By default it was taken as 1.0
when I looked at it in debug mode. Let me try this option too; I will
definitely get back to you.

All,

Please let me know if anyone has any other suggestions on this. I have to
implement it urgently and I think I am very close to it. Thanks to all of
you. I have reached this level just because of you guys.

Thanks and Regards,
Nitin



--
View this message in context:

http://lucene.472066.n3.nabble.com/SOLR-ranking-tp4257367p4257782.html

Sent from the Solr - User mailing list archive at Nabble.com.


--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

--

Regards,
Binoy Dalal




--
--------------------------

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


--
Ere Maijala
Kansalliskirjasto / The National Library of Finland
