If you run the same search twice in a row, the second search takes < 3 ms. Try finding your base result set first and then augmenting it with a second search within that result set.
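A rough sketch of that two-step idea, using the facet-then-filter requests Mikhail suggests further down this thread (rows=0 with facet.field=BRAND, then rows=X with fq=BRAND:Y). It only builds the request parameters. The helper names and the slot-splitting rule are assumptions made for illustration, not something anyone in the thread specified.

```python
from urllib.parse import urlencode


def facet_params(query, field="BRAND"):
    """First pass: fetch no documents (rows=0), only per-brand match counts."""
    return urlencode({"q": query, "rows": 0, "facet": "true", "facet.field": field})


def brand_params(query, brand, rows, field="BRAND"):
    """Second pass: the same query, restricted to one brand with a filter query."""
    return urlencode({"q": query, "rows": rows, "fq": f'{field}:"{brand}"'})


def split_rows(brand_counts, page_size):
    """Hypothetical rule: hand out page slots one at a time across brands."""
    quota = {brand: 0 for brand in brand_counts}
    remaining = page_size
    while remaining > 0:
        progressed = False
        for brand, count in brand_counts.items():
            if remaining > 0 and quota[brand] < count:
                quota[brand] += 1
                remaining -= 1
                progressed = True
        if not progressed:  # every brand is exhausted before the page is full
            break
    return quota
```

The facet counts from the first request feed split_rows, and each brand with a non-zero quota then gets its own brand_params request. Since the second-pass requests repeat the same base query, they should benefit from whatever makes a repeated search fast.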
You can sort from a function call. Sorting is multi-level, so you can
make one of the levels random. Does this app have to support paging
through the search list? If so, do you plan to run a second search for
the next 5 results? Complex result shuffling can make this hard.
Also, I don't know exactly how random sorting works, or whether it
generates the same random order twice. If it does not, paging would be
impossible.

On Mon, Aug 20, 2012 at 1:52 PM, Karthick Duraisamy Soundararaj
<[email protected]> wrote:
> Hi Mikhail,
> You are correct. "[+] show 6 result.." will work, but it wouldn't suit
> my requirements. This is a question of user experience, right?
>
> Imagine the product manager comes to you and says: I don't want to
> see "[+] show 6 result..", and I want the results to be diverse but
> shown like any other search results.
>
> I think grouping does this with a two-pass collection. In the first
> pass it figures out all the groups, and in the second pass it collects
> the results into these groups.
>
> Thanks,
> Karthick
>
> On Mon, Aug 20, 2012 at 3:24 PM, Mikhail Khludnev
> <[email protected]> wrote:
>>
>> Hello,
>>
>> I don't believe your task can be solved by playing with the
>> scoring/collector or by shuffling. To me it's absolutely a Grouping
>> use case (although I don't really know this feature well).
>>
>> > Grouping cannot solve the problem because I don't want to limit the
>> > number of results shown based on the grouping field.
>>
>> I don't really get it. Why can't you set the limit to 11 and just show
>> labels like "[+] show 6 result..", or, if you have 11, "[+] show more
>> than 10 .."?
>>
>> If you have trouble constructing the search result page, I can
>> suggest submitting a search request with rows=0&facet.field=BRAND;
>> your algorithm can then choose the number of items needed for every
>> brand and submit rows=X&fq=BRAND:Y. This gives you arbitrary sizes
>> for the "groups".
>>
>> Will this work for you?
>>
>> On Mon, Aug 20, 2012 at 8:28 PM, Karthick Duraisamy Soundararaj
>> <[email protected]> wrote:
>>>
>>> Tanguy,
>>> Your idea is perfect for cases where there are many documents and
>>> 80-90% of them have the same value for a particular field. For
>>> example, your idea is ideal if we have, say, 10 documents in total
>>> like this:
>>>
>>> doc1 : <merchantName> Kellog's </merchantName>
>>> doc2 : <merchantName> Kellog's </merchantName>
>>> doc3 : <merchantName> Kellog's </merchantName>
>>> doc4 : <merchantName> Kellog's </merchantName>
>>> doc5 : <merchantName> Kellog's </merchantName>
>>> doc6 : <merchantName> Kellog's </merchantName>
>>> doc7 : <merchantName> Kellog's </merchantName>
>>> doc8 : <merchantName> Nestle </merchantName>
>>> doc9 : <merchantName> Kellog's </merchantName>
>>> doc10 : <merchantName> Kellog's </merchantName>
>>>
>>> But I have
>>> doc1 : <merchantName> Maggi </merchantName>
>>> doc2 : <merchantName> Maggi </merchantName>
>>> doc3 : <merchantName> M&M's </merchantName>
>>> doc4 : <merchantName> M&M's </merchantName>
>>> doc5 : <merchantName> Hershey's </merchantName>
>>> doc6 : <merchantName> Hershey's </merchantName>
>>> doc7 : <merchantName> Nestle </merchantName>
>>> doc8 : <merchantName> Nestle </merchantName>
>>> doc9 : <merchantName> Kellog's </merchantName>
>>> doc10 : <merchantName> Kellog's </merchantName>
>>>
>>> Thanks,
>>> Karthick
>>>
>>> On Mon, Aug 20, 2012 at 12:01 PM, Tanguy Moal <[email protected]>
>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I don't know if this helps, but if I understood your issue, you
>>>> have a lot of documents with the same or very close scores.
>>>> Moreover, I think you get your matches in Merchant order (more or
>>>> less) because they must have been indexed in that very same order,
>>>> so Solr returns documents with the same score in insertion order
>>>> (although there is no contract specifying this).
>>>>
>>>> You could work around that issue by:
>>>> 1/ Turning off tf/idf, because you're searching documents with
>>>> little text, where only the match counts and frequencies obviously
>>>> aren't helping.
>>>> 2/ Adding a random number to each document at index time and
>>>> boosting on that random value at query time. This will shuffle
>>>> your results, and it's probably the simplest thing to do.
>>>>
>>>> Hope this helps,
>>>>
>>>> Tanguy
>>>>
>>>> 2012/8/20 Karthick Duraisamy Soundararaj <[email protected]>
>>>>>
>>>>> Hello Mikhail,
>>>>> Thank you for the reply. In terms of user experience, I want to
>>>>> spread products from the same brand farther apart, at least in
>>>>> the first 50-100 results we display. I am thinking about two
>>>>> different approaches as a solution.
>>>>>
>>>>> 1. For the first few results, display one top-scoring product per
>>>>> manufacturer (for a given field, display the top-scoring results
>>>>> of the unique field values for the first N matches). This N
>>>>> could be either a percentage relative to total matches or a
>>>>> configurable absolute value.
>>>>> 2. Enforce a penalty on the score of results that have duplicate
>>>>> field values. The penalty can be enforced in such a way that
>>>>> results with higher scores are not affected, as opposed to the
>>>>> ones with lower scores.
>>>>>
>>>>> Both solutions can be implemented while sorting the documents
>>>>> with TopFieldCollector / TopScoreDocCollector.
>>>>>
>>>>> Does this answer your question? Please let me know if you have
>>>>> any more questions.
>>>>>
>>>>> Thanks,
>>>>> Karthick
>>>>>
>>>>> On Mon, Aug 20, 2012 at 3:26 AM, Mikhail Khludnev
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I've got the problem description below. Can you explain the
>>>>>> expected user experience and/or solution approach before diving
>>>>>> into the algorithm design?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> On Sat, Aug 18, 2012 at 2:50 AM, Karthick Duraisamy Soundararaj
>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>> My problem is that when there are a lot of documents
>>>>>>> representing products, products from the same manufacturer
>>>>>>> seem to appear in close proximity in the results, and therefore
>>>>>>> the results don't provide brand diversity. When you search for
>>>>>>> sofas, you get sofas from manufacturer A dominating the first
>>>>>>> page while sofas from manufacturer B dominate the second page,
>>>>>>> etc. The issue here is that a manufacturer tends to describe
>>>>>>> the different sofas he produces the same way, and therefore
>>>>>>> there is very little difference between the documents
>>>>>>> representing two sofas.
>>>>>>
>>>>>> --
>>>>>> Sincerely yours
>>>>>> Mikhail Khludnev
>>>>>> Tech Lead
>>>>>> Grid Dynamics
>>
>> --
>> Sincerely yours
>> Mikhail Khludnev
>> Tech Lead
>> Grid Dynamics

--
Lance Norskog
[email protected]
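For illustration, here is a minimal client-side sketch of the first approach Karthick lists in the thread (one top-scoring product per manufacturer before any manufacturer repeats). It swaps in plain round-robin interleaving over an already score-sorted window of results, rather than a custom TopFieldCollector / TopScoreDocCollector. The function names are hypothetical, and the merchantName field is taken from the sample documents in the thread. Score gaps between merchants are ignored, which may or may not be acceptable.

```python
from collections import deque


def diversify(docs, key="merchantName"):
    """Round-robin across merchants, keeping score order within each merchant.

    Assumes `docs` is already sorted by score, best first.
    """
    buckets = {}  # merchant -> queue of its docs, in first-seen order
    for doc in docs:
        buckets.setdefault(doc[key], deque()).append(doc)

    out = []
    while buckets:
        for merchant in list(buckets):  # copy keys: buckets shrinks in the loop
            out.append(buckets[merchant].popleft())
            if not buckets[merchant]:
                del buckets[merchant]
    return out


def page(docs, start, rows):
    """Slice one page out of the diversified window."""
    return docs[start:start + rows]
```

Because the reordering is deterministic for a given input window, page 2 can be produced by re-running the same search and slicing further into the same diversified list, which avoids the paging problem that a non-repeatable random order would cause.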
