I will try it.
I see there is already a lucene-4.1.0 package (dated 2013/01/21)
available for download; do you know if this version will be released
soon?
Nicola.
On Tue, 2013-01-22 at 06:20 +0200, Shai Erera wrote:
> Hi Nicola,
>
> What I had in mind is something similar to this, which is possible starting
> with Lucene 4.1, due to changes done to facets (per-segment faceting):
>
> DirectoryTaxonomyWriter master = new DirectoryTaxonomyWriter(masterDir);
> Directory[] origTaxoDirs = new Directory[numTaxoDirs]; // open Directories and store in that array
> OrdinalMap[] ordinalMaps = new OrdinalMap[numTaxoDirs]; // initialize an OrdinalMap per taxonomy and store in that array
>
> // now do the merge
> for (int i = 0; i < origTaxoDirs.length; i++) {
>   master.addTaxonomy(origTaxoDirs[i], ordinalMaps[i]);
> }
>
> // now open your readers, and create the important map
> Map<AtomicReader,OrdinalMap> readerOrdinals = new HashMap<AtomicReader,OrdinalMap>();
> DirectoryReader[] readers = new DirectoryReader[origTaxoDirs.length];
> for (int i = 0; i < origTaxoDirs.length; i++) {
> DirectoryReader r = DirectoryReader.open(contentDirectories[i]);
> OrdinalMap ordMap = ordinalMaps[i];
> for (AtomicReaderContext ctx : r.leaves()) {
> readerOrdinals.put(ctx.reader(), ordMap);
> }
> }
>
> MultiReader mr = new MultiReader(readers);
>
> // create your FacetRequest (CountFacetRequest) with a custom Aggregator
> FacetRequest fr = new CountFacetRequest(cp, topK) {
> @Override
> public Aggregator createAggregator(...) {
> return new OrdinalMappingAggregator() {
> int[] ordMap;
>
> @Override
> public void setNextReader(AtomicReaderContext context) {
> ordMap = readerOrdinals.get(context.reader()).getMap();
> }
>
> @Override
> public void aggregate(int docID, float score, IntsRef ordinals) {
>   int upto = ordinals.offset + ordinals.length;
>   for (int i = ordinals.offset; i < upto; i++) {
>     int ordinal = ordinals.ints[i]; // original ordinal, as read for the AtomicReader given to setNextReader
>     int mappedOrdinal = ordMap[ordinal]; // mapped ordinal, following the taxonomy merge
>     counts[mappedOrdinal]++; // count the mapped ordinal instead, so all AtomicReaders count that ordinal
>   }
> }
> };
> }
> }
>
> While it may look like I wrote actual code to do it, I didn't :). So I
> guess it should work, but I haven't tried it.
> That way, you don't touch the content indexes at all, just the taxonomy
> ones.
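The remapping idea can be sketched in plain Java, with no Lucene classes at all (the class name, the maps, and the numbers below are all invented for illustration -- each sub-index counts into the merged taxonomy's ordinal space through its own map, so the same category always lands in one slot):

```java
import java.util.Arrays;

/** Sketch: count facet ordinals from several sub-indexes into one merged
 *  ordinal space, using a per-index local-to-merged ordinal map. */
public class OrdinalRemapSketch {

    /** counts[mappedOrdinal]++ for every local ordinal seen in a sub-index. */
    static void aggregate(int[] counts, int[] ordMap, int[] ordinalsSeen) {
        for (int ordinal : ordinalsSeen) {
            counts[ordMap[ordinal]]++; // count in the merged taxonomy's space
        }
    }

    public static void main(String[] args) {
        // Merged taxonomy has 4 ordinals; two sub-indexes map into it.
        int[] counts = new int[4];
        int[] ordMap1 = {0, 2, 3}; // sub-index 1: local ordinal -> merged ordinal
        int[] ordMap2 = {0, 1, 2}; // sub-index 2: same category, different local ordinal
        aggregate(counts, ordMap1, new int[] {1, 1, 2}); // local 1 maps to merged 2
        aggregate(counts, ordMap2, new int[] {2, 2});    // local 2 maps to merged 2
        System.out.println(Arrays.toString(counts));     // prints [0, 0, 4, 1]
    }
}
```

Note how merged ordinal 2 ends up with a count of 4, accumulated from two different local ordinals -- which is exactly what the OrdinalMap-backed aggregator achieves per AtomicReader.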
>
> Note however that you'll need to do this step every time the taxonomy index
> is updated, and you refresh the TaxoReader instance.
> Also, this will only work if all your indexes are opened in the same JVM
> (which I assume that's the case, since you use MultiReader).
>
> If you still don't want to do that, then what Denis wrote above is another
> way to do distributed faceted search, either inside the same JVM or across
> multiple JVMs.
> You obtain the FacetResult from each search and merge the results
> (unfortunately, there's still no tool in Lucene to do that for you).
> Just make sure to ask for a larger K, to ensure that the correct top-K is
> returned (see my previous notes).
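Such a "by hand" merge of per-shard results can be sketched in plain Java (Lucene-free; the shard results are reduced to path -> count maps here, and all names are invented for illustration):

```java
import java.util.*;

/** Sketch: merge per-shard facet counts and keep the overall top-K.
 *  Each shard contributes its own top-K' (with K' > K) as a path -> count map. */
public class FacetMergeSketch {

    static List<Map.Entry<String, Integer>> mergeTopK(List<Map<String, Integer>> shards, int k) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> shard : shards) {
            for (Map.Entry<String, Integer> e : shard.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum); // sum counts per category
            }
        }
        List<Map.Entry<String, Integer>> sorted = new ArrayList<>(merged.entrySet());
        sorted.sort((a, b) -> b.getValue() - a.getValue()); // highest count first
        return sorted.subList(0, Math.min(k, sorted.size()));
    }

    public static void main(String[] args) {
        Map<String, Integer> shard1 = Map.of("Movie/Drama", 10, "Movie/Action", 7);
        Map<String, Integer> shard2 = Map.of("Movie/Drama", 3, "Movie/Comedy", 9);
        System.out.println(mergeTopK(List.of(shard1, shard2), 2));
        // prints [Movie/Drama=13, Movie/Comedy=9]
    }
}
```

The larger-K caveat applies because a category just below the cut-off on every shard can still belong to the global top K; if a shard truncates to exactly K entries, that category's counts are silently lost before the merge.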
>
> Shai
>
>
>
>
> On Tue, Jan 22, 2013 at 4:32 AM, Denis Bazhenov <[email protected]> wrote:
>
> > We have a similar distributed search system and ended up with the
> > following scheme. Search replicas (the machines where the index resides)
> > build FacetResults from their own index chunk (top N categories with
> > document counts). The results are then merged "by hand", summing the
> > relevant categories from the different replicas.
> >
> > On Jan 22, 2013, at 3:08 AM, Nicola Buso <[email protected]> wrote:
> >
> > > Hi Shai,
> > >
> > > I was thinking of that too, but I'm indexing all the indexes in a custom
> > > distributed environment, so at the moment I can't have a single
> > > categories index for all the content indexes at indexing time.
> > > One solution would be to merge all the categories indexes into a single
> > > index and use your approach, but the merge code I see in the examples
> > > also merges the content index, and I can't do that.
> > >
> > > I could share the taxonomy if merging is possible (I see the resulting
> > > categories indexes are currently not that big), but I would prefer a
> > > solution where I can collect the facets over multiple categories
> > > indexes; that way I'm sure the solution will scale better.
> > >
> > >
> > > Nicola.
> > >
> > >
> > > On Mon, 2013-01-21 at 17:54 +0200, Shai Erera wrote:
> > >> Hi Nicola,
> > >>
> > >>
> > >> I think that what you're describing corresponds to distributed faceted
> > >> search. I.e., you have N content indexes, alongside N taxonomy
> > >> indexes.
> > >>
> > >> The information that's indexed in each of those sub-indexes does not
> > >> correlate with the other ones.
> > >> For example, say that you index the category "Movie/Drama", it may
> > >> receive ordinal 12 in index1 and 23 in index2.
> > >>
> > >> If you'll try to count ordinals using MultiReader, you'll just mess up
> > >> everything.
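That clash can be shown with a tiny Lucene-free sketch (all ordinals and counts below are invented):

```java
/** Sketch: why summing facet counts by raw ordinal across independently
 *  built taxonomies is wrong -- the same number means different categories. */
public class OrdinalClashSketch {

    /** Naive per-ordinal sum, as a MultiReader-based count would effectively do. */
    static int naiveSum(int[] countsA, int[] countsB, int rawOrdinal) {
        return countsA[rawOrdinal] + countsB[rawOrdinal];
    }

    public static void main(String[] args) {
        int[] index1 = new int[32];
        int[] index2 = new int[32];
        index1[12] = 5; // index1: ordinal 12 = Movie/Drama (5 docs)
        index2[12] = 7; // index2: ordinal 12 = Movie/Action (7 docs)
        index2[23] = 4; // index2: ordinal 23 = Movie/Drama (4 docs)

        // Summing by raw ordinal mixes Drama with Action:
        System.out.println("naive Drama count = " + naiveSum(index1, index2, 12)); // prints 12
        // The real Drama count needs a per-index ordinal translation:
        System.out.println("true Drama count  = " + (index1[12] + index2[23]));    // prints 9
    }
}
```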
> > >>
> > >>
> > >> If you can share a single taxonomy index for all N content indexes,
> > >> then you'll be in a super-simple position:
> > >>
> > >> 1) Open one TaxonomyReader
> > >>
> > >> 2) Execute search with MultiReader and FacetsCollector
> > >>
> > >>
> > >>
> > >> It doesn't get simpler than that ! :)
> > >>
> > >>
> > >> Before I go into great length describing what you should do if you
> > >> cannot share the taxonomy, let me know if that's not an option for
> > >> you.
> > >>
> > >> Shai
> > >>
> > >>
> > >>
> > >> On Mon, Jan 21, 2013 at 5:39 PM, Nicola Buso <[email protected]> wrote:
> > >> Thanks for the reply Uwe,
> > >>
> > >> we can currently search with MultiReader over all the indexes we
> > >> have. Now I want to add faceted search, so I created a categories
> > >> index for every index I currently have.
> > >> To accumulate the faceted results I now have a MultiReader pointing
> > >> at all the indexes, and I can create a TaxonomyReader for every
> > >> categories index I have; the ways I see to obtain FacetResults are:
> > >> 1 - FacetsCollector
> > >> 2 - a FacetsAccumulator implementation
> > >>
> > >> Suppose I use the second option. I should:
> > >> - search as usual using the MultiReader
> > >> - then try to collect all the FacetResults by iterating over my
> > >> TaxonomyReaders; at every iteration:
> > >>   - create a FacetsAccumulator using the MultiReader and a
> > >> TaxonomyReader
> > >>   - get a list of FacetResult from the accumulator
> > >> - when finished, somehow merge all the List<FacetResult> I have.
> > >>
> > >> I think this solution is not correct, because the docids from the
> > >> search point at the MultiReader, while each TaxonomyReader points at
> > >> the categories index of a single reader.
> > >> I also don't like merging all the Lists of FacetResult I retrieve
> > >> from the accumulators.
> > >>
> > >> Probably I'm missing something; can somebody clarify how I should
> > >> collect the facets in this case?
> > >>
> > >>
> > >> Nicola.
> > >>
> > >>
> > >>
> > >> On Mon, 2013-01-21 at 16:22 +0100, Uwe Schindler wrote:
> > >>> Just use MultiReader, it extends IndexReader, so you can pass it
> > >>> anywhere where IndexReader can be passed.
> > >>>
> > >>> -----
> > >>> Uwe Schindler
> > >>> H.-H.-Meier-Allee 63, D-28213 Bremen
> > >>> http://www.thetaphi.de
> > >>> eMail: [email protected]
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: Nicola Buso [mailto:[email protected]]
> > >>>> Sent: Monday, January 21, 2013 3:59 PM
> > >>>> To: [email protected]
> > >>>> Subject: FacetedSearch and MultiReader
> > >>>>
> > >>>> Hi all,
> > >>>>
> > >>>> I'm trying to develop faceted search using the Lucene 4.0 faceting
> > >>>> framework. In our project we search over multiple indexes using
> > >>>> Lucene's MultiReader. How should we use the faceting framework to
> > >>>> obtain FacetResults starting from a MultiReader? All the examples
> > >>>> I see use a "single" IndexReader.
> > >>>>
> > >>>>
> > >>>>
> > >>>> Nicola.
> > >>>>
> > >>>>
> > >>>>
> > >>
> > >>>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >>
> > >
> > >
> > >
> > >
> >
> > ---
> > Denis Bazhenov <[email protected]>
> >
> >
> >
> >
> >
> >
> >
> >
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]