Re: lucene 4.2 count on merged taxonomies

Shai Erera Thu, 11 Apr 2013 03:24:55 -0700

Hi Nicola,

I didn't read the code examples, but I'll address your last question,
about the Aggregator. Indeed, with Lucene 4.2,
FacetRequest.createAggregator is not called by the default
FacetsAccumulator. This method should be removed from FacetRequest entirely,
but unfortunately we did not finish all the refactoring work before 4.2.


What you should do is extend the new FacetsAggregator and override
FacetsAccumulator.getAggregator(). Can you try that and let us know if that
resolves your problem?
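To make the suggestion concrete without relying on exact 4.2 signatures, here is a Lucene-free sketch of the counting such an aggregator would do. It assumes, as we understand the 4.2 design, that the aggregator receives all matching documents of one segment in a single call rather than one document at a time. RemappingSegmentCounter, docOrdinals and ordinalMap are hypothetical stand-ins for the segment's category ordinals and the per-segment map produced by the taxonomy merge; in real code this loop would sit inside your FacetsAggregator, returned from the overridden FacetsAccumulator.getAggregator().

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical, Lucene-free stand-in for a per-segment counting aggregator.
class RemappingSegmentCounter {
    private final int[] counts; // indexed by merged-taxonomy ordinal

    RemappingSegmentCounter(int mergedTaxonomySize) {
        counts = new int[mergedTaxonomySize];
    }

    /**
     * Counts the categories of all matching documents of one segment.
     *
     * @param matchingDocs one bit per segment document that matched the query
     * @param docOrdinals  each document's ordinals in its own (pre-merge) taxonomy
     * @param ordinalMap   source ordinal -> merged ordinal for that taxonomy
     */
    void aggregate(BitSet matchingDocs, int[][] docOrdinals, int[] ordinalMap) {
        for (int doc = matchingDocs.nextSetBit(0); doc >= 0;
                doc = matchingDocs.nextSetBit(doc + 1)) {
            for (int ord : docOrdinals[doc]) {
                counts[ordinalMap[ord]]++; // count the merged ordinal, not the local one
            }
        }
    }

    int[] counts() {
        return counts;
    }

    public static void main(String[] args) {
        RemappingSegmentCounter counter = new RemappingSegmentCounter(4);

        // Segment from index A, whose taxonomy maps identically: docs 0 and 1 match.
        BitSet segA = new BitSet();
        segA.set(0);
        segA.set(1);
        counter.aggregate(segA, new int[][] {{1, 2}, {1, 2}}, new int[] {0, 1, 2});

        // Segment from index B, whose local ordinals 2 and 3 map to merged 3 and 2.
        BitSet segB = new BitSet();
        segB.set(1);
        counter.aggregate(segB, new int[][] {{1, 2}, {1, 3}}, new int[] {0, 1, 3, 2});

        System.out.println(Arrays.toString(counter.counts())); // [0, 3, 3, 0]
    }
}
```

Keeping the counts array indexed by merged ordinal is what lets segments coming from different taxonomies add into the same slots.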

Shai


On Thu, Apr 11, 2013 at 1:05 PM, Nicola Buso <[email protected]> wrote:

> Hi all,
>
> In Lucene 4.1, following some advice from the mailing list, I merge the
> taxonomies (in memory, since the taxonomy indexes are small) and collect
> facet values from the merged taxonomy instead of from the individual
> ones. The scenario is:
> - you have a MultiReader over several indexes
> - you query the MultiReader
> - you want to collect facets for the MultiReader
>
> What I'm doing:
> -1- taxonomy merging
> long createStart = System.currentTimeMillis();
> catMergeDir = new RAMDirectory();
> readerOrdinalsMap = new HashMap<AtomicReader, DirectoryTaxonomyWriter.OrdinalMap>();
> DirectoryTaxonomyWriter taxoMergeWriter = new DirectoryTaxonomyWriter(catMergeDir);
> Directory taxoDirectory = null;
> IndexReader contentReader = null;
> OrdinalMap[] ordinalMapsArray = new DirectoryTaxonomyWriter.MemoryOrdinalMap[taxoIdxRepoArray.length];
>
> for (int idx = 0; idx < taxoIdxRepoArray.length; idx++) {
>     taxoDirectory = LuceneDirectoryFactory.getDirectory(taxoIdxRepoArray[idx]);
>     contentReader = idxReaderArray[idx];
>     ordinalMapsArray[idx] = new DirectoryTaxonomyWriter.MemoryOrdinalMap();
>     taxoMergeWriter.addTaxonomy(taxoDirectory, ordinalMapsArray[idx]);
>
>     // map every segment (leaf) of this index to its taxonomy's ordinal map
>     for (AtomicReaderContext readerCtx : contentReader.leaves()) {
>         readerOrdinalsMap.put(readerCtx.reader(), ordinalMapsArray[idx]);
>     }
> }
> taxoMergeWriter.close();
> log.info(String.format("Taxonomy merge time elapsed: %s(ms)",
>     System.currentTimeMillis() - createStart));
>
> ------
> From the code above I keep:
> - catMergeDir: the directory containing the merged taxonomy
> - readerOrdinalsMap: a map from every AtomicReader (segment) in the
>   MultiReader to the OrdinalMap of its index's taxonomy
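A toy, Lucene-free model of what the merge step produces, under the assumption that a taxonomy can be treated as a list of category paths whose index is the ordinal, with ordinal 0 as the root. TaxonomyMergeSketch is hypothetical; each int[] it returns plays the role of MemoryOrdinalMap.getMap() above, mapping a source ordinal to the merged ordinal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model: a taxonomy is a list of category paths, index = ordinal.
class TaxonomyMergeSketch {
    final List<String> merged = new ArrayList<>();                  // merged ordinal -> path
    private final Map<String, Integer> ordinalOf = new HashMap<>(); // path -> merged ordinal

    /** Adds one source taxonomy and returns its map: source ordinal -> merged ordinal. */
    int[] addTaxonomy(List<String> sourcePaths) {
        int[] map = new int[sourcePaths.size()];
        for (int srcOrd = 0; srcOrd < sourcePaths.size(); srcOrd++) {
            String path = sourcePaths.get(srcOrd);
            Integer mergedOrd = ordinalOf.get(path);
            if (mergedOrd == null) { // category not seen yet: append it
                mergedOrd = merged.size();
                merged.add(path);
                ordinalOf.put(path, mergedOrd);
            }
            map[srcOrd] = mergedOrd;
        }
        return map;
    }

    public static void main(String[] args) {
        TaxonomyMergeSketch merger = new TaxonomyMergeSketch();
        // "" stands for the root, which is ordinal 0 in both taxonomies.
        int[] mapA = merger.addTaxonomy(List.of("", "Author", "Author/Bob"));
        int[] mapB = merger.addTaxonomy(List.of("", "Author", "Author/Lisa", "Author/Bob"));
        System.out.println(Arrays.toString(mapA)); // [0, 1, 2]
        System.out.println(Arrays.toString(mapB)); // [0, 1, 3, 2]
    }
}
```

A category already present in the merged taxonomy keeps its first ordinal, so the same path coming from different indexes ends up under a single ordinal.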
>
> -2- aggregator based on the readerOrdinalsMap built in -1-
> class OrdinalMappingCountingAggregator extends CountingAggregator {
>     private int[] ordinalMap;
>
>     public OrdinalMappingCountingAggregator(int[] counterArray) {
>         super(counterArray);
>     }
>
>     @Override
>     public void aggregate(int docID, float score, IntsRef ordinals)
>         throws IOException {
>
>         int upto = ordinals.offset + ordinals.length;
>         for (int i = ordinals.offset; i < upto; i++) {
>             // original ordinal, read for the AtomicReader given to setNextReader
>             int ordinal = ordinals.ints[i];
>             // mapped ordinal, following the taxonomy merge
>             int mappedOrdinal = ordinalMap[ordinal];
>             // count the mapped ordinal instead, so all AtomicReaders count
>             // the same category under the same ordinal
>             counterArray[mappedOrdinal]++;
>         }
>     }
>
>     @Override
>     public boolean setNextReader(AtomicReaderContext ctx)
>         throws IOException {
>
>         OrdinalMap map = readerOrdinalsMap.get(ctx.reader());
>         if (map == null) {
>             return false; // segment not covered by the merged taxonomy: skip it
>         }
>         ordinalMap = map.getMap();
>         return true;
>     }
> }
>
> -3- override CountFacetRequest.createAggregator(..) to return the aggregator from -2-
> return new CountFacetRequest(cp, maxCount) {
>
>     @Override
>     public Aggregator createAggregator(boolean useComplements,
>         FacetArrays arrays, TaxonomyReader taxonomy) {
>
>         int[] a = arrays.getIntArray();
>
>         return new OrdinalMappingCountingAggregator(a);
>     }
> };
> --------
> In 4.2 this no longer works: I am not getting facet values from the
> merged taxonomy.
>
> The first problem I found is that the new API
> FacetsCollector.create(FacetSearchParams fsp, IndexReader indexReader,
> TaxonomyReader taxoReader) returns collectors and accumulators that never
> call FacetRequest.createAggregator(). Instead you have to use
> FacetsCollector.create(FacetsAccumulator accumulator), passing it a
> StandardFacetsAccumulator, the only accumulator that calls
> FacetRequest.createAggregator(..).
>
> The second problem is that even with StandardFacetsAccumulator it doesn't
> work: the facet counts are wrong. Any idea why this is happening?
>
> I'm also going to look at how to apply this idea to mimic the behaviour
> of FastCountingFacetsAggregator, which I think is the right way to do it.
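A sketch of what mimicking FastCountingFacetsAggregator could look like, assuming the default category-list encoding is d-gap plus VInt8: each document's ordinals are sorted, stored as gaps from the previous ordinal, and each gap is written in 7-bit groups, most significant first, with the high bit marking a continuation byte. Under that assumption the ordinals can be decoded and counted in one pass, and the merge remapping costs a single array lookup per ordinal. DGapVInt8RemapCounter and its encode helper are hypothetical; the encoder only exists to make the example self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical sketch; assumes d-gap + VInt8 encoding of each document's ordinals.
class DGapVInt8RemapCounter {

    /** Encodes ascending ordinals as d-gaps, each gap in 7-bit groups, most significant first. */
    static byte[] encode(int[] sortedOrdinals) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int ord : sortedOrdinals) {
            int gap = ord - prev;
            prev = ord;
            int shift = 28;
            while (shift > 0 && (gap >>> shift) == 0) {
                shift -= 7; // skip leading all-zero groups
            }
            for (; shift > 0; shift -= 7) {
                out.write(0x80 | ((gap >>> shift) & 0x7F)); // continuation byte: high bit set
            }
            out.write(gap & 0x7F); // final byte of the gap: high bit clear
        }
        return out.toByteArray();
    }

    /** Decodes one document's ordinals and counts each one at its merged ordinal. */
    static void countRemapped(byte[] buf, int[] ordinalMap, int[] counts) {
        int prev = 0;
        int value = 0;
        for (byte b : buf) {
            if (b >= 0) {
                prev += (value << 7) | b; // last byte: finish the gap, undo the d-gap
                counts[ordinalMap[prev]]++;
                value = 0;
            } else {
                value = (value << 7) | (b & 0x7F); // continuation byte
            }
        }
    }

    public static void main(String[] args) {
        int[] ordinalMap = {0, 1, 3, 2}; // source ordinal -> merged ordinal
        int[] counts = new int[4];
        countRemapped(encode(new int[] {1, 3}), ordinalMap, counts);
        countRemapped(encode(new int[] {1, 2}), ordinalMap, counts);
        System.out.println(Arrays.toString(counts)); // [0, 2, 1, 1]
    }
}
```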
>
> I hope I've given enough information. Any help understanding how facets
> changed in 4.2 would be appreciated.
>
> Nicola.
