UNOFFICIAL
Hi Shai,

Thanks very much for the helpful response.

No, I'm not using the latest Lucene. I'm using 4.3.0 (which happens to be the 
latest in our Maven repository).

I've attached a test case which shows the problem. I've tried to make this as 
simple as possible, but it's still 240 lines (sorry).

I did consider NumericDocValuesField, but I need to get sums for more than one 
field, and my understanding was that you can only store one NumericDocValues 
field per document. There aren't many stored fields in each document at the 
moment, but I might look at using FieldSelector to ensure I only read the field 
I'm looking for.

Thanks for looking at this,
Steve


From: Shai Erera [mailto:[email protected]]
Sent: Friday, 18 October 2013 8:28 PM
To: [email protected]
Subject: Re: Creating a SumFacetRequest class [SEC=UNOFFICIAL]

Hi Stephen,

The code seems correct in general (I have some comments below). The ordinals 
that you get are those that are associated with that document (docID). I assume 
this is not the newest Lucene though, right?

Can you boil this down to a simple testcase that adds a couple of documents with 
the values you would like to aggregate, and prints the actual value each facet 
gets?

About the code, I see that you read the value from a stored field. I recommend 
that you store the value in a NumericDocValuesField instead, as it loads much 
faster and more efficiently. Your code currently loads all the stored fields of 
the document on every aggregate() call just to read one value, which is 
expensive.
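As a rough illustration of why doc values beat stored fields here, the toy model below (plain Java, not Lucene code; every class, field and method name is hypothetical) contrasts a row store, where reading one field means loading the whole document as searcher.doc(docID) does, with per-field numeric columns indexed by docID. It also shows that each numeric field name simply gets its own column, so summing more than one field does not by itself rule out doc values: Lucene 4.x allows several NumericDocValuesField instances per document as long as each uses a different field name (one value per field per document).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (not Lucene code): row-oriented stored fields versus
// column-oriented numeric doc values. All names are hypothetical.
class ColumnVsRowSketch {

    // Row store: each document is a full map of field name -> value, so
    // reading one field still materialises the whole document.
    static final List<Map<String, Object>> STORED_FIELDS = List.of(
        Map.<String, Object>of("title", "a", "amount", 10, "quantity", 1),
        Map.<String, Object>of("title", "b", "amount", 20, "quantity", 2),
        Map.<String, Object>of("title", "c", "amount", 5, "quantity", 3));

    // Column store: one dense array per numeric field, indexed by docID.
    // Two numeric fields simply means two columns.
    static final Map<String, long[]> NUMERIC_COLUMNS = new HashMap<>();
    static {
        NUMERIC_COLUMNS.put("amount", new long[] {10, 20, 5});
        NUMERIC_COLUMNS.put("quantity", new long[] {1, 2, 3});
    }

    // Mirrors the original aggregator: load the whole document per match.
    static long sumFromRows(String field, int[] docs) {
        long sum = 0;
        for (int doc : docs) {
            Map<String, Object> row = STORED_FIELDS.get(doc);
            sum += ((Number) row.get(field)).longValue();
        }
        return sum;
    }

    // Doc-values style: look the column up once, then index it by docID.
    static long sumFromColumn(String field, int[] docs) {
        long[] column = NUMERIC_COLUMNS.get(field);
        long sum = 0;
        for (int doc : docs) {
            sum += column[doc];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] matching = {0, 2};
        System.out.println(sumFromRows("amount", matching));     // 15
        System.out.println(sumFromColumn("amount", matching));   // 15
        System.out.println(sumFromColumn("quantity", matching)); // 4
    }
}
```

A related note on stored fields: FieldSelector belongs to the 3.x API. In 4.x the closest equivalents are a StoredFieldVisitor or IndexSearcher.doc(int, Set<String>), which loads only the named stored fields.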

Also, if you move up to the latest Lucene (4.5.0), the API is more 
segment-oriented, so you're given all matching documents up front, and then you 
can ask for their NumericDocValues once while you iterate over them.
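The segment-oriented flow described for 4.5 can be modeled in plain Java as follows. This is a sketch under stated assumptions, not the Lucene facet API: Segment, sumByOrdinal and bits are hypothetical stand-ins for a segment's matching documents, its NumericDocValues, and the category ordinals of each document. What it shows is that the value column is fetched once per segment and then indexed by the segment-local docID of each match.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

// Toy model (not the Lucene facet API) of segment-at-a-time summing: all
// matching docs of a segment arrive together, so the segment's value column
// is fetched once and reused for every match. Names are hypothetical.
class SegmentSumSketch {

    // Hypothetical per-segment data; every docID here is segment-local.
    static final class Segment {
        final BitSet matching;  // which local docs matched the query
        final long[] values;    // stand-in for the segment's NumericDocValues
        final int[][] ordinals; // category ordinals of each local doc
        Segment(BitSet matching, long[] values, int[][] ordinals) {
            this.matching = matching;
            this.values = values;
            this.ordinals = ordinals;
        }
    }

    static long[] sumByOrdinal(List<Segment> segments, int numOrdinals) {
        long[] sums = new long[numOrdinals];
        for (Segment seg : segments) {
            long[] values = seg.values; // fetched once per segment
            BitSet bits = seg.matching;
            for (int doc = bits.nextSetBit(0); doc >= 0; doc = bits.nextSetBit(doc + 1)) {
                long value = values[doc];
                for (int ord : seg.ordinals[doc]) {
                    sums[ord] += value; // credit each category of this doc
                }
            }
        }
        return sums;
    }

    static BitSet bits(int... docs) {
        BitSet b = new BitSet();
        for (int d : docs) {
            b.set(d);
        }
        return b;
    }

    public static void main(String[] args) {
        Segment s0 = new Segment(bits(0, 1), new long[] {10, 20}, new int[][] {{0, 1}, {1}});
        Segment s1 = new Segment(bits(1), new long[] {7, 5}, new int[][] {{0}, {2}});
        System.out.println(Arrays.toString(sumByOrdinal(List.of(s0, s1), 3))); // [10, 30, 5]
    }
}
```

Doc 0 of the second segment does not match, so its value of 7 never reaches ordinal 0.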

These comments are related to efficiency though. As for your original question, 
a simple testcase demonstrating the problem will help me spot the issue.

Shai

On Fri, Oct 18, 2013 at 8:57 AM, Stephen GRAY 
<[email protected]> wrote:

Hi everyone,

I need to get a sum of the values of an int field over all the documents in a 
facet. Because there is only a CountFacetRequest in Lucene, I am trying to write 
a SumFacetRequest with an associated Aggregator that does this. However, the 
results I get when I use my SumFacetRequest are not correct.

Here is the aggregate method from the Aggregator I have written (based on 
CountingAggregator):

@Override
public void aggregate(int docID, float score, IntsRef ordinals) throws IOException {
  // Load the document and read the int value to be summed
  Document doc = searcher.doc(docID);
  int value = doc.getField(fieldName).numericValue().intValue();

  // Add the value to the running sum of every category ordinal of this document
  for (int i = 0; i < ordinals.length; i++) {
    sumArray[ordinals.ints[i]] += value;
  }
}

Would someone be able to tell me if this is correct? I have been assuming that 
ordinals.ints[i] returns an id for a facet that contains the document, but maybe 
this is not the case.
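One possible cause worth ruling out, offered as an assumption rather than a diagnosis: the Aggregator interface in the 4.x facet module also declares setNextReader(AtomicReaderContext), which suggests that the docID passed to aggregate() is relative to the current segment, while IndexSearcher.doc(int) expects a top-level docID. If that holds, searcher.doc(docID) reads the wrong document as soon as the index has more than one segment, and the sums come out wrong even though the ordinals are right. The toy model below (plain Java, not Lucene code; Segment, valueOfTopLevelDoc and sum are hypothetical names) reproduces the effect with two segments of two documents each.

```java
import java.util.Arrays;
import java.util.List;

// Toy model (not Lucene code) of the segment-relative docID pitfall. Each
// segment numbers its docs from 0; the top-level docID is docBase + local
// docID. Segment, valueOfTopLevelDoc and sum are hypothetical names.
class DocBaseSketch {

    static final class Segment {
        final int docBase;  // first top-level docID in this segment
        final int[] values; // the int field of each local doc
        final int[][] ords; // category ordinals of each local doc
        Segment(int docBase, int[] values, int[][] ords) {
            this.docBase = docBase;
            this.values = values;
            this.ords = ords;
        }
    }

    // Two segments of two docs each; every doc is treated as a match.
    static final List<Segment> SEGMENTS = List.of(
        new Segment(0, new int[] {10, 20}, new int[][] {{0}, {1}}),
        new Segment(2, new int[] {1, 2}, new int[][] {{0}, {1}}));

    // Stand-in for searcher.doc(topLevelDocID): resolves a top-level docID.
    static int valueOfTopLevelDoc(int topLevelDoc) {
        for (Segment s : SEGMENTS) {
            if (topLevelDoc >= s.docBase && topLevelDoc < s.docBase + s.values.length) {
                return s.values[topLevelDoc - s.docBase];
            }
        }
        throw new IllegalArgumentException("no such doc: " + topLevelDoc);
    }

    // Visits docs segment by segment with local docIDs, as aggregate() would.
    static long[] sum(int numOrdinals, boolean addDocBase) {
        long[] sums = new long[numOrdinals];
        for (Segment seg : SEGMENTS) {
            for (int localDoc = 0; localDoc < seg.values.length; localDoc++) {
                int lookup = addDocBase ? seg.docBase + localDoc : localDoc;
                int value = valueOfTopLevelDoc(lookup);
                for (int ord : seg.ords[localDoc]) {
                    sums[ord] += value;
                }
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sum(2, false))); // [20, 40]: local docID misused
        System.out.println(Arrays.toString(sum(2, true)));  // [11, 22]: docBase added
    }
}
```

If this turns out to be the cause, one likely fix in the real aggregator is to keep the AtomicReaderContext passed to setNextReader and either call searcher.doc(context.docBase + docID) or read the document via context.reader().document(docID), which takes the segment-local docID.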

Any help would be greatly appreciated.

Apologies if this is not the correct forum to post this.

Thanks,
Steve



--------------------------------------------------------------------
Important Notice: If you have received this email by mistake, please advise
the sender and delete the message and attachments immediately. This email,
including attachments, may contain confidential, sensitive, legally privileged
and/or copyright information. Any review, retransmission, dissemination
or other use of this information by persons or entities other than the
intended recipient is prohibited. DIAC respects your privacy and has
obligations under the Privacy Act 1988. The official departmental privacy
policy can be viewed on the department's website at 
www.immi.gov.au. See:
http://www.immi.gov.au/functional/privacy.htm


---------------------------------------------------------------------



Attachment: SumFacetsTestCase.java
