I found the problem. You pass an IndexSearcher to SumFacetRequest, but you
don't implement setNextReader correctly. What happens in aggregate() is
that you are given the docID relative to the current segment (which in this
test is always 0, since you commit after every addDocument()), and you then
look it up through the top-level searcher, so you always read the stored
value of the same absolute docID=0. If I make these changes to the code, the
prints look OK (except for the category to which you add the value 0, which
is not printed):
  @Override
  public boolean setNextReader(AtomicReaderContext context) throws IOException {
    this.context = context;
    return true;
  }
and
  @Override
  public void aggregate(int docID, float score, IntsRef ordinals)
      throws IOException {
    Document doc = context.reader().document(docID);
    int value = doc.getField(fieldName).numericValue().intValue();
    for (int i = 0; i < ordinals.length; i++) {
      System.out.println("docID=" + docID + ", ord=" +
          ordinals.ints[i] + ", value=" + value);
      sumArray[ordinals.ints[i]] += value;
    }
  }
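To make the relative-vs-absolute docID point concrete, here is a minimal
sketch in plain Java with no Lucene dependency. The Segment class and all
the method names are hypothetical stand-ins that I made up for illustration
(Segment loosely plays the role of an AtomicReaderContext with its docBase);
it only mimics the lookup behavior, it is not how Lucene is implemented.

```java
import java.util.ArrayList;
import java.util.List;

public class DocIdDemo {
    /** Hypothetical stand-in for one index segment (not a Lucene class). */
    static final class Segment {
        final int docBase;   // absolute docID of this segment's first document
        final int[] values;  // one value per document, indexed by relative docID

        Segment(int docBase, int[] values) {
            this.docBase = docBase;
            this.values = values;
        }
    }

    /** One single-document segment per value, as when committing after every add. */
    static List<Segment> oneSegmentPerDoc(int... values) {
        List<Segment> segments = new ArrayList<>();
        for (int i = 0; i < values.length; i++) {
            segments.add(new Segment(i, new int[] { values[i] }));
        }
        return segments;
    }

    /** Resolves an absolute docID across all segments, like a top-level lookup. */
    static int valueByAbsoluteId(List<Segment> segments, int absoluteDocId) {
        for (Segment s : segments) {
            int relative = absoluteDocId - s.docBase;
            if (relative >= 0 && relative < s.values.length) {
                return s.values[relative];
            }
        }
        throw new IllegalArgumentException("no such document: " + absoluteDocId);
    }

    /** Buggy: passes the segment-relative docID to the top-level lookup. */
    static int buggySum(List<Segment> segments) {
        int sum = 0;
        for (Segment s : segments) {
            for (int rel = 0; rel < s.values.length; rel++) {
                sum += valueByAbsoluteId(segments, rel);
            }
        }
        return sum;
    }

    /** Correct: reads the value from the segment the relative docID belongs to. */
    static int correctSum(List<Segment> segments) {
        int sum = 0;
        for (Segment s : segments) {
            for (int rel = 0; rel < s.values.length; rel++) {
                sum += s.values[rel];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Segment> segments = oneSegmentPerDoc(5, 7, 11);
        System.out.println("buggy sum   = " + buggySum(segments));   // 5 + 5 + 5 = 15
        System.out.println("correct sum = " + correctSum(segments)); // 5 + 7 + 11 = 23
    }
}
```

With one document per segment, every relative docID is 0, so the buggy
version keeps reading the first document's value. That is the same pattern
as the wrong sums in your test.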
You shouldn't call commit() after every addDocument. I don't know if that's
what you do in your program, or it's just for this test, but it's
unnecessary.
And you can store more than one NumericDocValuesField in a document (where
did you read that you can't?); you just have to give them different names.
Shai
On Mon, Oct 21, 2013 at 8:21 AM, Stephen GRAY <[email protected]> wrote:
> UNOFFICIAL
>
> Hi Shai,
>
> Thanks very much for the helpful response.
>
> No I’m not using the latest Lucene. I’m using 4.3.0 (which happens to be
> the latest in our Maven repository).
>
> I’ve attached a test case which shows the problem. I’ve tried to make this
> as simple as possible, but it’s still 240 lines (sorry).
>
> I did consider NumericDocValuesField, but I need to get sums for more than
> one field and my understanding was that you can only store one
> NumericDocValues field per document. There’s currently not many stored
> fields in each document, but I might look at using FieldSelector to ensure
> I only read the field I’m looking for.
>
> Thanks for looking at this,
>
> Steve
>
> *From:* Shai Erera [mailto:[email protected]]
> *Sent:* Friday, 18 October 2013 8:28 PM
> *To:* [email protected]
> *Subject:* Re: Creating a SumFacetRequest class [SEC=UNOFFICIAL]
>
> Hi Stephen,
>
> The code seems correct in general (I have some comments below). The
> ordinals that you get are those that are associated with that document
> (docID). I assume this is not the newest Lucene though, right?
>
> Can you boil this down to a simple testcase adding a couple of documents
> with the value which you would like to aggregate and print the actual
> values each facet gets?
>
> About the code, I see that you read the value from a stored field. I
> recommend that you store the value in a NumericDocValuesField as it's
> loaded much faster and more efficiently than what you do. Your code
> currently reads all stored fields for the document, which is both expensive
> and inefficient.
>
> Also, if you move up to the latest Lucene (4.5.0), the API is more
> segment-oriented, so you're given all matching documents up front, and then
> you can ask for their NumericDocValues once while you iterate over them.
>
> These comments are related to efficiency though. As for your original
> question, a simple testcase demonstrating the problem will help me spot the
> issue.
>
> Shai
>
> On Fri, Oct 18, 2013 at 8:57 AM, Stephen GRAY <[email protected]> wrote:
>
> UNOFFICIAL
>
> Hi everyone,
>
> I need to get a sum of the values in an int field in all the documents in
> a facet. Because there is only a CountFacetRequest in Lucene I am trying to
> write a SumFacetRequest with associated Aggregator which does this. However
> the results I am getting when I use my SumFacetRequest are not correct.
>
> Here is the aggregate method from the Aggregator I have written (based on
> CountingAggregator):
>
>   @Override
>   public void aggregate(int docID, float score, IntsRef ordinals)
>       throws IOException {
>     Document doc = searcher.doc(docID);
>     int value = doc.getField(fieldName).numericValue().intValue();
>
>     for (int i = 0; i < ordinals.length; i++) {
>       sumArray[ordinals.ints[i]] += value;
>     }
>   }
>
> Would someone be able to tell me if this is correct? I have been assuming
> that ordinals.ints[i] returns an id for a facet that contains the document
> but maybe this is not correct.
>
> Any help would be greatly appreciated.
>
> Apologies if this is not the correct forum to post this.
>
> Thanks,
>
> Steve