Actually, a DocTermsIndex entry can take quite some memory. I believe
that when you have a lot of unique strings, more memory is used for the
DocTermsIndex than when you have a small number of unique field values
with many documents per value.
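
As a rough back-of-the-envelope sketch of that trade-off (this is not the
actual FieldCache accounting): assume a DocTermsIndex-style entry holds one
packed ordinal per document, one packed offset per unique term, and the raw
term bytes, while a long entry holds 8 bytes per document. The class name,
method names, layout, and the 20-byte average term length are all
assumptions made up for illustration.

```java
public class FieldCacheEstimate {

    // Bits needed to represent any value in [0, maxValue]; never less than 1.
    static int bitsRequired(long maxValue) {
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    // Assumed layout of a string entry: raw term bytes, plus one packed
    // ordinal per document, plus one packed offset per unique term.
    static long termsIndexBytes(long numDocs, long uniqueTerms, int avgTermBytes) {
        long termBytes = uniqueTerms * avgTermBytes;
        long ordBits = numDocs * bitsRequired(uniqueTerms);
        long offsetBits = uniqueTerms * bitsRequired(termBytes);
        return termBytes + (ordBits + offsetBits + 7) / 8;
    }

    // Assumed layout of a long entry: one 8-byte value per document.
    static long longEntryBytes(long numDocs) {
        return numDocs * 8L;
    }

    public static void main(String[] args) {
        long docs = 10_000_000L;
        int avgTermBytes = 20; // hypothetical average term length
        for (long terms : new long[] {1_000L, 5_000_000L}) {
            System.out.printf("uniqueTerms=%d stringEntry=%d bytes longEntry=%d bytes%n",
                    terms, termsIndexBytes(docs, terms, avgTermBytes), longEntryBytes(docs));
        }
    }
}
```

Under these assumptions the string entry grows with the number of unique
values, while the long entry depends only on the number of documents.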

I do think that an option that decides whether a double cache entry is
added to the FieldCache (FC) is desirable. The default should be false,
and if users want fast grouping for non-string fields they can set this
option to true. I think group.method is a bit vague and isn't
descriptive about what exactly it is doing. It should be an expert
option. Maybe something like
group.moreRamFasterGroupingNonStringFields=[true|false]
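
To make the proposed behaviour concrete, here is a minimal sketch of the
decision such an option would drive. It is not the code in Grouping.java:
the enum and method names are hypothetical, TERM_BASED stands for the
CommandField path and FUNCTION_BASED for the CommandFunc path discussed
earlier in this thread.

```java
public class GroupingChoice {

    // Hypothetical classification of the field being grouped on.
    enum FieldKind { STRING, LONG, OTHER_NUMERIC }

    // TERM_BASED ~ CommandField, FUNCTION_BASED ~ CommandFunc.
    enum Impl { TERM_BASED, FUNCTION_BASED }

    // String fields always use the term based implementation. Non-string
    // fields use it only when the expert option is enabled, accepting a
    // possible second FieldCache entry for the same field.
    static Impl choose(FieldKind kind, boolean moreRamFasterGrouping) {
        if (kind == FieldKind.STRING) {
            return Impl.TERM_BASED;
        }
        return moreRamFasterGrouping ? Impl.TERM_BASED : Impl.FUNCTION_BASED;
    }

    public static void main(String[] args) {
        for (FieldKind kind : FieldKind.values()) {
            for (boolean option : new boolean[] {false, true}) {
                System.out.println(kind + ", option=" + option + " -> " + choose(kind, option));
            }
        }
    }
}
```

With the option left at its default of false, the current behaviour is
unchanged for every field type.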

Having the BlockGroupingCollector in Solr would be great. However, the
collector depends on block indexing, and this is something that Solr
currently doesn't support, so that needs to be implemented first. I
think that to use the BlockGroupingCollector we would just need two
parameters: one that tells Solr to actually use the
BlockGroupingCollector and one that tells Solr how to query for the
parent documents. Maybe something like:
group.block=[true|false] and group.parent.query=[query]
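
A small sketch of how those two parameters could be read and validated
together, since asking for block grouping without a parent query would be
an invalid combination. Neither parameter exists in Solr today; the class
and method names, and the plain Map standing in for the request
parameters, are assumptions for illustration only.

```java
import java.util.Map;
import java.util.Optional;

public class BlockGroupingParams {

    // Proposed (not yet existing) parameter names from this thread.
    static final String GROUP_BLOCK = "group.block";
    static final String GROUP_PARENT_QUERY = "group.parent.query";

    // Returns the parent query when block grouping is requested, or an
    // empty Optional when it is not. Rejects group.block=true without a
    // usable group.parent.query.
    static Optional<String> parentQuery(Map<String, String> params) {
        boolean block = Boolean.parseBoolean(params.getOrDefault(GROUP_BLOCK, "false"));
        if (!block) {
            return Optional.empty();
        }
        String query = params.get(GROUP_PARENT_QUERY);
        if (query == null || query.trim().isEmpty()) {
            throw new IllegalArgumentException(
                    GROUP_BLOCK + "=true requires " + GROUP_PARENT_QUERY);
        }
        return Optional.of(query);
    }

    public static void main(String[] args) {
        // "type:parent" is a made-up query identifying parent documents.
        Map<String, String> params =
                Map.of(GROUP_BLOCK, "true", GROUP_PARENT_QUERY, "type:parent");
        System.out.println(parentQuery(params));
    }
}
```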

Martijn

On 30 November 2011 00:30, Young, Cody <[email protected]> wrote:
> Hi Martijn,
>
> Thanks for the response!
>
> Doesn't it take a lot more memory to hold a string field in the FieldCache 
> than a long field?
>
> In our grouping scenario, we have many unique values with a small number of 
> documents per group. I would think that even the double FieldCache memory hit 
> on a long would be less than using a string.
>
> Would this be a suitable place to have a grouping parameter to control the 
> behavior? group.method? I'm looking at using the BlockGroupingCollector as 
> well, perhaps "block" could be another choice?
> The downside being that there are invalid combinations. (You wouldn’t change 
> group.method to anything else if you were using a function to group)
>
> Thanks,
> Cody
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf 
> Of Martijn v Groningen
> Sent: Tuesday, November 29, 2011 2:09 PM
> To: [email protected]
> Subject: Re: Grouping on Long type uses function query?
>
> If I remember correctly this was done to avoid insane FieldCache usage.
>
> If the Term based grouping implementation is used then for that field an entry is 
> created in the FieldCache of type DocTermsIndex. It might then happen that 
> for other search features like sorting and faceting a second entry is created 
> in the FieldCache. Sorting for example will put in your case a new entry for 
> this field in the FieldCache of type long. When the Function based grouping 
> implementations are used this is not the case. Only one cache entry of type 
> long is put in the FieldCache and sorting or faceting will reuse these 
> entries.
>
> The downside of the Function based grouping implementations is that they are 
> slower than the Term based implementation.
> At the time this feature was integrated into Solr the decision was made to 
> not have double FieldCache usage per field and use the slower Function based 
> implementation for non string fields.
>
> The workaround that doesn't involve coding is to make a copy field of type 
> string, but then you add more fields / data to your index...
>
> On 29 November 2011 22:25, Young, Cody <[email protected]> wrote:
>> Hi All,
>>
>>
>>
>> I’m new to solr development. Since I’m new with the code base, I
>> thought I’d double check here before making a JIRA issue. We’re trying
>> to use grouping on a field with a type of long (on trunk):
>>
>>     <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
>> omitNorms="true" positionIncrementGap="0"/>
>>
>>
>>
>> The performance wasn’t what we were looking for so I’m taking a quick
>> look at the grouping code in solr and I noticed that a string field
>> uses the Term grouping classes (CommandField in
>> /trunk/solr/core/src/java/org/apache/solr/search/Grouping.java).
>> However, when using a long field the Function grouping classes get
>> used (CommandFunc in
>> /trunk/solr/core/src/java/org/apache/solr/search/Grouping.java). When
>> I change it over to using CommandField instead of CommandFunc for long
>> type I get a decrease in QTime (I only did light testing, and just simple 
>> queries but it seemed to drop by 50% or so).
>>
>>
>>
>> The functionality appears to still work and the grouping tests pass,
>> but as I’m not very familiar with the solr code I wasn’t sure if there
>> was a reason for Long to use CommandFunc instead of CommandField.
>>
>>
>>
>> I’m happy to take a stab at making a JIRA issue and a patch if this is
>> indeed an issue, but I’ll need some guidance on the best way to fix
>> this (perhaps instead of using instanceof StrFieldSource or instanceof
>> LongFieldSource there is a better way to check?).
>>
>>
>>
>> The change I made to test this was very simple, I just added:
>>
>>
>>
>> import org.apache.lucene.queries.function.valuesource.LongFieldSource;
>>
>>
>>
>> and at Line 176 of Grouping.java
>>
>>      } else if(valueSource instanceof LongFieldSource) {
>>          String field = ((LongFieldSource) valueSource).getField();
>>          CommandField commandField = new CommandField();
>>          commandField.groupBy = field;
>>          gc = commandField;
>>
>>
>>
>> Thanks,
>>
>> Cody
>
>
>
> --
> Met vriendelijke groet,
>
> Martijn van Groningen
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected] For additional 
> commands, e-mail: [email protected]
>



-- 
Met vriendelijke groet,

Martijn van Groningen

