[ 
https://issues.apache.org/jira/browse/LUCENE-4372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13452055#comment-13452055
 ] 

Shai Erera commented on LUCENE-4372:
------------------------------------

CachingCollector has this in its javadocs:

{code}
 * Caches all docs, and optionally also scores, coming from
 * a search, and is then able to replay them to another
 * collector.  You specify the max RAM this class may use.
 * Once the collection is done, call {@link #isCached}. If
 * this returns true, you can use {@link #replay(Collector)}
 * against a new collector.  If it returns false, this means
 * too much RAM was required and you must instead re-run the
 * original search.
{code}

Notice the last sentence about isCached returning false.

Should we just fix the static create() method's documentation (even though it 
points to the class's javadocs)?

I don't see any alternative -- if the user specified a RAM limit that is too 
low, what can you do besides discarding the cached docs and documenting that 
behavior? I'd hate to see exceptions thrown...
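
To make the contract from the javadocs concrete, here is a minimal, self-contained sketch of the "check isCached(), then replay or re-run" pattern. It is not Lucene's CachingCollector: BoundedDocCache and ReplayOrRerun are hypothetical names, the limit is counted in docs rather than MB, and the "search" is just an int array standing in for the original query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

/**
 * Hypothetical stand-in for a caching collector with a hard limit.
 * Assumption: the limit is a number of docs, not a RAM size in MB.
 */
final class BoundedDocCache {
    private final int maxDocs;
    // Set to null once the limit is exceeded; the hits are then gone.
    private List<Integer> docs = new ArrayList<>();

    BoundedDocCache(int maxDocs) {
        this.maxDocs = maxDocs;
    }

    /** Caches the doc, or silently discards the whole cache on overflow. */
    void collect(int doc) {
        if (docs == null) {
            return; // already over the limit, nothing is retained
        }
        if (docs.size() >= maxDocs) {
            docs = null; // no exception: the caller must check isCached()
            return;
        }
        docs.add(doc);
    }

    boolean isCached() {
        return docs != null;
    }

    /** Replays cached docs; only valid while isCached() is true. */
    void replay(IntConsumer other) {
        if (docs == null) {
            throw new IllegalStateException("cache was discarded; re-run the search");
        }
        for (int doc : docs) {
            other.accept(doc);
        }
    }
}

final class ReplayOrRerun {
    /**
     * Feeds hits to a second consumer: from the cache when possible,
     * otherwise by "re-running" the search (here, iterating searchHits).
     * Returns true if the cache was used.
     */
    static boolean feed(BoundedDocCache cache, int[] searchHits, IntConsumer second) {
        if (cache.isCached()) {
            cache.replay(second);
            return true;
        }
        for (int doc : searchHits) {
            second.accept(doc);
        }
        return false;
    }
}
```

The point of the sketch is that a caller who skips the isCached() check gets no hits from the cache, which is why the documentation on create() matters.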
                
> CachingCollector.create(boolean, boolean, double) is trappy
> -----------------------------------------------------------
>
>                 Key: LUCENE-4372
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4372
>             Project: Lucene - Core
>          Issue Type: Task
>            Reporter: Robert Muir
>
> Followup to LUCENE-3102.
> Shai proposed a method that just caches all scores so they can be replayed:
> {quote}
> Do you think we can modify this Collector to not necessarily wrap another 
> Collector? We have such Collector which stores (in-memory) all matching doc 
> IDs + scores (if required). Those are later fed into several processes that 
> operate on them (e.g. fetch more info from the index etc.). I am thinking, we 
> can make CachingCollector optionally wrap another Collector and then someone 
> can reuse it by setting RAM limit to unlimited (we should have a constant for 
> that) in order to simply collect all matching docs + scores.
> {quote}
> But Mike had concerns about the RAM usage:
> {quote}
> I'd actually rather not have the constant – ie, I don't want to make
> it easy to be unlimited? It seems too dangerous... I'd rather your
> code has to spell out 10*1024 so you realize you're saying 10 GB (for
> example).
> {quote}
> My concern here is what happens when you don't specify enough RAM: I think 
> those hits are just silently dropped (which is worse than using lots of RAM).
