[ https://issues.apache.org/jira/browse/LUCENE-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113611#comment-13113611 ]

Christopher Ball commented on LUCENE-3435:
------------------------------------------

Grant - Great start =)

Below is some initial feedback (happy to help further if you want to chat in 
real-time).

*Quickly Grokking* - To make the spreadsheet easier to comprehend at a 
glance, the cells that users are expected to update should be color-coded to 
distinguish them from the cells that are calculated.

*Bytes or Entries* - You list Max Size for filterCache, queryResultCache, and 
documentCache as 512, which implies the size is measured in bytes when in 
fact these caches are sized in entries. I would clarify this in the 
spreadsheet, as I have seen numerous blogs and mailing-list threads confuse 
the two.

*Approach to Cache Sizing* - Given that memory requirements are heavily 
contingent on caching, I would suggest including at least one approach for 
determining each cache's size (a worked sketch follows this list):

* Query Result Cache
** Estimation: should be greater than 'number of commonly recurring unique 
queries' x 'number of sort parameters' x 'number of possible sort orders' 
* Document Cache
** Estimation: should be greater than 'maximum number of documents per query' x 
'maximum number of concurrent queries'
* Filter Cache
** Estimation: should be at least the number of unique filter queries (worth 
clarifying what constitutes 'unique')
* Field Value Cache
** Estimation: should be ?
* Custom Caches
** Estimation: should be ? - A common use case?
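
To make the first two estimates concrete, here is a purely illustrative 
back-of-envelope calculation - every workload figure in it is a made-up 
assumption, not something taken from the spreadsheet:

{code:java}
// Illustrative only: all workload figures below are invented assumptions.
public class CacheSizeEstimate {
    public static void main(String[] args) {
        // queryResultCache >= unique queries x sort params x sort orders
        int recurringUniqueQueries = 2000; // commonly recurring unique queries
        int sortParameters = 3;            // sort fields in active use
        int sortOrders = 2;                // ascending and descending
        System.out.println("queryResultCache >= "
            + (recurringUniqueQueries * sortParameters * sortOrders)
            + " entries"); // 12000 entries

        // documentCache >= max docs per query x max concurrent queries
        int maxDocsPerQuery = 50;          // e.g. rows=50
        int maxConcurrentQueries = 40;
        System.out.println("documentCache >= "
            + (maxDocsPerQuery * maxConcurrentQueries)
            + " entries"); // 2000 entries
    }
}
{code}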

*Faceting* - Surprised there is no reference to faceting, which is 
increasingly common default query functionality and would further increase 
memory requirements for effective use (see the rough sketch below).
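
As a sketch of the cost: for single-valued string-field faceting via the 
field cache, a rough model (my assumption, not anything in the current 
spreadsheet) is one ordinal per document plus the unique values themselves:

{code:java}
// Assumed rough model for one single-valued faceted string field:
//   memory ~= maxDoc * bytesPerOrd + uniqueValues * avgValueBytes
public class FacetMemoryEstimate {
    public static void main(String[] args) {
        long maxDoc = 10000000L;     // assumed: docs in the index
        long uniqueValues = 250000L; // assumed: unique terms in the field
        long avgValueBytes = 16L;    // assumed: avg facet value length
        long bytesPerOrd = 4L;       // assumed: 32-bit ordinal per doc

        long bytes = maxDoc * bytesPerOrd + uniqueValues * avgValueBytes;
        System.out.println("~" + (bytes / (1024 * 1024))
            + " MB per faceted field"); // ~41 MB with these figures
    }
}
{code}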

*Obscure Metrics* - To give this spreadsheet some real teeth, there should be 
pointers to at least one approach for estimating each input metric (could be 
on another tab). 

* Some are fairly easy (see the sketch after this list): 
** Number of Unique Terms / field
** Number of documents
** Number of indexed fields (no norms)
** Number of fields w/ norms
** Number of non-String Sort Fields other than score
** Number of String Sort Fields
** Number of deleted docs on avg
** Avg. number of terms per query
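
Most of these can be read straight off an existing index. A minimal sketch 
against the current trunk flex APIs (class and method names may still shift 
before 4.0 is released):

{code:java}
import java.io.File;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.store.FSDirectory;

public class EasyIndexMetrics {
    public static void main(String[] args) throws Exception {
        // args[0] = index directory, args[1] = field to inspect
        IndexReader reader = IndexReader.open(FSDirectory.open(new File(args[0])));
        try {
            System.out.println("Number of documents: " + reader.numDocs());
            System.out.println("Deleted docs:        " + reader.numDeletedDocs());
            // Unique term count for one field; size() may return -1 when
            // the codec cannot report the count cheaply.
            Terms terms = MultiFields.getTerms(reader, args[1]);
            System.out.println("Unique terms in '" + args[1] + "': "
                + (terms == null ? 0 : terms.size()));
        } finally {
            reader.close();
        }
    }
}
{code}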

* Some are quite obscure (and guidance on how to estimate them is essential; 
an example for one of them follows this list):
** Number of RAM-based Column Stride Fields (DocValues)
** ramBufferSizeMB
** Transient Factor (MB)
** fieldValueCache Max Size
** Custom Cache Size (MB)
** Avg. Number of Bytes per Term
** Bytes/Term
** Field Cache bits/term
** Cache Key Avg. Size (Bytes)
** Avg QueryResultKey size (in bytes)
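
And for one of the obscure ones, Avg. Number of Bytes per Term, a workable 
approach is to walk (or sample) the terms dictionary for a field and average 
the raw byte lengths - again a sketch against the trunk flex APIs:

{code:java}
import java.io.File;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

public class AvgBytesPerTerm {
    public static void main(String[] args) throws Exception {
        // args[0] = index directory, args[1] = field to inspect
        IndexReader reader = IndexReader.open(FSDirectory.open(new File(args[0])));
        try {
            Terms terms = MultiFields.getTerms(reader, args[1]);
            TermsEnum termsEnum = terms.iterator(null); // reuse arg on trunk
            long termCount = 0;
            long termBytes = 0;
            BytesRef term;
            while ((term = termsEnum.next()) != null) {
                termBytes += term.length;
                termCount++;
            }
            System.out.println("Avg bytes/term for '" + args[1] + "': "
                + (termCount == 0 ? 0 : (double) termBytes / termCount));
        } finally {
            reader.close();
        }
    }
}
{code}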

> Create a Size Estimator model for Lucene and Solr
> -------------------------------------------------
>
>                 Key: LUCENE-3435
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3435
>             Project: Lucene - Java
>          Issue Type: Task
>          Components: core/other
>    Affects Versions: 4.0
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>
> It is often handy to be able to estimate the amount of memory and disk space 
> that both Lucene and Solr use, given certain assumptions.  I intend to check 
> in an Excel spreadsheet that allows people to estimate memory and disk usage 
> for trunk.  I propose to put it under dev-tools, as I don't think it should 
> be official documentation just yet and like the IDE stuff, we'll see how well 
> it gets maintained.
