[
https://issues.apache.org/jira/browse/LUCENE-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13113611#comment-13113611
]
Christopher Ball commented on LUCENE-3435:
------------------------------------------
Grant - Great start =)
Below is some initial feedback (happy to help further if you want to chat in
real-time)
*Quickly Grokking* - To make the spreadsheet quicker to comprehend, the cells
that users are meant to update should be color-coded to distinguish them from
the cells that are calculated.
*Bytes or Entries* - You list Max Size for filterCache, queryResultCache, and
documentCache as 512, which implies the size is measured in bytes, when in fact
these caches are sized in entries. I would clarify this in the spreadsheet, as
I have seen numerous blogs and emails confuse the two.
*Approach to Cache Sizing* - Given that memory requirements are heavily
contingent on caching, I would suggest including at least one approach for
determining each cache size:
* Query Result Cache
** Estimation: should be greater than 'number of commonly reoccurring unique
queries' x 'number of sort parameters' x 'number of possible sort orders'
* Document Cache
** Estimation: should be greater than 'maximum number of documents per query' x
'maximum number of concurrent queries'
* Filter Cache
** Estimation: should be greater than the number of unique filter queries
(should clarify what constitutes 'unique')
* Field Value Cache
** Estimation: should be ?
* Custom Caches
** Estimation: should be ? - A common use case?
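To make the multiplications concrete, here is a hypothetical worked example of
the queryResultCache and documentCache formulas above (the workload numbers are
made-up assumptions for illustration, not recommendations):

```java
// Hypothetical worked example of the cache-sizing estimations above.
// All input numbers are assumed workload characteristics, not defaults.
public class CacheSizeEstimate {
    public static void main(String[] args) {
        // queryResultCache >= common unique queries x sort parameters x sort orders
        int commonUniqueQueries = 2000; // assumed number of recurring unique queries
        int sortParameters = 2;         // e.g. price, date
        int sortOrders = 2;             // asc, desc
        int queryResultCacheSize = commonUniqueQueries * sortParameters * sortOrders;

        // documentCache >= max documents per query x max concurrent queries
        int maxDocsPerQuery = 50;       // e.g. rows=50
        int maxConcurrentQueries = 20;  // assumed peak concurrency
        int documentCacheSize = maxDocsPerQuery * maxConcurrentQueries;

        System.out.println("queryResultCache size >= " + queryResultCacheSize); // 8000
        System.out.println("documentCache size >= " + documentCacheSize);       // 1000
    }
}
```

Note these are lower bounds in entries (not bytes), consistent with the
Bytes-or-Entries point above.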
*Faceting* - Surprised there is no reference to faceting, which is increasingly
common default query functionality and would further increase memory
requirements for effective use.
*Obscure Metrics* - To really give this spreadsheet some teeth, there should be
pointers to at least one approach for estimating each input metric (could be on
another tab).
* Some are fairly easy:
** Number of Unique Terms / field
** Number of documents
** Number of indexed fields (no norms)
** Number of fields w/ norms
** Number of non-String Sort Fields other than score
** Number of String Sort Fields
** Number of deleted docs on avg
** Avg. number of terms per query
* Some are quite obscure (and guidance on how to estimate is essential):
** Number of RAM-based Column Stride Fields (DocValues)
** ramBufferSizeMB
** Transient Factor (MB)
** fieldValueCache Max Size
** Custom Cache Size (MB)
** Avg. Number of Bytes per Term
** Bytes/Term
** Field Cache bits/term
** Cache Key Avg. Size (Bytes)
** Avg QueryResultKey size (in bytes)
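As a sketch of the kind of guidance I mean: "Avg. Number of Bytes per Term"
could be estimated by sampling indexed terms and averaging their UTF-8 byte
lengths (Lucene stores terms as UTF-8). The class name and sample values below
are hypothetical, not from the spreadsheet:

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Illustrative sketch: estimate "Avg. Number of Bytes per Term"
// from a sample of indexed terms.
public class AvgTermBytes {
    static double avgBytesPerTerm(List<String> sampleTerms) {
        long totalBytes = 0;
        for (String term : sampleTerms) {
            // Lucene stores terms as UTF-8, so measure UTF-8 byte length
            totalBytes += term.getBytes(StandardCharsets.UTF_8).length;
        }
        return sampleTerms.isEmpty() ? 0.0
                                     : (double) totalBytes / sampleTerms.size();
    }

    public static void main(String[] args) {
        // Made-up sample; in practice, sample terms from the actual index
        List<String> sample = List.of("lucene", "solr", "cache");
        System.out.println(avgBytesPerTerm(sample)); // (6 + 4 + 5) / 3 = 5.0
    }
}
```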
> Create a Size Estimator model for Lucene and Solr
> -------------------------------------------------
>
> Key: LUCENE-3435
> URL: https://issues.apache.org/jira/browse/LUCENE-3435
> Project: Lucene - Java
> Issue Type: Task
> Components: core/other
> Affects Versions: 4.0
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Priority: Minor
>
> It is often handy to be able to estimate the amount of memory and disk space
> that both Lucene and Solr use, given certain assumptions. I intend to check
> in an Excel spreadsheet that allows people to estimate memory and disk usage
> for trunk. I propose to put it under dev-tools, as I don't think it should
> be official documentation just yet and like the IDE stuff, we'll see how well
> it gets maintained.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira