[ https://issues.apache.org/jira/browse/LUCENE-3435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104622#comment-13104622 ]

Grant Ingersoll commented on LUCENE-3435:
-----------------------------------------

Mike and I worked out a good deal of it yesterday on IRC (well, mostly Mike 
explained and I took copious notes).  The disk storage part is based on LIA2 
(Lucene in Action, 2nd edition).  It is a theoretical model, not an empirical 
one, except that the bytes/term figure was calculated from indexing Wikipedia.  

I would deem it a gross approximation of the state of trunk at this point in 
time.  My gut says the Lucene estimate is a little low, while the Solr one is 
fairly close (since I suspect Solr's memory usage is dominated by caching).  I 
imagine some things are still unaccounted for.  For instance, I haven't 
reverse-engineered the fieldValueCache memSize() method yet, and I don't have a 
good sense of how much memory is consumed in a highly concurrent system by the 
sheer number of Query objects instantiated, or by really large Queries (say, 5K 
terms).  It is also not meant to be one-size-fits-all.  Lucene/Solr have a ton 
of tuning options that could change things significantly.

I did a few sanity checks against things I've seen in the past, and the results 
seemed reasonable.  There is, of course, no substitute for good testing.  In 
other words, caveat emptor.

> Create a Size Estimator model for Lucene and Solr
> -------------------------------------------------
>
>                 Key: LUCENE-3435
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3435
>             Project: Lucene - Java
>          Issue Type: Task
>          Components: core/other
>    Affects Versions: 4.0
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>
> It is often handy to be able to estimate the amount of memory and disk space 
> that both Lucene and Solr use, given certain assumptions.  I intend to check 
> in an Excel spreadsheet that allows people to estimate memory and disk usage 
> for trunk.  I propose to put it under dev-tools, as I don't think it should 
> be official documentation just yet, and, as with the IDE stuff, we'll see how 
> well it gets maintained.
