[ 
https://issues.apache.org/jira/browse/SOLR-13512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857854#comment-16857854
 ] 

Andrzej Bialecki  commented on SOLR-13512:
------------------------------------------

Another update:
 * added "typesBySize", which shows how much of the total size each type of 
data consumes.
 * added sampling of data for large indexes. This makes a HUGE difference in 
calculation speed, and the results are still good enough to provide useful 
guidance.
 * bug fixes, additional unit testing, some internal API refactoring / renaming.

I'd appreciate a review, but if there are no objections I'd like to commit 
this shortly.
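
To illustrate the sampling idea mentioned above, here is a minimal sketch. It 
is not the code from the patch: the function name, the fixed sampling stride 
and the in-memory list of per-document sizes are all assumptions made for the 
example. It measures every Nth document and scales the sampled mean up to the 
full document count.

```python
def estimate_total_size(doc_sizes, sampling_step=1):
    """Estimate sum(doc_sizes) by measuring every sampling_step-th entry.

    Hypothetical helper: doc_sizes stands in for per-document raw sizes in
    bytes. A real tool would compute each size on demand, so skipping
    documents is what saves the time.
    """
    if sampling_step < 1:
        raise ValueError("sampling_step must be >= 1")
    if not doc_sizes:
        return 0.0
    sampled = doc_sizes[::sampling_step]
    # Scale the mean of the sampled documents up to the full document count.
    return sum(sampled) / len(sampled) * len(doc_sizes)
```

With a step of 1 this reduces to the exact sum. Larger steps trade accuracy 
for speed, and the error depends on how uniform the document sizes are.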

> Raw index data analysis tool
> ----------------------------
>
>                 Key: SOLR-13512
>                 URL: https://issues.apache.org/jira/browse/SOLR-13512
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Andrzej Bialecki 
>            Assignee: Andrzej Bialecki 
>            Priority: Major
>         Attachments: SOLR-13512.patch, SOLR-13512.patch, SOLR-13512.patch, 
> rawSizeDetails.json, rawSizeSummary.json
>
>
> A common question from Solr users is how much a given schema field, 
> together with all its related index data, contributes to the total index 
> size.
> It's possible to estimate this information by doing a single full pass 
> through all index data, aggregating estimated sizes of terms, postings, doc 
> values and stored fields. The totals, of course, represent the worst-case 
> scenario with no index compression at all, but they should still be useful 
> for answering the question above.
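
The single-pass aggregation described in the issue can be sketched roughly as 
follows. This is only an illustration under stated assumptions: the record 
format, the function name and the type labels ("terms", "postings", 
"doc_values", "stored_fields") are hypothetical, and a real implementation 
would read these sizes from the index rather than from a list of tuples.

```python
from collections import defaultdict


def aggregate_raw_sizes(records):
    """Sum estimated raw sizes per field and per data type in one pass.

    records: iterable of (field_name, data_type, estimated_bytes) tuples.
    This input shape is an assumption made for the example.
    """
    by_field = defaultdict(int)
    by_type = defaultdict(int)
    total = 0
    for field, data_type, size in records:
        by_field[field] += size
        by_type[data_type] += size
        total += size

    def ranked(sizes):
        # Order entries so the largest contributors come first.
        return dict(sorted(sizes.items(), key=lambda kv: kv[1], reverse=True))

    return {
        "total": total,
        "by_field": ranked(by_field),
        "by_type": ranked(by_type),
    }
```

For example, passing `[("title", "terms", 100), ("title", "stored_fields", 
300), ("body", "terms", 500)]` yields a total of 900, with "body" ranked ahead 
of "title" and "terms" ahead of "stored_fields".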



