[ https://issues.apache.org/jira/browse/SOLR-13512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16857854#comment-16857854 ]
Andrzej Bialecki commented on SOLR-13512:
------------------------------------------
Another update:
* added "typesBySize" to show how much of the total size each type of data
consumes.
* added sampling of data for large indexes. This makes a HUGE difference in
calculation speed, and the results are still good enough to provide useful
guidance.
* bug fixes, additional unit testing, some internal API refactoring / renaming.
I'd appreciate a review, but if there are no objections I'd like to commit this
shortly.
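
To illustrate why sampling speeds up the calculation while still giving usable numbers, here is a minimal Python sketch. It is not the code from the patch: the function name `estimate_total_size`, the `size_of_doc` callback, the threshold and the sample fraction are all hypothetical, chosen only to show the idea of measuring a random subset of documents and scaling the result up.

```python
import random
from typing import Callable


def estimate_total_size(num_docs: int,
                        size_of_doc: Callable[[int], int],
                        sampling_threshold: int = 1000,
                        sample_fraction: float = 0.1,
                        seed: int = 0) -> float:
    """Estimate the summed size of all documents (hypothetical helper).

    Small indexes are measured exactly; larger ones are sampled and the
    sampled total is scaled up to the full document count.
    """
    if num_docs <= sampling_threshold:
        # Small index: visit every document, no estimation error.
        return float(sum(size_of_doc(doc_id) for doc_id in range(num_docs)))

    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    sample_size = max(1, int(num_docs * sample_fraction))
    sampled_ids = rng.sample(range(num_docs), sample_size)
    sampled_total = sum(size_of_doc(doc_id) for doc_id in sampled_ids)

    # Scale the sampled total by the inverse of the sampling ratio.
    return sampled_total * num_docs / sample_size
```

With a 10% sample only a tenth of the documents are visited, so the cost drops roughly tenfold, while the error stays small as long as document sizes are not extremely skewed.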
> Raw index data analysis tool
> ----------------------------
>
> Key: SOLR-13512
> URL: https://issues.apache.org/jira/browse/SOLR-13512
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Andrzej Bialecki
> Assignee: Andrzej Bialecki
> Priority: Major
> Attachments: SOLR-13512.patch, SOLR-13512.patch, SOLR-13512.patch,
> rawSizeDetails.json, rawSizeSummary.json
>
>
> A common question from Solr users is how much a given schema field and all
> its related index data contribute to the total index size.
> It's possible to estimate this information by doing a single full pass
> through all index data, aggregating estimated sizes of terms, postings, doc
> values and stored fields. The totals of course represent the worst-case
> scenario, where there's no index compression at all, but they should still be
> useful for answering the question above.
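
The single-pass aggregation described above can be sketched roughly as follows. This is an illustrative Python sketch only, not the tool's implementation: the `(field, data_type, size)` record shape and the `aggregate_sizes` function are assumptions made for the example, and the per-type totals merely mirror the idea behind "typesBySize".

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record: (field name, kind of index data, estimated raw bytes).
Entry = Tuple[str, str, int]


def aggregate_sizes(entries: Iterable[Entry]) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Sum estimated sizes per field and per data type in one pass."""
    by_field: Dict[str, int] = defaultdict(int)
    by_type: Dict[str, int] = defaultdict(int)

    for field, data_type, size in entries:
        by_field[field] += size
        by_type[data_type] += size

    def ranked(totals: Dict[str, int]) -> Dict[str, int]:
        # Largest consumers first, so the biggest contributors stand out.
        return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))

    return ranked(by_field), ranked(by_type)
```

Calling `aggregate_sizes` with entries such as `("title", "terms", 120)` or `("body", "postings", 2100)` returns two dictionaries ordered from largest to smallest: one keyed by field, one keyed by data type. Because the inputs are uncompressed estimates, the totals are upper bounds rather than on-disk sizes.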