[
https://issues.apache.org/jira/browse/SOLR-13512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16858919#comment-16858919
]
Erick Erickson commented on SOLR-13512:
---------------------------------------
What am I actually seeing here? This is for the content of a Wikipedia page
(i.e. a text field):
{code}
"field 'text' [BlockTreeTerms(seg=_p5 terms=3060769,postings=58308889,positions=157811023,docs=900727)]":{
"total":248,
"term index [FST(input=BYTE1,output=ByteSequenceOutputs]":88},
{code}
I have
3,060,769 terms
58,308,889 postings
157,811,023 positions
900,727 docs.
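
For anyone who wants to pull those counts out of the key mechanically rather than by eye, here is a minimal sketch. It assumes the key looks exactly like the one in the snippet above, with the wrapped line rejoined by a single space; `parse_segment_stats` is a made-up helper name, not part of Solr or Lucene. The two ratios printed at the end are only rough density indicators and say nothing about heap vs. MMap usage.

```python
import re

# Key copied from the output above, rejoined onto one line
# (assumption: the mail wrapped it at a space).
KEY = ("field 'text' [BlockTreeTerms(seg=_p5 "
       "terms=3060769,postings=58308889,positions=157811023,docs=900727)]")


def parse_segment_stats(key):
    """Return every name=<digits> pair found in the key as a dict of ints.

    Hypothetical helper for illustration only. Pairs whose value is not
    purely numeric at the start (such as seg=_p5) are skipped.
    """
    return {name: int(value) for name, value in re.findall(r"(\w+)=(\d+)", key)}


stats = parse_segment_stats(KEY)
print(stats)

# Rough density indicators derived from the raw counts.
print("postings per term: %.1f" % (stats["postings"] / stats["terms"]))
print("positions per doc: %.1f" % (stats["positions"] / stats["docs"]))
```
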
What is the "total" of 248? I find it hard to believe that this field occupies
only 248 bytes, unless that's just a pointer to stuff out in MMap space.
So if I'm trying to estimate how much of my RAM this segment needs, what
clues do I have? And is there any way to distinguish Java heap from MMap space?
I know it's "tricky"; what I'm after here is something a user who hasn't a clue
about postings can get their arms around.
Running more experiments....
> Raw index data analysis tool
> ----------------------------
>
> Key: SOLR-13512
> URL: https://issues.apache.org/jira/browse/SOLR-13512
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Andrzej Bialecki
> Assignee: Andrzej Bialecki
> Priority: Major
> Fix For: master (9.0), 8.2
>
> Attachments: SOLR-13512.patch, SOLR-13512.patch, SOLR-13512.patch,
> SOLR-13512.patch, rawSizeDetails.json, rawSizeSummary.json
>
>
> A common question from Solr users is how to determine how a given schema
> field and all its related index data contributes to the total index size.
> It's possible to estimate this information by doing a single full pass
> through all index data, aggregating estimated sizes of terms, postings, doc
> values and stored fields. The totals represent of course the worst case
> scenario when there's no index compression at all, but still they should be
> useful for answering the questions above.