On 7/26/2017 11:33 PM, sandesh.yapuram wrote:
> Hello, I'm using lucene 6.3.0
> I have an index which has 500k documents with each document having 53
> fields.
> The problem is the index size is becoming an issue day by day so we are
> planning to weed out or trim some fields. I'm trying to get estimate size of
> each field using luke but the tool only shows me no. of terms and
> frequencies which may not suggest exact size of that field inside the index.

As you were told by Adrien, the exact information you're after isn't available. But the information that you CAN get is useful for determining the *relative* size of one field compared to another. It should be enough for you to know that field X is larger than field Y.

Various parameters of the field have a strong influence on the index size: whether it is stored, indexed, has docValues, term vectors, etc. Stored data and term vectors are compressed by default. DocValues are not compressed. Indexed data is typically smaller than the original source data. How much smaller depends on the type of data and (if it's text) what kind of analysis is being done.

Experimenting with how your documents and fields are defined is usually the best way to get concrete numbers.

Thanks,
Shawn
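
One rough way to put numbers on that kind of experiment is to total the files in the index directory by extension before and after changing a field definition. The sketch below is only an illustration using the plain JDK, not a Lucene API, and the class name IndexSizeByExtension is made up for this example. It assumes the index is not using compound files (.cfs), and it relies on the default codec's file naming, where .fdt/.fdx hold stored fields, .tvd/.tvx term vectors, .dvd/.dvm docValues, and .tim/.tip/.doc/.pos the indexed terms and postings. It gives a breakdown by data structure, not by field, so it is still the before/after difference that tells you what a given field costs.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: sums on-disk bytes per file extension in one directory.
class IndexSizeByExtension {

    // Returns extension -> total bytes, sorted by extension name.
    static Map<String, Long> sizeByExtension(Path indexDir) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (!Files.isRegularFile(file)) {
                    continue; // skip subdirectories
                }
                String name = file.getFileName().toString();
                int dot = name.lastIndexOf('.');
                // Files such as segments_N have no extension.
                String ext = dot < 0 ? "(none)" : name.substring(dot + 1);
                totals.merge(ext, Files.size(file), Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the first argument is the path to the index directory.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        Map<String, Long> totals = sizeByExtension(indexDir);
        long all = totals.values().stream().mapToLong(Long::longValue).sum();
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            double pct = all == 0 ? 0.0 : 100.0 * e.getValue() / all;
            System.out.printf("%-8s %,15d bytes  %5.1f%%%n", e.getKey(), e.getValue(), pct);
        }
    }
}
```

Run it against a copy of the index, change one field (for example, stop storing it), reindex, and run it again. If the index was written with compound files, everything will show up under .cfs and the breakdown is lost, so the test index would need to be built with compound files turned off.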
