Dear Developers,

I learned that *omitting norms during indexing for a field saves a byte per document* in Lucene. However, in my testing, disabling norms for string fields changed the overall size of the Lucene index (collection of documents) inconsistently: in some cases the index shrank, in one case it grew, and in one case it did not change.
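To show what I expected from the one-byte-per-document figure, here is a rough back-of-envelope sketch. It assumes exactly one byte per document per norms-enabled field, that every document has a value in every such field, no compression, and 1 MB = 1,000,000 bytes. These are all my own simplifying assumptions, and the function name `estimate_norms_mb` is purely illustrative; the actual on-disk encoding of norms may well differ, so the result is at best an upper bound.

```python
# Back-of-envelope upper bound for the space taken by norms.
# Assumptions (mine, not confirmed): 1 byte per document per norms-enabled
# field, every document has every field, no compression, 1 MB = 1,000,000 bytes.

NUM_DOCS = 5_000_000  # documents per index in my tests


def estimate_norms_mb(num_docs: int, num_fields: int, bytes_per_norm: int = 1) -> float:
    """Return the estimated norms size in MB under the assumptions above."""
    return num_docs * num_fields * bytes_per_norm / 1_000_000


# Number of string fields with norms toggled, one entry per test data set.
string_field_counts = [74, 113, 87, 1026]

for fields in string_field_counts:
    estimate = estimate_norms_mb(NUM_DOCS, fields)
    print(f"{fields} string fields: up to ~{estimate:.0f} MB of norms")
```

None of the measured differences in the test results match these naive estimates, which is part of what prompts my question.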
Here are the configuration details for reference:

- *Lucene Version:* 5.3.1
- *Java Version:* OpenJDK 17.0.8.1
- *Indexer Configuration:*
  - index.merge_factor: 10
  - index.partition_max_doc: 5,000,000
  - indexer.commit_interval_sec: 60
  - indexer.commit_max_doc: 100,000
- *Merge Policy:* LogByteSizeMergePolicy

*Test Results:*

| Test data | # unique fields in an index (5M documents) | # string fields for which norms are enabled or disabled | Avg. index size in MB [norms enabled] | Avg. index size in MB [norms disabled] | Difference |
|---|---|---|---|---|---|
| DATA - I (all documents contain the same set of fields and the same values) | 103 | 74 | 1869 | 1876 | No difference |
| DATA - II (all documents contain the same set of fields but with random values) | 128 | 113 | 25412 | 31890 | Increased by 20% |
| DATA - III (documents contain different sets of field-value pairs, subsets of all field-value pairs) | 184 | 87 | 2295 | 2005 | Reduced by 14% |
| DATA - IV (documents contain different sets of field-value pairs, subsets of all field-value pairs) | 1091 | 1026 | 10512 | 5905 | Reduced by 43% |

Could you please provide insights or clarify whether this behavior matches the expected impact of omitting norms on index size? Additionally, could you explain why the size reduction appears to be unpredictable?

Thank you for your assistance!

With Regards,
Balaram Sharma