Dear Developers,

I learned that *omitting norms for a field during indexing saves one byte per
document* in Lucene. However, in my tests, disabling norms for string fields
changed the overall size of the Lucene index (collection of documents) in
inconsistent ways.
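
To make the expectation concrete, here is a rough back-of-the-envelope sketch
of the saving that the one-byte rule would predict. It assumes exactly one
uncompressed norm byte per document for every field that has norms, which may
not match how Lucene actually stores norms on disk. The 5M documents and 74
string fields are taken from DATA - I in the test results below; the class and
method names are made up for illustration and are not part of Lucene.

```java
// Hypothetical helper, not part of Lucene: estimates the norms saving under
// the assumption of one uncompressed byte per document per field.
final class NormsSavingEstimate {

    // Upper bound on the bytes saved when norms are omitted for `fields` fields.
    static long expectedSavingBytes(long docCount, int fields) {
        return docCount * fields; // 1 byte per (document, field) pair
    }

    // Converts bytes to decimal megabytes (1 MB = 1,000,000 bytes).
    static double toMegabytes(long bytes) {
        return bytes / 1_000_000.0;
    }

    public static void main(String[] args) {
        long bytes = expectedSavingBytes(5_000_000L, 74);
        System.out.printf("expected saving: %.0f MB%n", toMegabytes(bytes));
    }
}
```

Under these assumptions the estimate for DATA - I comes to 370 MB.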

Here are the configuration details for reference:

   - *Lucene Version:* 5.3.1
   - *Java Version:* OpenJDK 17.0.8.1
   - *Indexer Configuration:*
      - index.merge_factor: 10
      - index.partition_max_doc: 5,000,000
      - indexer.commit_interval_sec: 60
      - indexer.commit_max_doc: 100,000
   - *Merge Policy:* LogByteSizeMergePolicy


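*Test Results:*

Each index contains 5M documents. Sizes are average index sizes in MB, and
each difference is measured relative to the norms-enabled size.

   - *DATA - I:* All documents contain the same set of fields and the same
     values
      - Unique fields in the index: 103
      - String fields for which norms are enabled or disabled: 74
      - Avg index size, norms enabled: 1869 MB
      - Avg index size, norms disabled: 1876 MB
      - Difference: no significant difference (under 1%)
   - *DATA - II:* All documents contain the same set of fields but with
     random values
      - Unique fields in the index: 128
      - String fields for which norms are enabled or disabled: 113
      - Avg index size, norms enabled: 25412 MB
      - Avg index size, norms disabled: 31890 MB
      - Difference: increased by about 25%
   - *DATA - III:* Documents contain different sets of field-value pairs,
     subsets of all field-value pairs
      - Unique fields in the index: 184
      - String fields for which norms are enabled or disabled: 87
      - Avg index size, norms enabled: 2295 MB
      - Avg index size, norms disabled: 2005 MB
      - Difference: reduced by about 13%
   - *DATA - IV:* Documents contain different sets of field-value pairs,
     subsets of all field-value pairs
      - Unique fields in the index: 1091
      - String fields for which norms are enabled or disabled: 1026
      - Avg index size, norms enabled: 10512 MB
      - Avg index size, norms disabled: 5905 MB
      - Difference: reduced by about 44%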
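
For anyone rechecking the differences, the sketch below recomputes them from
the four pairs of average sizes reported above. It assumes the change is
measured relative to the norms-enabled size; a different baseline gives
different percentages. The class and method names are illustrative only.

```java
// Hypothetical helper for rechecking the reported differences.
final class IndexSizeChange {

    // Signed percentage change from the norms-enabled size to the
    // norms-disabled size; negative values mean the index got smaller.
    static double percentChange(double enabledMb, double disabledMb) {
        return (disabledMb - enabledMb) / enabledMb * 100.0;
    }

    public static void main(String[] args) {
        // Average index sizes in MB, in the order the tests are listed.
        double[] enabled  = {1869, 25412, 2295, 10512};
        double[] disabled = {1876, 31890, 2005, 5905};
        for (int i = 0; i < enabled.length; i++) {
            System.out.printf("test %d: %+.1f%%%n",
                    i + 1, percentChange(enabled[i], disabled[i]));
        }
    }
}
```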

Could you please provide insights or clarify whether this behavior aligns
with the expected impact on index size? Additionally, could you explain why
the size reduction appears to be unpredictable?

Thank you for your assistance!


With Regards,

Balaram Sharma
