When compound files are enabled, Lucene90CompoundFormat groups all the
files of a segment into two files, cfs and cfe. The segment info is then
written to the si file, and optionally a liv file is also generated.

To further reduce the number of files per segment, and so be more cost
effective by making fewer calls to cloud providers' APIs, I tried to
group cfs, cfe and si into a single file.
We can do that by providing replacements for the CompoundFormat and
SegmentInfoFormat in the Codec: the SegmentInfoFormat does not write the
si file itself, but lets the CompoundFormat write it inside the single
compound file.
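
To make the idea concrete, here is a toy sketch of a single-file
container, using only the JDK. It is not Lucene's on-disk format and it
uses none of Lucene's APIs: the class name SingleFileCompound, the
pack/unpack methods and the layout (file data, then an entry table, then
the table offset in the last 8 bytes) are all assumptions made for
illustration. The point is only that the entry table (the role of cfe)
and the segment info (the role of si) can live in the same file as the
data (the role of cfs).

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy single-file container (hypothetical; NOT Lucene's format or API). */
final class SingleFileCompound {

  /** Packs named files into one blob: [data...][entry table][table offset]. */
  static byte[] pack(Map<String, byte[]> files) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      // name -> {offset, length} of each file's data inside the blob
      Map<String, long[]> entries = new LinkedHashMap<>();
      for (Map.Entry<String, byte[]> file : files.entrySet()) {
        entries.put(file.getKey(), new long[] {out.size(), file.getValue().length});
        out.write(file.getValue());
      }
      long tableOffset = out.size();
      // The entry table plays the role of the separate entries file.
      out.writeInt(entries.size());
      for (Map.Entry<String, long[]> entry : entries.entrySet()) {
        byte[] name = entry.getKey().getBytes(StandardCharsets.UTF_8);
        out.writeInt(name.length);
        out.write(name);
        out.writeLong(entry.getValue()[0]);
        out.writeLong(entry.getValue()[1]);
      }
      // The last 8 bytes tell a reader where the entry table starts.
      out.writeLong(tableOffset);
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Reads the entry table from the end of the blob and extracts every file. */
  static Map<String, byte[]> unpack(byte[] blob) {
    ByteBuffer in = ByteBuffer.wrap(blob); // big-endian, like DataOutputStream
    long tableOffset = in.getLong(blob.length - Long.BYTES);
    in.position((int) tableOffset);
    int count = in.getInt();
    Map<String, byte[]> files = new LinkedHashMap<>();
    for (int i = 0; i < count; i++) {
      byte[] name = new byte[in.getInt()];
      in.get(name);
      int offset = (int) in.getLong();
      int length = (int) in.getLong();
      byte[] data = new byte[length];
      System.arraycopy(blob, offset, data, 0, length);
      files.put(new String(name, StandardCharsets.UTF_8), data);
    }
    return files;
  }
}
```

In this sketch the segment info would simply be one more entry (for
example "_0.si") passed to pack alongside the other files of the segment,
so a reader needs to open only one file to find both the entry table and
the segment info.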

My early measurements show an average saving of 38.5% in the number of
(compound) files generated per segment.

Is it something of interest upstream?

Bruno
