Hi

I reviewed the index configuration section in the ref guide, and then
SolrIndexConfig. I'm not sure whether the current behavior is intentional
or, again, a relic from older days when configuration was simpler, but I
think that the useCompoundFile parameter needs some clarification:

IndexWriterConfig.useCompoundFile controls whether newly flushed segments
are packed in a compound file or not. This is important to modify if you
e.g. do batch indexing and intend to finish with a big merge. In that case
the extra CFS packing on every flush is redundant.

MergePolicy.noCFSRatio determines which merged segments are packed into a
CFS (and there's also maxCFSSegmentSizeMB). This lets you avoid the extra
packing for very large segments, where the packing itself is expensive
during indexing, but does not buy you much during searching.

In SolrIndexConfig I see that if the user defined a top-level
<useCompoundFile> element (i.e. outside the MP setting), it controls both
IWC and MP (sets noCFSRatio=1.0). The code does the right thing though, in
that if you also specify noCFSRatio and maxCFSSegmentSizeMB, they are
applied correctly later on.

I understand that this might seem like a simplification to users, where
they set this value once and it controls both places, but I think it's bad.
First, because if you set <useCompoundFile>, you basically *always* end up
w/ CFS, even if you intend that to apply to only newly flushed segments. In
order to use default settings for merged segments, you have to explicitly
include the default settings in the <mergePolicy> element. This is trappy I
think and looks odd.
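
To make the interaction concrete, here is a minimal standalone sketch of
the resolution order described above. It is a toy model, not the
SolrIndexConfig code: the class CfsSettingsModel, its resolve method and
the 0.1 default ratio are hypothetical names and values picked for
illustration, and leaving the ratio untouched when useCompoundFile is
false is an assumption (only the true case is described above).

```java
// Toy model of how a top-level <useCompoundFile> and an explicit
// noCFSRatio in <mergePolicy> combine. All names here are hypothetical.
class CfsSettingsModel {
    // Assumed merge-policy default, for illustration only.
    static final double ASSUMED_DEFAULT_NO_CFS_RATIO = 0.1;

    // Effective settings after both config locations have been applied.
    static final class Effective {
        final boolean flushAsCfs;  // what the IndexWriterConfig side gets
        final double noCfsRatio;   // what the MergePolicy side gets

        Effective(boolean flushAsCfs, double noCfsRatio) {
            this.flushAsCfs = flushAsCfs;
            this.noCfsRatio = noCfsRatio;
        }
    }

    // explicitNoCfsRatio is null when <mergePolicy> does not set it.
    static Effective resolve(boolean useCompoundFile, Double explicitNoCfsRatio) {
        double ratio = ASSUMED_DEFAULT_NO_CFS_RATIO;
        if (useCompoundFile) {
            // The top-level flag also forces CFS for all merged segments.
            ratio = 1.0;
        }
        if (explicitNoCfsRatio != null) {
            // Explicit merge-policy settings are applied later and win.
            ratio = explicitNoCfsRatio;
        }
        return new Effective(useCompoundFile, ratio);
    }

    public static void main(String[] args) {
        Effective onlyFlag = resolve(true, null);
        Effective flagPlusDefault = resolve(true, ASSUMED_DEFAULT_NO_CFS_RATIO);
        System.out.println("flag only:            noCFSRatio=" + onlyFlag.noCfsRatio);
        System.out.println("flag + explicit ratio: noCFSRatio=" + flagPlusDefault.noCfsRatio);
    }
}
```

The first call is the trap: setting only the flag silently moves merged
segments off the default ratio. The second call is the workaround of
restating the default inside <mergePolicy>.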

Also, I think that it's fine if our users understand the implications of
setting either value. The defaults are fine as they are, and if users
really want to get into that place, it's OK if we ask them to read the docs
and understand which parameter they set and for what purpose.

Beyond that, SolrIndexConfig in trunk contains deprecated code around this
parameter and somewhat hacks around older schemas that defined useCFS
inside the MP element -- are we still required to support that back-compat
in trunk as well?

These two issues could be handled separately, but if others agree that we
should use explicit settings for this, I don't mind tackling both (explicit
settings and remove deprecated code in trunk) under one issue.

Shai
