[
https://issues.apache.org/jira/browse/SOLR-17052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris M. Hostetter updated SOLR-17052:
--------------------------------------
Summary: SchemaCodecFactory/IndexSchema/FieldType relationships are kludgy,
buggy, and inefficient (was: SchemaCodecFactory/IndexSchema/FieldType
relationships are kludgy and should be inverted)
> SchemaCodecFactory/IndexSchema/FieldType relationships are kludgy, buggy, and
> inefficient
> -----------------------------------------------------------------------------------------
>
> Key: SOLR-17052
> URL: https://issues.apache.org/jira/browse/SOLR-17052
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Chris M. Hostetter
> Priority: Major
>
> While getting familiar with the {{SolrCore + CodecFactory +
> SchemaCodecFactory + FieldType}} related code relevant to SOLR-17045,
> SOLR-17046, & SOLR-17047, it occurred to me that there is a lot of
> inefficiency and kludginess in how {{FieldType}} based "codec overrides"
> are used (and validated) by {{SchemaCodecFactory}} (and
> {{SolrCore.initCodec}}):
> * {{SolrCore.initCodec}} needs to be aware of all the possible ways a
> {{FieldType}} instance might support codec overrides
> ** ... so it can fail if any are specified unless the {{CodecFactory
> instanceOf SolrCoreAware}}
> *** ... even though that still doesn't ensure the factory supports those
> field type overrides
> ** This validation currently just looks at {{getPostingsFormatForField}} &
> {{getDocValuesFormatForField}}
> *** ... it's ignorant about {{DenseVectorField}} 's assumptions about being
> able to override aspects of the {{KnnVectorsFormat}}
> *** ... and AFAICT, what validation is done doesn't help if the Schema API
> is used to add new field types (w/ {{postingsFormat}} or {{docValuesFormat}}
> overrides)
> * In all of the {{SchemaCodecFactory}} "per-field" methods
> ({{getPostingsFormatForField}}, {{getDocValuesFormatForField}}, &
> {{getKnnVectorsFormatForField}}) ...
> ** ... every call to these methods resolves a {{SchemaField}} instance –
> even though only the (Solr) {{FieldType}} is needed
> *** Asking the {{IndexSchema}} for the {{SchemaField}} of a fieldName has
> more overhead than just asking for the {{FieldType}}
> *** None of the things these methods care about can be configured on a
> per-fieldName basis anyway.
> ** For {{PostingsFormat}} and {{DocValuesFormat}}, every call to these
> methods repeats the SPI lookup on the "format name" configured on the
> {{FieldType}} instance
> ** For {{KnnVectorsFormat}} every call to this method constructs a new
> {{SolrDelegatingKnnVectorsFormat}} – even though the same instance could be
> re-used for every field of the same {{FieldType}} instance.
> * In {{FieldType}} ...
> ** ... there is no validation anywhere that the {{postingsFormat}} or
> {{docValuesFormat}} are valid
> *** ... bogus values only cause a problem when the {{SchemaCodecFactory}}
> tries to resolve them (when indexing)
> * In {{DenseVectorField}} ...
> ** ... {{checkSchemaField}} validates (and logs warnings) based on the
> {{vectorEncoding}} and {{dimensions}}...
> *** ... Even though these validations aren't "field" specific – they are
> "type" specific, and could be validated in {{DenseVectorField.init()}}
> ** BUT! ... there is no validation anywhere that the {{knnAlgorithm}} is
> supported, or that the HNSW options make sense for it
> *** These are only validated by the
> {{Codec.getKnnVectorsFormatForField(...)}} impl provided by
> {{SchemaCodecFactory}} ...
> **** ... and they are redundantly validated on every call
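
The points above about repeated SPI lookups and late validation could be addressed by resolving each override once, when the field type is initialized, and reusing the cached result on every per-field call. The following is only a minimal sketch of that idea, not Solr's actual code: every class and method name in it ({{Format}}, {{FormatRegistry}}, {{FieldTypeSketch}}, {{SchemaSketch}}, {{postingsFormatForField}}) is a hypothetical stand-in, and a plain map-based registry takes the place of the real Lucene SPI lookup so the example stays self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

final class PerTypeFormatSketch {

  /** Hypothetical stand-in for a Lucene format such as a PostingsFormat. */
  static final class Format {
    final String name;
    Format(String name) { this.name = name; }
  }

  /** Hypothetical stand-in for an SPI lookup; counts how often it is consulted. */
  static final class FormatRegistry {
    private final Map<String, Format> known = new HashMap<>();
    final AtomicInteger lookups = new AtomicInteger();

    FormatRegistry(String... names) {
      for (String n : names) known.put(n, new Format(n));
    }

    Format forName(String name) {
      lookups.incrementAndGet();
      Format f = known.get(name);
      if (f == null) throw new IllegalArgumentException("Unknown format: " + name);
      return f;
    }
  }

  /** Hypothetical field type that validates and resolves its override once, at init. */
  static final class FieldTypeSketch {
    final String typeName;
    private final Format postingsFormat; // null means "no override configured"

    FieldTypeSketch(String typeName, String postingsFormatName, FormatRegistry registry) {
      this.typeName = typeName;
      // Fail fast: a bogus name is rejected here instead of at indexing time.
      this.postingsFormat =
          postingsFormatName == null ? null : registry.forName(postingsFormatName);
    }

    Format getPostingsFormat() { return postingsFormat; }
  }

  /** Hypothetical schema that maps a field name straight to its type. */
  static final class SchemaSketch {
    private final Map<String, FieldTypeSketch> typeByField = new HashMap<>();
    void addField(String fieldName, FieldTypeSketch type) { typeByField.put(fieldName, type); }
    FieldTypeSketch getFieldTypeOrNull(String fieldName) { return typeByField.get(fieldName); }
  }

  /** Per-field hook: returns the format cached on the type, or the default. */
  static Format postingsFormatForField(SchemaSketch schema, String field, Format dflt) {
    FieldTypeSketch type = schema.getFieldTypeOrNull(field);
    Format override = type == null ? null : type.getPostingsFormat();
    return override != null ? override : dflt;
  }

  public static void main(String[] args) {
    FormatRegistry registry = new FormatRegistry("FormatA");
    FieldTypeSketch type = new FieldTypeSketch("text_a", "FormatA", registry);
    SchemaSketch schema = new SchemaSketch();
    schema.addField("title", type);
    schema.addField("body", type);
    Format dflt = new Format("Default");
    for (int i = 0; i < 1000; i++) {
      postingsFormatForField(schema, "title", dflt);
      postingsFormatForField(schema, "body", dflt);
    }
    System.out.println("registry lookups: " + registry.lookups.get());
  }
}
```

In this sketch the registry is consulted exactly once, in the {{FieldTypeSketch}} constructor, no matter how many fields share the type or how often the per-field hook is called, and an unknown format name fails at construction time. The same shape would presumably apply to the {{KnnVectorsFormat}} case, where a single delegating format instance could be built per field type and reused.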