[
https://issues.apache.org/jira/browse/SOLR-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steve Rowe updated SOLR-9526:
-----------------------------
Attachment: SOLR-9526.patch
Attaching patch brought up to date with master (in particular, collapsing of
{{data_driven_schema_configs}} and {{basic_configs}} into {{_default}}) - note
that your original patch only modified {{solrconfig.xml}} on one of these and
{{managed_schema}} on the other - I assume you had/have local changes that
didn't make it into the patch [~janhoy]? I made a couple of other changes;
details below.
{quote}
See new NOCOMMIT comments. I was using the ManagedIndexSchema method
{code}
public ManagedIndexSchema addCopyFields(String source, Collection<String> destinations, int maxChars)
{code}
which does not have a {{persist=true/false}} argument, so calling it leaves the
schema not persisted. Then I could not find a way to explicitly persist it
since method
{{boolean persistManagedSchema(boolean createOnly)}}
was not public. In this patch I've made it public and done a hacky instanceof
check in AddSchemaFieldsUpdateProcessorFactory
{code}
if (newSchema instanceof ManagedIndexSchema) {
  // NOCOMMIT: Hack to avoid persisting schema once after addFields and then
  // once after each copyField
  ((ManagedIndexSchema)newSchema).persistManagedSchema(false);
}
{code}
Steve Rowe, you wrote the {{addCopyFields()}} method a while ago, is there a
cleaner way to make sure schema is persisted after adding a copyField?
{quote}
The design of {{ManagedIndexSchema}}'s API was in support of the Schema REST
API, where each resource was modifiable one at a time; "bulk" modifications
weren't possible. In the new bulk schema API, though, the ordinary case
involves multiple modifications; in this case, it is counter-productive to
persist in the middle of a set of operations.
SOLR-6476 (introducing schema "bulk" mode) added the option to *not* persist
the schema after an operation; previously every operation was automatically
persisted. This was added as an option because at the time, bulk and REST
modes co-existed. SOLR-7682 added the ability to specify maxChars for
copyField directives, and I intentionally left off the {{persist}} option of
the new {{addCopyFields()}} method, because there was (intentionally) no way to
invoke this capability via the (now deprecated) schema REST API, and the bulk
schema API didn't need the {{persist}} option.
Long story short: I think making {{persistManagedSchema()}} public is a natural
consequence of the bulk schema API (and in support of bulk operations from
other sources, e.g. this issue). It's just that nobody had gotten around to it
yet.
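To illustrate the point about persisting once per batch, here is a minimal toy sketch in Java. It does not use Solr's classes: {{ToySchema}}, {{BulkUpdateSketch}}, the {{persistCount}} counter and the field names are all hypothetical, and only the method names mirror the ones discussed above.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Toy model only: these are NOT Solr's classes. Method names mirror the ones
// discussed above; the bodies and the persist counter are invented for illustration.
class ToySchema {
    // Counts simulated writes to storage across all schema copies.
    static int persistCount = 0;

    final List<String> fields;
    final List<String> copyFields;

    ToySchema(List<String> fields, List<String> copyFields) {
        this.fields = fields;
        this.copyFields = copyFields;
    }

    // Returns a modified copy; persists only when the caller asks for it.
    ToySchema addFields(Collection<String> newFields, boolean persist) {
        List<String> f = new ArrayList<>(fields);
        f.addAll(newFields);
        ToySchema copy = new ToySchema(f, copyFields);
        if (persist) {
            copy.persistManagedSchema(false);
        }
        return copy;
    }

    // No persist argument, like the addCopyFields() variant quoted above.
    ToySchema addCopyFields(String source, Collection<String> destinations, int maxChars) {
        List<String> c = new ArrayList<>(copyFields);
        for (String dest : destinations) {
            c.add(source + "->" + dest + ":" + maxChars);
        }
        return new ToySchema(fields, c);
    }

    // Public, so a caller can persist once at the end of a batch.
    public boolean persistManagedSchema(boolean createOnly) {
        persistCount++; // stand-in for writing the schema to storage
        return true;
    }
}

class BulkUpdateSketch {
    // Applies one field addition and two copyFields, then persists exactly once.
    static ToySchema applyBatch(ToySchema schema) {
        ToySchema updated = schema.addFields(List.of("title"), false);
        updated = updated.addCopyFields("title", List.of("title_str"), 256);
        updated = updated.addCopyFields("title", List.of("_text_"), -1);
        updated.persistManagedSchema(false);
        return updated;
    }
}
```

In this sketch the schema is modified three times, but {{persistCount}} goes up by one rather than by three.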
In {{AddSchemaFieldsUpdateProcessorFactory.processAdd()}} in my patch I
removed the {{instanceof ManagedIndexSchema}} check wrapping the call to
{{persistManagedSchema()}}, as well as the {{NOCOMMIT}}s, since the check {{if
( ! cmd.getReq().getSchema().isMutable())}} at the beginning of the method
already ensures that we're dealing with a {{ManagedIndexSchema}}.
I also removed a {{typeMapping}} that your patch added to URP chains
{{add-fields-no-run-processor}} and {{parse-and-add-fields}} in
{{solrconfig-add-schema-fields-update-processor-chains.xml}} - I'm assuming
it is a vestige from an earlier concept of removing {{<defaultTypeMapping>}},
since these chains have {{<str name="defaultFieldType">text</str>}}?
{{AddSchemaFieldsUpdateProcessorFactoryTest}} passes with my change. The
removed mapping:
{code:xml}
<lst name="typeMapping">
<str name="valueClass">java.lang.String</str>
<str name="fieldType">text</str>
</lst>
{code}
> data_driven configs defaults to "strings" for unmapped fields, makes most
> fields containing "textual content" unsearchable, breaks tutorial examples
> ----------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: SOLR-9526
> URL: https://issues.apache.org/jira/browse/SOLR-9526
> Project: Solr
> Issue Type: Bug
> Security Level: Public(Default Security Level. Issues are Public)
> Components: UpdateRequestProcessors
> Reporter: Hoss Man
> Assignee: Jan Høydahl
> Labels: dynamic-schema
> Fix For: 7.0
>
> Attachments: SOLR-9526.patch, SOLR-9526.patch, SOLR-9526.patch,
> SOLR-9526.patch, SOLR-9526.patch
>
>
> James Pritchett pointed out on the solr-user list that this sample query from
> the quick start tutorial matched no docs (even though the tutorial text says
> "The above request returns only one document")...
> http://localhost:8983/solr/gettingstarted/select?wt=json&indent=true&q=name:foundation
> The root problem seems to be that the add-unknown-fields-to-the-schema chain
> in data_driven_schema_configs is configured with...
> {code}
> <str name="defaultFieldType">strings</str>
> {code}
> ...and the "strings" type uses StrField and is not tokenized.
> ----
> Original thread:
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201609.mbox/%3ccac-n2zrpsspfnk43agecspchc5b-0ff25xlfnzogyuvyg2d...@mail.gmail.com%3E