[
https://issues.apache.org/jira/browse/SOLR-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steve Rowe updated SOLR-4658:
-----------------------------
Attachment: SOLR-4658.patch
Patch implementing the idea.
This makes the IndexSchema constructor private, and adds a factory method named
{{create()}}, which manages the upgrade-to-managed-schema process when
necessary.
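For readers who have not opened the patch, here is a minimal sketch of the private-constructor-plus-factory shape described above. It is not the patch's code: {{SchemaSketch}}, its fields, and the in-memory {{Map}} standing in for the conf directory or ZooKeeper config node are all hypothetical. The real upgrade parses the schema before persisting it, while this sketch just copies the text and assumes the original has no XML declaration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for IndexSchema; illustration only.
public final class SchemaSketch {
    static final String HEADER =
        "<!-- Solr managed schema - automatically generated - DO NOT EDIT -->";

    private final String resourceName;
    private final String content;

    // Private constructor: callers must go through create().
    private SchemaSketch(String resourceName, String content) {
        this.resourceName = resourceName;
        this.content = content;
    }

    public String getResourceName() { return resourceName; }
    public String getContent() { return content; }

    /**
     * Factory method. The store maps resource names to their text and is an
     * assumption of this sketch, not something the patch defines.
     */
    public static SchemaSketch create(Map<String, String> store, boolean managed,
                                      String managedName, String unmanagedName) {
        if (!managed) {
            // Non-managed mode: load the hand-edited schema as always.
            return new SchemaSketch(unmanagedName, store.get(unmanagedName));
        }
        if (!store.containsKey(managedName)) {
            // Upgrade: persist under the managed name, then rename the
            // original by appending ".bak".
            String original = store.remove(unmanagedName);
            if (original == null) {
                throw new IllegalStateException("No schema resource: " + unmanagedName);
            }
            store.put(managedName, HEADER + "\n" + original);
            store.put(unmanagedName + ".bak", original);
        }
        return new SchemaSketch(managedName, store.get(managedName));
    }
}
```

Calling {{create()}} a second time against the same store skips the upgrade branch, because the managed resource already exists.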
The persistence format is kept as XML. A comment at the top says:
{code:xml}
<!-- Solr managed schema - automatically generated - DO NOT EDIT -->
{code}
This patch also adds a method to {{core.Config}} to test for unexpected element
attributes when parsing {{solrconfig.xml}}:
{{complainAboutUnknownAttributes()}}. I'm only using it for the {{<schema/>}}
tag at this point, but it should be useful for any other config elements that
have a known fixed set of attributes.
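To illustrate the kind of check this performs, here is a rough standalone sketch using only the JDK's DOM API. The class and method names ({{AttributeCheckSketch}}, {{unknownAttributes}}, {{checkAttributes}}, {{parseRoot}}) are made up for this example and are not the signatures in the patch.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;

// Hypothetical helper; illustrates the idea, not the patch's implementation.
public final class AttributeCheckSketch {
    private AttributeCheckSketch() {}

    /** Returns the attribute names on the element that are not in the known set. */
    public static Set<String> unknownAttributes(Element element, String... knownAttributes) {
        Set<String> known = new HashSet<>(Arrays.asList(knownAttributes));
        Set<String> unknown = new TreeSet<>(); // sorted for stable messages
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            String name = attrs.item(i).getNodeName();
            if (!known.contains(name)) {
                unknown.add(name);
            }
        }
        return unknown;
    }

    /** Throws if the element carries any attribute outside the known set. */
    public static void checkAttributes(Element element, String... knownAttributes) {
        Set<String> unknown = unknownAttributes(element, knownAttributes);
        if (!unknown.isEmpty()) {
            throw new IllegalArgumentException(
                "Unknown attribute(s) on <" + element.getTagName() + ">: " + unknown);
        }
    }

    /** Parses an XML string and returns its root element. */
    public static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new RuntimeException("Could not parse XML", e);
        }
    }
}
```

With this sketch, a misspelled attribute such as {{mutabel="true"}} on {{<schema/>}} is reported instead of being silently ignored.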
Tests added for SolrCloud and standalone modes.
I think it's ready to go.
> In preparation for dynamic schema modification via REST API, add a "managed"
> schema facility
> --------------------------------------------------------------------------------------------
>
> Key: SOLR-4658
> URL: https://issues.apache.org/jira/browse/SOLR-4658
> Project: Solr
> Issue Type: Sub-task
> Components: Schema and Analysis
> Reporter: Steve Rowe
> Assignee: Steve Rowe
> Priority: Minor
> Fix For: 4.3
>
> Attachments: SOLR-4658.patch
>
>
> The idea is to have a set of configuration items in {{solrconfig.xml}}:
> {code:xml}
> <schema managed="true" mutable="true"
> managedSchemaResourceName="managed-schema"/>
> {code}
> It will be a precondition for future dynamic schema modification APIs that
> {{mutable="true"}}. {{solrconfig.xml}} parsing will fail if
> {{mutable="true"}} but {{managed="false"}}.
> When {{managed="true"}}, and the resource named in
> {{managedSchemaResourceName}} doesn't exist, Solr will automatically upgrade
> the schema to "managed": the non-managed schema resource (typically
> {{schema.xml}}) is parsed and then persisted at {{managedSchemaResourceName}}
> under {{$solrHome/$collectionOrCore/conf/}}, or on ZooKeeper at
> {{/configs/$configName/}}, and the non-managed schema resource is renamed by
> appending {{.bak}}, e.g. {{schema.xml.bak}}.
> Once the upgrade has taken place, users can get the full schema from the
> {{/schema?wt=schema.xml}} REST API, and can use this as the basis for
> modifications which can then be used to manually downgrade back to
> non-managed schema: put the {{schema.xml}} in place, then add {{<schema
> managed="false"/>}} to {{solrconfig.xml}} (or remove the whole {{<schema/>}}
> element, since {{managed="false"}} is the default).
> If users take no action, then Solr behaves the same as always: the example
> {{solrconfig.xml}} will include {{<schema managed="false" ...>}}.
> For a discussion of rationale for this feature, see
> [[email protected]]'s post to the solr-user mailing list in the
> thread "Dynamic schema design: feedback requested"
> [http://markmail.org/message/76zj24dru2gkop7b]:
>
> {quote}
> Ignoring for a moment what format is used to persist schema information, I
> think it's important to have a conceptual distinction between "data" that
> is managed by applications and manipulated by a REST API, and "config"
> that is managed by the user and loaded by solr on init -- or via an
> explicit "reload config" REST API.
> Past experience with how users perceive(d) solr.xml has heavily reinforced
> this opinion: on one hand, it's a place users must specify some config
> information -- so people want to be able to keep it in version control
> with other config files. On the other hand it's a "live" data file that
> is rewritten by solr when cores are added. (God help you if you want to do a
> rolling deploy of a new version of solr.xml where you've edited some of the
> config values while clients are simultaneously creating new SolrCores)
> As we move forward towards having REST APIs that treat schema information
> as "data" that can be manipulated, I anticipate the same types of
> confusion, misunderstanding, and grumblings if we try to use the same
> pattern of treating the existing schema.xml (or some new schema.json) as a
> hybrid configs & data file. "Edit it by hand if you want, the /schema/*
> REST API will too!" ... Even assuming we don't make any of the same
> technical mistakes that have caused problems with solr.xml round tripping
> in the past (ie: losing comments, reading new config options that we
> forget to write back out, etc...) i'm fairly certain there are still going
> to be a lot of things that will look weird and confusing to people.
> (XML may have been designed to be both "human readable & writable" and
> "machine readable & writable", but practically speaking it's hard to have a
> single XML file be "machine and human readable & writable")
> I think it would make a lot of sense -- not just in terms of
> implementation but also for end user clarity -- to have some simple,
> straightforward to understand caveats about maintaining schema
> information...
> 1) If you want to keep schema information in an authoritative config file
> that you can manually edit, then the /schema REST API will be read only.
> 2) If you wish to use the /schema REST API for read and write operations,
> then schema information will be persisted under the covers in a data store
> whose format is an implementation detail just like the index file format.
> 3) If you are using a schema config file and you wish to switch to using
> the /schema REST API for managing schema information, there is a
> tool/command/API you can run to do so.
> 4) if you are using the /schema REST API for managing schema information,
> and you wish to switch to using a schema config file, there is a
> tool/command/API you can run to export the schema info in a config file
> format.
> ...whether or not the "under the covers in a data store" used by the REST
> API is JSON, or some binary data, or an XML file just schema.xml w/o
> whitespace/comments should be an implementation detail. Likewise is the
> question of whether some new config file formats are added -- it shouldn't
> matter.
> If it's config it's config and the user owns it.
> If it's data it's data and the system owns it.
> : is the risk they take if they want to manually edit it - it's no
> : different than today when you edit the file and do a Core reload or
> : something. I think we can improve some validation stuff around that, but
> : it doesn't seem like a show stopper to me.
> The new risk is multiple "actors" (both the user, and Solr) editing the
> file concurrently, and info that might be lost due to Solr reading the
> file, manipulating internal state, and then writing the file back out.
> Eg: User hand edits may be lost if they happen on disk during Solr's
> internal manipulation of data. API edits may be reflected in the internal
> state, but lost if the User writes the file directly and then does a core
> reload, etc....
> : At a minimum, I think the user should be able to start with a hand
> : modified file. Many people *heavily* modify the example schema to fit
> : their use case. If you have to start doing that by making 50 rest API
> : calls, that's pretty rough. Once you get your schema nice and happy, you
> : might script out those rest calls, but initially, it's much
> : faster/easier to whack the schema into place in a text editor IMO.
> I don't think there is any disagreement about that. The ability to say
> "my schema is a config file and i own it" should always exist (remove
> it over my dead body)
> The question is what trade offs to expect/require for people who would
> rather use an API to manipulate these things -- i don't think it's
> unreasonable to say "if you would like to manipulate the schema using an
> API, then you give up the ability to manipulate it as a config file on
> disk"
> ("if you want the /schema API to drive your car, you have to take your
> foot off the pedals and let go of the steering wheel")
> {quote}