[jira] [Updated] (SOLR-4658) In preparation for dynamic schema modification via REST API, add a "managed" schema facility

Steve Rowe (JIRA) Sun, 31 Mar 2013 20:49:20 -0700

     [ 
https://issues.apache.org/jira/browse/SOLR-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Steve Rowe updated SOLR-4658:
-----------------------------

    Attachment: SOLR-4658.patch

Patch implementing the idea.

This makes the IndexSchema constructor private, and adds a factory method named 
{{create()}}, which manages the upgrade-to-managed-schema process when 
necessary.
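
For readers who have not opened the patch, the private-constructor-plus-factory arrangement can be sketched roughly as below. This is a minimal illustration, not the patch itself: the class {{SchemaSketch}}, the signature of its {{create()}} method, the {{HEADER}} constant, and the plain-file handling are all assumptions made for this example. The real {{IndexSchema}} parses the schema rather than copying text, and in SolrCloud mode the resource lives on ZooKeeper rather than on the local filesystem.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for IndexSchema; every name here is illustrative only.
final class SchemaSketch {
    static final String HEADER =
        "<!-- Solr managed schema - automatically generated - DO NOT EDIT -->\n";

    private final String xml;
    private final boolean managed;

    // Private constructor: callers have to go through create().
    private SchemaSketch(String xml, boolean managed) {
        this.xml = xml;
        this.managed = managed;
    }

    String xml() { return xml; }

    boolean isManaged() { return managed; }

    // Factory method: loads the schema and performs the one-time upgrade when
    // managed mode is requested but the managed resource does not exist yet.
    static SchemaSketch create(Path confDir, boolean managed, String managedName)
            throws IOException {
        Path legacy = confDir.resolve("schema.xml");
        if (!managed) {
            return new SchemaSketch(read(legacy), false);
        }
        Path managedPath = confDir.resolve(managedName);
        if (!Files.exists(managedPath)) {
            // Upgrade: persist the schema under the managed name, then rename
            // the non-managed resource by appending ".bak".
            // Simplification: the text is copied as-is and assumed to have no
            // XML declaration, so the comment can be placed first.
            String body = read(legacy);
            Files.write(managedPath, (HEADER + body).getBytes(StandardCharsets.UTF_8));
            Files.move(legacy, confDir.resolve("schema.xml.bak"));
        }
        return new SchemaSketch(read(managedPath), true);
    }

    private static String read(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }
}
```

Because the constructor is private, no caller can obtain a schema object without passing through the upgrade check, which is the point of routing construction through a factory method.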

The persistence format is kept as XML.  A comment at the top says:

{code:xml}
<!-- Solr managed schema - automatically generated - DO NOT EDIT -->
{code}

This patch also adds a method to {{core.Config}} to test for unexpected element 
attributes when parsing {{solrconfig.xml}}: 
{{complainAboutUnknownAttributes()}}.  I'm only using it for the {{<schema/>}} 
tag at this point, but it should be useful for any other config elements that 
have a known fixed set of attributes.
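
The idea behind that check can be sketched with the JDK's DOM API alone. The code below is an illustration under stated assumptions, not the method in the patch: the class {{AttributeCheckSketch}} and the helpers {{unknownAttributes()}}, {{rejectUnknownAttributes()}}, {{validateSchemaElement()}} and {{parse()}} are hypothetical names, and the actual signature of {{complainAboutUnknownAttributes()}} may differ. The attribute names and the rule that {{mutable="true"}} requires {{managed="true"}} come from the issue description.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;

// Hypothetical helper; names are illustrative and not taken from the patch.
final class AttributeCheckSketch {
    // The fixed attribute set of the <schema/> element, per the issue description.
    static final Set<String> SCHEMA_ATTRS = new HashSet<>(
        Arrays.asList("managed", "mutable", "managedSchemaResourceName"));

    private AttributeCheckSketch() {}

    // Returns the sorted names of attributes that are not in the known set.
    static Set<String> unknownAttributes(Element element, Set<String> known) {
        Set<String> unknown = new TreeSet<>();
        NamedNodeMap attrs = element.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            String name = attrs.item(i).getNodeName();
            if (!known.contains(name)) {
                unknown.add(name);
            }
        }
        return unknown;
    }

    // Fails fast so that a misspelled attribute is not silently ignored.
    static void rejectUnknownAttributes(Element element, Set<String> known) {
        Set<String> unknown = unknownAttributes(element, known);
        if (!unknown.isEmpty()) {
            throw new IllegalArgumentException(
                "Unknown attribute(s) on <" + element.getTagName() + ">: " + unknown);
        }
    }

    // Applies both checks to a <schema/> element. A missing attribute reads as
    // the empty string, so both flags default to false.
    static void validateSchemaElement(Element schema) {
        rejectUnknownAttributes(schema, SCHEMA_ATTRS);
        boolean managed = Boolean.parseBoolean(schema.getAttribute("managed"));
        boolean mutable = Boolean.parseBoolean(schema.getAttribute("mutable"));
        if (mutable && !managed) {
            throw new IllegalArgumentException(
                "mutable=\"true\" requires managed=\"true\"");
        }
    }

    // Parses a small XML snippet and returns its root element.
    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
            .getDocumentElement();
    }
}
```

With a helper of this shape, a typo such as {{mutible="true"}} is reported at startup instead of leaving the schema quietly immutable.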

Tests added for SolrCloud and standalone modes.

I think it's ready to go.

                
> In preparation for dynamic schema modification via REST API, add a "managed" 
> schema facility
> --------------------------------------------------------------------------------------------
>
>                 Key: SOLR-4658
>                 URL: https://issues.apache.org/jira/browse/SOLR-4658
>             Project: Solr
>          Issue Type: Sub-task
>          Components: Schema and Analysis
>            Reporter: Steve Rowe
>            Assignee: Steve Rowe
>            Priority: Minor
>             Fix For: 4.3
>
>         Attachments: SOLR-4658.patch
>
>
> The idea is to have a set of configuration items in {{solrconfig.xml}}:
> {code:xml}
> <schema managed="true" mutable="true" 
> managedSchemaResourceName="managed-schema"/>
> {code} 
> It will be a precondition for future dynamic schema modification APIs that 
> {{mutable="true"}}.  {{solrconfig.xml}} parsing will fail if 
> {{mutable="true"}} but {{managed="false"}}.
> When {{managed="true"}}, and the resource named in 
> {{managedSchemaResourceName}} doesn't exist, Solr will automatically upgrade 
> the schema to "managed": the non-managed schema resource (typically 
> {{schema.xml}}) is parsed and then persisted at {{managedSchemaResourceName}} 
> under {{$solrHome/$collectionOrCore/conf/}}, or on ZooKeeper at 
> {{/configs/$configName/}}, and the non-managed schema resource is renamed by 
> appending {{.bak}}, e.g. {{schema.xml.bak}}.
> Once the upgrade has taken place, users can get the full schema from the 
> {{/schema?wt=schema.xml}} REST API, and can use this as the basis for 
> modifications which can then be used to manually downgrade back to 
> non-managed schema: put the {{schema.xml}} in place, then add {{<schema 
> managed="false"/>}} to {{solrconfig.xml}} (or remove the whole {{<schema/>}} 
> element, since {{managed="false"}} is the default).
> If users take no action, then Solr behaves the same as always: the example 
> {{solrconfig.xml}} will include {{<schema managed="false" ...>}}.
> For a discussion of rationale for this feature, see 
> [[email protected]]'s post to the solr-user mailing list in the 
> thread "Dynamic schema design: feedback requested" 
> [http://markmail.org/message/76zj24dru2gkop7b]:
>  
> {quote}
> Ignoring for a moment what format is used to persist schema information, I 
> think it's important to have a conceptual distinction between "data" that 
> is managed by applications and manipulated by a REST API, and "config" 
> that is managed by the user and loaded by solr on init -- or via an 
> explicit "reload config" REST API.
> Past experience with how users perceive(d) solr.xml has heavily reinforced 
> this opinion: on one hand, it's a place users must specify some config 
> information -- so people want to be able to keep it in version control 
> with other config files.  On the other hand it's a "live" data file that 
> is rewritten by solr when cores are added.  (God help you if you want to do a 
> rolling deploy of a new version of solr.xml where you've edited some of the 
> config values while simultaneously clients are creating new SolrCores)
> As we move forward towards having REST APIs that treat schema information 
> as "data" that can be manipulated, I anticipate the same types of 
> confusion, misunderstanding, and grumblings if we try to use the same 
> pattern of treating the existing schema.xml (or some new schema.json) as a 
> hybrid configs & data file.  "Edit it by hand if you want, the /schema/* 
> REST API will too!"  ... Even assuming we don't make any of the same 
> technical mistakes that have caused problems with solr.xml round tripping 
> in the past (ie: losing comments, reading new config options that we 
> forget to write back out, etc...) i'm fairly certain there is still going 
> to be a lot of things that will look weird and confusing to people.
> (XML may have been designed to be both "human readable & writable" and 
> "machine readable & writable", but practically speaking it's hard to have a 
> single XML file be "machine and human readable & writable")
> I think it would make a lot of sense -- not just in terms of 
> implementation but also for end user clarity -- to have some simple, 
> straightforward to understand caveats about maintaining schema 
> information...
> 1) If you want to keep schema information in an authoritative config file 
> that you can manually edit, then the /schema REST API will be read only. 
> 2) If you wish to use the /schema REST API for read and write operations, 
> then schema information will be persisted under the covers in a data store 
> whose format is an implementation detail just like the index file format.
> 3) If you are using a schema config file and you wish to switch to using 
> the /schema REST API for managing schema information, there is a 
> tool/command/API you can run to do so.
> 4) if you are using the /schema REST API for managing schema information, 
> and you wish to switch to using a schema config file, there is a 
> tool/command/API you can run to export the schema info in a config file 
> format.
> ...whether or not the "under the covers in a data store" used by the REST 
> API is JSON, or some binary data, or an XML file just schema.xml w/o 
> whitespace/comments should be an implementation detail.  Likewise is the 
> question of whether some new config file formats are added -- it shouldn't 
> matter.
> If it's config it's config and the user owns it.
> If it's data it's data and the system owns it.
> : is the risk they take if they want to manually edit it - it's no 
> : different than today when you edit the file and do a Core reload or 
> : something. I think we can improve some validation stuff around that, but 
> : it doesn't seem like a show stopper to me.
> The new risk is multiple "actors" (both the user, and Solr) editing the 
> file concurrently, and info that might be lost due to Solr reading the 
> file, manipulating internal state, and then writing the file back out.  
> Eg: User hand edits may be lost if they happen on disk during Solr's 
> internal manipulation of data.  API edits may be reflected in the internal 
> state, but lost if the User writes the file directly and then does a core 
> reload, etc....
> : At a minimum, I think the user should be able to start with a hand 
> : modified file. Many people *heavily* modify the example schema to fit 
> : their use case. If you have to start doing that by making 50 rest API 
> : calls, that's pretty rough. Once you get your schema nice and happy, you 
> : might script out those rest calls, but initially, it's much 
> : faster/easier to whack the schema into place in a text editor IMO.
> I don't think there is any disagreement about that.  The ability to say 
> "my schema is a config file and i own it" should always exist (remove 
> it over my dead body) 
> The question is what trade offs to expect/require for people who would 
> rather use an API to manipulate these things -- i don't think it's 
> unreasonable to say "if you would like to manipulate the schema using an 
> API, then you give up the ability to manipulate it as a config file on 
> disk"
> ("if you want the /schema API to drive your car, you have to take your 
> foot off the pedals and let go of the steering wheel")
> {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

