[ https://issues.apache.org/jira/browse/SOLR-4658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13618743#comment-13618743 ]
Robert Muir commented on SOLR-4658:
-----------------------------------

Yeah, I think I didn't communicate it well enough. I guess I was thinking:

{code}
<schema class="FooSchema" someOptionThatMightOnlyMakeSenseToFoo="true"/>
{code}

So IndexSchema becomes abstract and is loaded just like other plugins, versus being a "wonder-do-it-all" class.

It would also have the advantage of not having a bunch of options that have illegal combinations (e.g. managed+mutable).

Finally, it makes it extensible: if someone wants to make their own impl that is powered by Microsoft Access .MDB files, then they can do so.

(Also, I think it would make backwards compatibility possible in case something changes in drastic ways.)

> In preparation for dynamic schema modification via REST API, add a "managed"
> schema facility
> --------------------------------------------------------------------------------------------
>
> Key: SOLR-4658
> URL: https://issues.apache.org/jira/browse/SOLR-4658
> Project: Solr
> Issue Type: Sub-task
> Components: Schema and Analysis
> Reporter: Steve Rowe
> Assignee: Steve Rowe
> Priority: Minor
> Fix For: 4.3
>
> Attachments: SOLR-4658.patch
>
>
> The idea is to have a set of configuration items in {{solrconfig.xml}}:
> {code:xml}
> <schema managed="true" mutable="true"
> managedSchemaResourceName="managed-schema"/>
> {code}
> It will be a precondition for future dynamic schema modification APIs that
> {{mutable="true"}}. {{solrconfig.xml}} parsing will fail if
> {{mutable="true"}} but {{managed="false"}}.
> When {{managed="true"}}, and the resource named in
> {{managedSchemaResourceName}} doesn't exist, Solr will automatically upgrade
> the schema to "managed": the non-managed schema resource (typically
> {{schema.xml}}) is parsed and then persisted at {{managedSchemaResourceName}}
> under {{$solrHome/$collectionOrCore/conf/}}, or on ZooKeeper at
> {{/configs/$configName/}}, and the non-managed schema resource is renamed by
> appending {{.bak}}, e.g.
> {{schema.xml.bak}}.
> Once the upgrade has taken place, users can get the full schema from the
> {{/schema?wt=schema.xml}} REST API, and can use this as the basis for
> modifications which can then be used to manually downgrade back to
> non-managed schema: put the {{schema.xml}} in place, then add
> {{<schema managed="false"/>}} to {{solrconfig.xml}} (or remove the whole
> {{<schema/>}} element, since {{managed="false"}} is the default).
> If users take no action, then Solr behaves the same as always: the example
> {{solrconfig.xml}} will include {{<schema managed="false" ...>}}.
> For a discussion of the rationale for this feature, see
> [~hossman_luc...@fucit.org]'s post to the solr-user mailing list in the
> thread "Dynamic schema design: feedback requested"
> [http://markmail.org/message/76zj24dru2gkop7b]:
>
> {quote}
> Ignoring for a moment what format is used to persist schema information, I
> think it's important to have a conceptual distinction between "data" that
> is managed by applications and manipulated by a REST API, and "config"
> that is managed by the user and loaded by solr on init -- or via an
> explicit "reload config" REST API.
> Past experience with how users perceive(d) solr.xml has heavily reinforced
> this opinion: on one hand, it's a place users must specify some config
> information -- so people want to be able to keep it in version control
> with other config files. On the other hand it's a "live" data file that
> is rewritten by solr when cores are added.
> (God help you if you want to do a
> rolling deploy of a new version of solr.xml where you've edited some of the
> config values while clients are simultaneously creating new SolrCores)
> As we move forward towards having REST APIs that treat schema information
> as "data" that can be manipulated, I anticipate the same types of
> confusion, misunderstanding, and grumblings if we try to use the same
> pattern of treating the existing schema.xml (or some new schema.json) as a
> hybrid configs & data file. "Edit it by hand if you want, the /schema/*
> REST API will too!" ... Even assuming we don't make any of the same
> technical mistakes that have caused problems with solr.xml round tripping
> in the past (ie: losing comments, reading new config options that we
> forget to write back out, etc...) I'm fairly certain there is still going
> to be a lot of things that will look weird and confusing to people.
> (XML may have been designed to be both "human readable & writable" and
> "machine readable & writable", but practically speaking it's hard to have a
> single XML file be "machine and human readable & writable")
> I think it would make a lot of sense -- not just in terms of
> implementation but also for end user clarity -- to have some simple,
> straightforward to understand caveats about maintaining schema
> information...
> 1) If you want to keep schema information in an authoritative config file
> that you can manually edit, then the /schema REST API will be read only.
> 2) If you wish to use the /schema REST API for read and write operations,
> then schema information will be persisted under the covers in a data store
> whose format is an implementation detail just like the index file format.
> 3) If you are using a schema config file and you wish to switch to using
> the /schema REST API for managing schema information, there is a
> tool/command/API you can run to do so.
> 4) If you are using the /schema REST API for managing schema information,
> and you wish to switch to using a schema config file, there is a
> tool/command/API you can run to export the schema info in a config file
> format.
> ...whether or not the "under the covers in a data store" used by the REST
> API is JSON, or some binary data, or an XML file that is just schema.xml w/o
> whitespace/comments should be an implementation detail. Likewise is the
> question of whether some new config file formats are added -- it shouldn't
> matter.
> If it's config it's config and the user owns it.
> If it's data it's data and the system owns it.
> : is the risk they take if they want to manually edit it - it's no
> : different than today when you edit the file and do a Core reload or
> : something. I think we can improve some validation stuff around that, but
> : it doesn't seem like a show stopper to me.
> The new risk is multiple "actors" (both the user, and Solr) editing the
> file concurrently, and info that might be lost due to Solr reading the
> file, manipulating internal state, and then writing the file back out.
> Eg: User hand edits may be lost if they happen on disk during Solr's
> internal manipulation of data. API edits may be reflected in the internal
> state, but lost if the User writes the file directly and then does a core
> reload, etc....
> : At a minimum, I think the user should be able to start with a hand
> : modified file. Many people *heavily* modify the example schema to fit
> : their use case. If you have to start doing that by making 50 rest API
> : calls, that's pretty rough. Once you get your schema nice and happy, you
> : might script out those rest calls, but initially, it's much
> : faster/easier to whack the schema into place in a text editor IMO.
> I don't think there is any disagreement about that.
> The ability to say
> "my schema is a config file and I own it" should always exist (remove
> it over my dead body)
> The question is what trade offs to expect/require for people who would
> rather use an API to manipulate these things -- I don't think it's
> unreasonable to say "if you would like to manipulate the schema using an
> API, then you give up the ability to manipulate it as a config file on
> disk"
> ("if you want the /schema API to drive your car, you have to take your
> foot off the pedals and let go of the steering wheel")
> {quote}
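
For readers who want something concrete, the validation rule and the automatic upgrade described in the issue summary could be sketched in Java roughly as below. This is only an illustration under stated assumptions, not the code in SOLR-4658.patch: {{ManagedSchemaSketch}}, {{validate}} and {{upgradeIfNeeded}} are hypothetical names, only the local {{conf/}} directory case is covered (not ZooKeeper), and where the issue says the non-managed schema is parsed and then persisted, the sketch simply copies the file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical illustration only; none of these names are Solr APIs. */
final class ManagedSchemaSketch {

    private ManagedSchemaSketch() {}

    /** Mirrors the rule that mutable="true" is rejected when managed="false". */
    static void validate(boolean managed, boolean mutable) {
        if (mutable && !managed) {
            throw new IllegalArgumentException(
                "mutable=\"true\" requires managed=\"true\"");
        }
    }

    /**
     * Performs the one-time upgrade if the managed resource does not exist yet.
     *
     * @return true if an upgrade happened, false if the managed resource was already there
     */
    static boolean upgradeIfNeeded(Path confDir, String managedName, String legacyName)
            throws IOException {
        Path managed = confDir.resolve(managedName);
        if (Files.exists(managed)) {
            return false; // already upgraded, leave everything alone
        }
        Path legacy = confDir.resolve(legacyName);
        // Assumption: a byte-for-byte copy stands in for "parse, then persist".
        Files.copy(legacy, managed);
        // Rename the non-managed resource by appending ".bak".
        Files.move(legacy, confDir.resolve(legacyName + ".bak"),
                   StandardCopyOption.REPLACE_EXISTING);
        return true;
    }
}
```

Calling {{ManagedSchemaSketch.upgradeIfNeeded(confDir, "managed-schema", "schema.xml")}} on a directory that holds only {{schema.xml}} would leave {{managed-schema}} and {{schema.xml.bak}} behind; a second call finds the managed resource and does nothing.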