Re: Deprecate Schemaless Mode?

Jan Høydahl Tue, 04 Aug 2020 11:25:03 -0700

Learning mode won’t work if you have 10 existing collections and want to create 
#11. We could instead have a SchemaLearningUpdateHandler so people could 
explicitly post documents to, say, /schema-guess to modify the schema. We could 
even have this implicit. Then the _default config would have just _root_, id 
and a few more, and if you want guessing you first send a number of docs to 
the /schema-guess endpoint and then inspect in the schema browser what you got. 
That handler could support a param &reset=true which would wipe the schema to 
start guessing from scratch.
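
A minimal sketch of how such a guessing endpoint might accumulate a schema 
across posts, purely to illustrate the idea. Nothing here is an existing Solr 
API: the SchemaGuesser class, its post() method, and the widening rules are 
assumptions for illustration. The type names are chosen to resemble those in 
Solr's default schema, and multi-valued fields are ignored.

```python
from datetime import datetime


def guess_type(value):
    """Guess a field type name for a single value (illustrative mapping only)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "plong"
    if isinstance(value, float):
        return "pdouble"
    if isinstance(value, str):
        try:
            datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
            return "pdate"
        except ValueError:
            return "text_general"
    return "text_general"


def merge(old, new):
    """Combine the previous guess for a field with the guess from a new value."""
    if old is None or old == new:
        return new
    if {old, new} == {"plong", "pdouble"}:
        return "pdouble"  # widen integers to doubles
    return "text_general"  # any other conflict falls back to text


class SchemaGuesser:
    """Hypothetical stand-in for the proposed /schema-guess handler."""

    def __init__(self):
        self.fields = {}

    def post(self, docs, reset=False):
        # reset=True mirrors the proposed &reset=true parameter:
        # wipe what was learned and start guessing from scratch.
        if reset:
            self.fields = {}
        for doc in docs:
            for name, value in doc.items():
                self.fields[name] = merge(self.fields.get(name), guess_type(value))
        return dict(self.fields)


guesser = SchemaGuesser()
guesser.post([{"id": "1", "price": 10}])   # price is guessed as plong
guesser.post([{"price": 9.99}])            # a later doc widens price to pdouble
guesser.post([{"title": "x"}], reset=True) # earlier guesses are discarded
```

The second post also shows why guessing from a small sample is fragile: the 
first document alone would have locked price in as an integer type.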


Jan Høydahl

> On 4 Aug 2020, at 15:30, Gus Heck <[email protected]> wrote:
> 
> 
> Interesting read. Might have changed now that we have authentication 
> capabilities... but let's not thread jack :)
> 
>> On Tue, Aug 4, 2020 at 8:28 AM Erick Erickson <[email protected]> 
>> wrote:
>> Having the admin UI allow uploads may not be secure. When I had a similar 
>> idea a long time ago it got shot down, see the discussion at: 
>> https://issues.apache.org/jira/browse/SOLR-5287.
>> 
>> I _think_ this is a different issue if the configs have to be residing on 
>> the system, not coming in from outside, just FYI...
>> 
>> > On Aug 3, 2020, at 7:03 PM, Gus Heck <[email protected]> wrote:
>> > 
>> > 
>> > 
>> > On Mon, Aug 3, 2020 at 5:03 PM Erick Erickson <[email protected]> 
>> > wrote:
>> > Gus’s point about implementing something before removing it is well taken, 
>> > but we can deprecate it immediately without removing it. Gus’s point about 
>> > dynamic fields not being found until later in the cycle is well taken, but 
>> > not enough to persuade me. 
>> > 
>> > Fair enough :) 
>> >  
>> > I’m not enthusiastic about multiple getting started schemas. The whole 
>> > motivation behind schemaless is that the user doesn’t need to know about 
>> > schemas to get started. By providing multiple “getting started” schemas we 
>> > require them to become aware of schemas again.
>> > 
>> > Here's my theory (which may or may not be persuasive :) ) 
>> > 
>> > My thinking in that suggestion is that the majority of the problem is due 
>> > to the fact that people new to a technology will tend to latch onto the 
>> > defaults that come with something as being something that should be held 
>> > onto until you have a good reason to change it. This is reasonable because 
>> > changing things you don't understand willy nilly is often a road to pain. 
>> > And people DO want a safe starting point and we should give it to them 
>> > because it makes their life easier once they get a little further down the 
>> > road, but this is not compatible with the easy-start schemaless mode. 
>> > Looking at https://lucene.apache.org/solr/guide/8_5/solr-tutorial.html I 
>> > see that the initial tutorial experience is fully scripted, and the user 
>> > won't likely notice if they are told to ignore _default or guessing-proto 
>> > in favor of the techproducts config set... BUT when they do get to the 
>> > point of looking at the config name they'll see the more descriptive name. 
>> > So rather than seeing "_default" and thinking "Ah ha! Here's something I 
>> > can take as gospel and not change until I have a reason!" they'll see 
>> > "guessing-proto" or "dynamic-proto" and say "Hunh, I wonder what that 
>> > means?" which is a good question for them to ask I think. 
>> > 
>> > The concept of a default carries a strong bias toward not touching it 
>> > (IMHO), which will be wrong most of the time no matter what we give them as a 
>> > default. If something must be a default I'd favor a non-managed, 
>> > non-dynamic, non-guessing minimal schema with the required fields, and an 
>> > id field, maybe a _text_ field, and a comment pointing to the section of 
>> > the ref guide where they can copy and paste in all the stuff that's 
>> > currently in our base schema as example (things like the text_ga type), IF 
>> > they want it. I get really tired of seeing mile long schemas that have a 
>> > ton of unused stuff that is retained because people didn't know if they 
>> > needed it or not... 
>> > 
>> > Note that not having some default would break back compat on bin/solr, but 
>> > changing the default is also a break of sorts. 
>> >  
>> > 
>> > All that said, maybe we could rethink the approach. My two objections are:
>> > 1> schemaless, by updating the schema based on a very small sample set, is 
>> > very susceptible to failing early and often
>> > 2> Constantly updating the config in ZK and reloading the collections 
>> > seems very hard to get right.
>> >  
>> > I have for some time thought the inability to upload and download a config 
>> > (or files within a config) via the web UI was a gap. But I found it easier 
>> > to write https://plugins.gradle.org/plugin/com.needhamsoftware.solr-gradle 
>> > than add that feature to the UI :)
>> >  
>> > So I can imagine a “getting started” mode that indexed to the glob field 
>> > while creating a schema. Ideally, it would be necessary to enable it 
>> > specifically rather than have it be the default. I’d imagine this being 
>> > coupled with some kind of “export schema” button. So the process would be
>> > > start Solr with -Dsolr.learningmode.config=some_config_name.
>> > > index a bunch of documents, perhaps prototyping the search app on the 
>> > > dynamic glob field.
>> > > The admin UI should have a big, intrusive banner saying “RUNNING IN 
>> > > LEARNING MODE” with instructions on what to do next.
>> > > In that mode there’d need to be a “save schema” button or something. 
>> > > What I’d like that to do would be examine the index and write a new 
>> > > schema somewhere. If this was the mode, then you’d be able to run it any 
>> > > time.
>> > 
>> > +1 for anything that makes a round-trip of working with the schema easier, 
>> > but not really a fan of learning mode.  
>> >  
>> > 
>> > 
>> 
>> 
>> 
> 
> 
> -- 
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
