Learning mode won’t work if you have 10 existing collections and want to create #11. We could instead have a SchemaLearningUpdateHandler, so people could explicitly post documents to, say, /schema-guess to modify the schema. We could even make this handler implicit. Then the _default config would have just _root_, id and a few more fields, and if you want guessing you first send a number of docs to the /schema-guess endpoint and then inspect what you got in the schema browser. That handler could support a param &reset=true, which would wipe the schema to start guessing from scratch.
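To make the idea a bit more concrete, here is a rough Python sketch of the kind of accumulation such a handler might do. Nothing like this exists in Solr today: the class name SchemaGuesser, the widening rules, and the `reset` flag (standing in for the proposed &reset=true param) are all assumptions made up for illustration. Only the field type names (string, boolean, plong, pdouble) are borrowed from the stock schema.

```python
from typing import Any, Dict, Iterable


def _type_of(value: Any) -> str:
    """Map one sample value to a (guessed) Solr field type name."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "plong"
    if isinstance(value, float):
        return "pdouble"
    return "string"


def _widen(current: str, seen: str) -> str:
    """Pick a type that can hold both the current guess and the new sample."""
    if current == seen:
        return current
    if {current, seen} == {"plong", "pdouble"}:
        return "pdouble"
    # Any other mix is treated as text; this rule is an assumption.
    return "string"


class SchemaGuesser:
    """Hypothetical stand-in for what /schema-guess could keep track of."""

    def __init__(self) -> None:
        self.fields: Dict[str, Dict[str, Any]] = {}

    def learn(self, docs: Iterable[Dict[str, Any]], reset: bool = False) -> Dict[str, Dict[str, Any]]:
        # reset=True mirrors the proposed &reset=true: forget earlier guesses.
        if reset:
            self.fields.clear()
        for doc in docs:
            for name, value in doc.items():
                multi = isinstance(value, list)
                for sample in (value if multi else [value]):
                    guessed = _type_of(sample)
                    field = self.fields.get(name)
                    if field is None:
                        self.fields[name] = {"type": guessed, "multiValued": multi}
                    else:
                        field["type"] = _widen(field["type"], guessed)
                        field["multiValued"] = field["multiValued"] or multi
        return self.fields
```

The point of the sketch is that guesses only get refined as more documents arrive, and nothing is written to the live schema until the user has inspected the result; posting a fresh batch with the reset flag starts over.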
Jan Høydahl

> On 4 Aug 2020, at 15:30, Gus Heck <[email protected]> wrote:
>
> Interesting read. Might have changed now that we have authentication
> capabilities... but let's not thread jack :)
>
>> On Tue, Aug 4, 2020 at 8:28 AM Erick Erickson <[email protected]>
>> wrote:
>> Having the admin UI allow uploads may not be secure. When I had a similar
>> idea a long time ago it got shot down, see the discussion at:
>> https://issues.apache.org/jira/browse/SOLR-5287.
>>
>> I _think_ this is a different issue if the configs have to be residing on
>> the system, not coming in from outside, just FYI...
>>
>> > On Aug 3, 2020, at 7:03 PM, Gus Heck <[email protected]> wrote:
>> >
>> > On Mon, Aug 3, 2020 at 5:03 PM Erick Erickson <[email protected]>
>> > wrote:
>> > Gus’s point about implementing something before removing it is well taken,
>> > but we can deprecate it immediately without removing it. Gus’s point about
>> > dynamic fields not being found until later in the cycle is well taken, but
>> > not enough to persuade me.
>> >
>> > Fair enough :)
>> >
>> > I’m not enthusiastic about multiple getting started schemas. The whole
>> > motivation behind schemaless is that the user doesn’t need to know about
>> > schemas to get started. By providing multiple “getting started” schemas we
>> > require them to become aware of schemas again.
>> >
>> > Here's my theory (which may or may not be persuasive :) )
>> >
>> > My thinking in that suggestion is that the majority of the problem is due
>> > to the fact that people new to a technology will tend to latch onto the
>> > defaults that come with something as being something that should be held
>> > onto until you have a good reason to change it. This is reasonable because
>> > changing things you don't understand willy nilly is often a road to pain.
>> > And people DO want a safe starting point and we should give it to them
>> > because it makes their life easier once they get a little further down the
>> > road, but this is not compatible with the easy-start schemaless mode.
>> > Looking at https://lucene.apache.org/solr/guide/8_5/solr-tutorial.html I
>> > see that the initial tutorial experience is fully scripted, and the user
>> > won't likely notice if they are told to ignore _default or guessing-proto
>> > in favor of the techproducts config set... BUT when they do get to the
>> > point of looking at the config name they'll see the more descriptive name.
>> > So rather than seeing "_default" and thinking "Ah ha! Here's something I
>> > can take as gospel and not change until I have a reason!" they'll see
>> > "guessing-proto" or "dynamic-proto" and say "Hunh, I wonder what that
>> > means?" which is a good question for them to ask I think.
>> >
>> > The concept of a default carries a strong bias toward not touching it (IMHO)
>> > which will be wrong most of the time no matter what we give them as a
>> > default. If something must be a default I'd favor a non-managed,
>> > non-dynamic, non-guessing minimal schema with the required fields, and an
>> > id field, maybe a _text_ field, and a comment pointing to the section of
>> > the ref guide where they can copy and paste in all the stuff that's
>> > currently in our base schema as example (things like the text_ga type), IF
>> > they want it. I get really tired of seeing mile long schemas that have a
>> > ton of unused stuff that is retained because people didn't know if they
>> > needed it or not...
>> >
>> > Note that not having some default would break back compat, on bin/solr but
>> > changing the default is also a break of sorts.
>> >
>> >
>> > All that said, maybe we could rethink the approach.
>> > My two objections are:
>> > 1> schemaless, by updating the schema based on a very small sample set, is
>> > very susceptible to failing early and often
>> > 2> Constantly updating the config in ZK and reloading the collections
>> > seems very hard to get right.
>> >
>> > I have for some time thought the inability to upload and download a config
>> > (or files within a config) via the web UI was a gap. But I found it easier
>> > to write https://plugins.gradle.org/plugin/com.needhamsoftware.solr-gradle
>> > than add that feature to the UI :)
>> >
>> > So I can imagine a “getting started” mode that indexed to the glob field
>> > while creating a schema. Ideally, it would be necessary to enable it
>> > specifically rather than have it be the default. I’d imagine this being
>> > coupled with some kind of “export schema” button. So the process would be
>> > > start Solr with -Dsolr.learningmode.confg=some_config_name.
>> > > index a bunch of documents, perhaps prototyping the search app on the
>> > > dynamic glob field.
>> > > The admin UI should have a big, intrusive banner saying “RUNNING IN
>> > > LEARNING MODE” with instructions on what to do next.
>> > > In that mode there’d need to be a “save schema” button or something.
>> > > What I’d like that to do would be examine the index and write a new
>> > > schema somewhere. If this was the mode, then you’d be able to run it any
>> > > time.
>> >
>> > +1 for anything that makes a round-trip of working with the schema easier,
>> > but not really a fan of learning mode.
>> >
>>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
