I almost never use schemaless mode (better named "schema guessing mode"), and I would never recommend it for use beyond prototyping. The primary use I see for it is to throw a bunch of data at it to get a starting point for a schema; say, for example, you want to see what Tika is going to produce for metadata before solidifying what you will and will not rely on. I think the ability to suggest a schema is valuable and shouldn't go away. However, I'm all for not having it be the default configuration, and I really like the suggestions linked in the ticket for features that consider a number of documents before trying to guess the schema. If we implement one of those, I'd be for deprecation and eventual removal, but not before.
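To make the "consider a number of documents before guessing" idea concrete, here is a minimal Python sketch. It is not Solr's implementation and not any specific proposal from SOLR-14701; it assumes documents arrive as Python dicts and uses a hypothetical four-type ladder (boolean, long, double, text). Each field's guess is widened as more samples are seen, instead of being fixed by whatever the first document implies.

```python
from typing import Any, Dict, Iterable


def classify(value: Any) -> str:
    """Guess a type for a single value (hypothetical type names)."""
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    s = str(value).strip()
    if s.lower() in ("true", "false"):
        return "boolean"
    try:
        int(s)
        return "long"
    except ValueError:
        pass
    try:
        float(s)
        return "double"
    except ValueError:
        return "text"


def widen(current: str, seen: str) -> str:
    """Combine two guesses: long+double -> double, any other conflict -> text."""
    if current == seen:
        return current
    if {current, seen} == {"long", "double"}:
        return "double"
    return "text"


def guess_schema(docs: Iterable[Dict[str, Any]]) -> Dict[str, str]:
    """Look at every sampled document before settling on a type per field."""
    guesses: Dict[str, str] = {}
    for doc in docs:
        for field, value in doc.items():
            # Treat multi-valued fields as several samples of the same field.
            values = value if isinstance(value, list) else [value]
            for v in values:
                t = classify(v)
                guesses[field] = widen(guesses[field], t) if field in guesses else t
    return guesses


if __name__ == "__main__":
    sample = [
        {"sku": "1", "price": 10, "inStock": "true"},
        {"sku": "abc", "price": 9.5, "inStock": False},
    ]
    print(guess_schema(sample))
    # {'sku': 'text', 'price': 'double', 'inStock': 'boolean'}
```

With this approach a `sku` field whose first value is "1" ends up as text once "abc" shows up in the sample, rather than being locked into a numeric type by the first document.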
The ticket contains a suggestion to add a catch-all '*' dynamic field, but we should make sure to indicate that this ALSO is not typically good for production use. One garbage (or malicious) document can explode the number of fields in the index, and forgetting to add a properly typed field can make it much further down the development cycle before getting caught (e.g. not caught until a user tries to sort on it and gets 1, 10, 11, 2, ...). There is also dev churn due to data silently indexed into typo variants of field names, etc.

Perhaps we should distribute more than one pre-baked config set and label none of them as "default"? I'd suggest maybe:

- guessing-proto --> our current _default, possibly refined, for prototyping
- dynamic-proto --> a schema based on dynamic fields, with a * default to text_general, as an alternative prototyping tool that is less dependent on data order but requires more editing
- managed-min --> a base on which to build a production-quality managed schema
- static-min --> a base on which to build a production-quality classic (non-managed) schema

Also +1 to renaming the feature away from "Schemaless" to "Schema Guessing".

-Gus

On Mon, Aug 3, 2020 at 11:33 AM Marcus Eagan <marcusea...@gmail.com> wrote:
> Community,
>
> There are many of us that have had to deal with the pain of managing the
> schemaless mode of operation in Solr. I'm curious to get others' thoughts
> about how well it is working for them and if they would like to continue to
> use it.
>
> I for one don't think Schemaless works as intended and favor deprecating
> it and replacing it with something more usable, but I am sure others have
> thoughts here.
>
> Is anyone on this list using schemaless mode in production or have you
> tried to?
>
> A preliminary discussion has occurred in this Jira ticket:
> https://issues.apache.org/jira/browse/SOLR-14701
>
> Thank you all,
>
> Marcus Eagan
>

-- 
http://www.needhamsoftware.com (work)
http://www.the111shift.com (play)