[
https://issues.apache.org/jira/browse/SOLR-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17169984#comment-17169984
]
Erick Erickson commented on SOLR-14701:
---------------------------------------
I don't agree that we can rescue this code for the following reasons:
1> When we guess wrong, users can't index some documents. For instance, if the
first value indexed for a field is "1", an integer field is created. A later doc
that has, say, "1.0" in that field fails because it's not an integer. And don't
even get me started on dates.
2> The mechanism for updating the schema is fragile. You can have many shards
trying to update ZK's configset at the same time, leading to instability even if
it does "do the right thing".
3> It's another instance of complex code that we have to maintain. Actually, I
don't think we are maintaining it. And there are consistent failures in that
code lately that aren't getting attention.
4> We don't really deliver "schemaless". What we deliver is something that
doesn't (and can't) work correctly. There have been proposals to, say, have a
"learning mode" that doesn't really index docs, just assembles a schema based
on N documents that'll index all of them, then use that schema. That would
reduce the problem, but it would still fail in some cases.
5> We could improve it around the edges forever trying to make it not fail so
regularly, and the users _still_ have to go in and tweak the schema.
6> The point of schemaless mode at all is that a user can just start indexing
docs without having to deal with managing a schema. They'll have to get into
the schema anyway eventually for anything except the most trivial corpus. So
the suggestion to index every new field as a text field by using the
dynamicField lets them do that without all the baggage.
7> Version control is another hidden gotcha. The schema is changing willy-nilly
in ZooKeeper and users have to take periodic snapshots and store them away
somewhere if they want to preserve it. So now you have a case where, say, they
need to re-index the corpus. If they do it to a new collection, the resulting
schema may well be different, if it works at all. How could it fail? Well, the
first doc originally indexed has a field with 1.0, so the field becomes a float
that indexes 1 fine in subsequent docs. Next time 'round the order is reversed
for some reason, the field becomes an integer, and the doc with 1.0 fails.
8> Big fat warning or not, it doesn't necessarily even work for non-production
code.
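A toy sketch of points 1> and 7>, in Python. This is not Solr's actual
field-guessing code: guess_type, accepts, and index_all are hypothetical names,
and the rule "the first value seen fixes the field type" is a simplifying
assumption. It only shows why document order matters.

```python
def guess_type(value: str) -> str:
    """Guess a field type from a single string value (toy rule, not Solr's)."""
    try:
        int(value)
        return "int"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "text"


def accepts(field_type: str, value: str) -> bool:
    """Return True if the value can be parsed as the given field type."""
    if field_type == "text":
        return True
    parser = int if field_type == "int" else float
    try:
        parser(value)
        return True
    except ValueError:
        return False


def index_all(docs):
    """Index docs in order; the first value seen for a field fixes its type."""
    schema, rejected = {}, []
    for doc in docs:
        for field, value in doc.items():
            schema.setdefault(field, guess_type(value))
        if not all(accepts(schema[f], v) for f, v in doc.items()):
            rejected.append(doc)
    return schema, rejected


docs = [{"price": "1.0"}, {"price": "1"}]
print(index_all(docs))        # float first: both docs are accepted
print(index_all(docs[::-1]))  # int first: the "1.0" doc is rejected
```

The same two documents produce a different schema, and a different set of
rejected docs, depending only on which one arrives first.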
Hmmm, though if we wanted to help them make a real schema, we could write
something that processed an existing index and produced an example schema they
could tweak, or even use as-is although I'd rather not have it be automatic.
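A rough sketch of what such a schema-suggesting tool might do, in Python. It
assumes the docs are already available as dicts of string values (reading them
out of a real index is not shown), and value_type, suggest_schema, and the
int < float < text ordering are all hypothetical, chosen for illustration.

```python
# Toy widening order: every int parses as a float, and anything is text.
RANK = {"int": 0, "float": 1, "text": 2}


def value_type(value: str) -> str:
    """Narrowest toy type that can hold this one value."""
    for name, parser in (("int", int), ("float", float)):
        try:
            parser(value)
            return name
        except ValueError:
            continue
    return "text"


def suggest_schema(docs):
    """Pick, per field, the widest type needed across all sampled docs."""
    schema = {}
    for doc in docs:
        for field, value in doc.items():
            seen = value_type(value)
            current = schema.get(field, seen)
            schema[field] = max(current, seen, key=RANK.__getitem__)
    return schema


sample = [{"price": "1"}, {"price": "1.0"}, {"title": "Solr in Action"}]
print(suggest_schema(sample))
```

Because every doc is inspected before a type is chosen, the result does not
depend on document order. It can still be wrong for docs that were not in the
sample, which is why the output is better treated as a starting point to tweak.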
So if we focus on "let the user index and search documents OOB without schema
definition being a barrier to entry", I claim we can create a much simpler
solution with minimal effort and not carry this albatross going forward. Of
course we're still supporting managed-schema, that's a whole different kettle
of fish.
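For the catch-all dynamicField idea from point 6>, a minimal sketch in Python
that builds a Schema API request without sending it. The host, the collection
name "mycollection", and the helper name catch_all_text_request are
assumptions for illustration, and "text_general" is assumed to exist as a
field type in the configset being used.

```python
import json
import urllib.request


def catch_all_text_request(base_url: str, collection: str) -> urllib.request.Request:
    """Build (but do not send) a Schema API request adding a catch-all dynamicField."""
    command = {
        "add-dynamic-field": {
            "name": "*",             # matches any field not explicitly defined
            "type": "text_general",  # assumed to exist in the configset
            "indexed": True,
            "stored": True,
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/solr/{collection}/schema",
        data=json.dumps(command).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = catch_all_text_request("http://localhost:8983", "mycollection")
print(req.get_method(), req.full_url)
# Sending it against a running Solr would be: urllib.request.urlopen(req)
```

With a rule like this in place every unknown field lands in one text type, so
there is no per-value guessing and no schema change at index time.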
> Deprecate Schemaless Mode (Discussion)
> --------------------------------------
>
> Key: SOLR-14701
> URL: https://issues.apache.org/jira/browse/SOLR-14701
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Components: Schema and Analysis
> Reporter: Marcus Eagan
> Priority: Major
>
> I know this won't be the most popular ticket out there, but I am growing more
> and more sympathetic to the idea that we should rip many of the freedoms out
> that cause users more harm than not. One of the freedoms I saw time and time
> again to cause issues was schemaless mode. It doesn't work as named or
> documented, so I think it should be deprecated.
> If you use it in production reliably and in a way that cannot be accomplished
> another way, I am happy to hear from more knowledgeable folks as to why
> deprecation is a bad idea.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)