[jira] [Commented] (SOLR-14701) Deprecate Schemaless Mode (Discussion)

Erick Erickson (Jira) Mon, 03 Aug 2020 08:25:02 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170109#comment-17170109
 ]


Erick Erickson commented on SOLR-14701:
---------------------------------------

bq. Sure. But when we guess right, you can 

I couldn't disagree more that "it works some of the time" is any 
justification for keeping a feature.

bq. Doesn't matter, as this is NOT a production feature

Again, I couldn't disagree more with this statement. Schemaless is supposed to 
make getting your feet wet easier and it flat doesn't work in lots of 
situations. There's a workable alternative that is robust. Why keep this around?
 
bq. Perhaps they love it, or perhaps they hate it. Probably a good chunk of both

My claim is that we can provide much better functionality with the dynamic 
field idea or at least something similar. What you lose, of course, is some 
specialization, e.g. no attempt to guess numerics or date types. I'm perfectly 
willing to give that up for something that fulfills its intended purpose: make 
it unnecessary to even know about a schema to do some indexing when you first 
start out.

bq. Since this is mostly contained to one URP ...

That rarely gets any attention. Another bit of orphan code.

bq. For some use cases with well formatted typed data it can work really 
well...Elastic, this is exactly what you need to do there as well.

Agreed, and this is something that would be lost. I'm willing to lose it, 
though. Those use cases would still "work", keeping in mind that the intent is 
to press a button without knowing anything about the schema and get _something_ 
that you can search.

I think it would be far more useful to have a button to press that examined a 
current index and spat out a schema. The process would be:

1> Index the data with the glob dynamic field.
2> Press the button and have a process go through all the stored fields (the 
glob is stored=true by default) and generate a schema. Have an option to load 
the generated schema into ZooKeeper automatically, either under the same name 
the collection uses or under a new name. Or save it somewhere. Or generate the 
correct Schema API commands as you suggest. Or...
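
To make the "examine and generate" step concrete, here is a minimal sketch in 
Python. It is only an illustration under stated assumptions, not an existing 
Solr tool: the function names are hypothetical, the documents are passed in as 
plain dicts of stored values rather than read from an index, and the type 
names (plong, pdouble, pdate, boolean, text_general) are assumed to exist in 
the target configset. It widens a field's type whenever the documents 
disagree and emits Schema API style "add-field" commands.

```python
import json
from datetime import datetime


def guess_type(value):
    """Guess a field type name for one stored value (hypothetical rules)."""
    text = str(value).strip()
    if text.lower() in ("true", "false"):
        return "boolean"
    try:
        int(text)
        return "plong"
    except ValueError:
        pass
    try:
        float(text)
        return "pdouble"
    except ValueError:
        pass
    try:
        # Assumes dates look like 2020-08-03T08:25:02Z.
        datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
        return "pdate"
    except ValueError:
        return "text_general"


def merge_types(a, b):
    """Widen two guesses: long+double becomes double, other mixes become text."""
    if a == b:
        return a
    if {a, b} == {"plong", "pdouble"}:
        return "pdouble"
    return "text_general"


def infer_schema(docs):
    """Map each field name to (type, multi_valued) across all documents."""
    fields = {}
    for doc in docs:
        for name, raw in doc.items():
            values = raw if isinstance(raw, list) else [raw]
            multi = isinstance(raw, list) and len(raw) > 1
            for value in values:
                guessed = guess_type(value)
                if name in fields:
                    old_type, old_multi = fields[name]
                    fields[name] = (merge_types(old_type, guessed),
                                    old_multi or multi)
                else:
                    fields[name] = (guessed, multi)
    return fields


def schema_commands(fields):
    """Build one add-field command per field, sorted by field name."""
    return [
        {"add-field": {"name": name, "type": ftype,
                       "stored": True, "multiValued": multi}}
        for name, (ftype, multi) in sorted(fields.items())
    ]


if __name__ == "__main__":
    sample = [
        {"id": "1", "price": "10", "title": "Solr in Action"},
        {"id": "2", "price": "12.5", "tags": ["search", "lucene"]},
    ]
    print(json.dumps(schema_commands(infer_schema(sample)), indent=2))
```

Because every document in the index gets examined, one oddball value late in 
the data only widens the field to text instead of failing the indexing run, 
which is the robustness argument for doing this after the fact.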

bq. This feature is only an aid very early on in exploring your data, to avoid 
having to hand edit 142 <field>...

And with the dynamic glob mapping they wouldn't have to edit anything to start 
either. Admittedly, you have to get there sometime, and when you do, you have 
to do some more typing.

bq. It's not hidden, is it? We recommend AGAINST this feature in production

And you're OK with that? We recommend against using it in production in the 
first place because it doesn't work reliably there. So we ship something that 
we know isn't good enough for production, put up with all the noise from the 
test cases that nobody is fixing, and consume developer resources whenever 
anything we do breaks a schemaless test, all for something people shouldn't 
use anyway.

bq. Or we could just make a page in Admin UI schema tab...

Which nobody has done, or even signed up to do, in the years since this 
feature was introduced. This is a variant of the "learning mode" idea. A 
copy/paste into some admin UI window doesn't process nearly enough documents 
to be robust. Nobody is going to paste 10,000 docs into some window.

What's your objection to the glob field vs. full-blown schemaless? The only 
thing I see that we lose is some specialization, and the "examine and generate 
a schema" idea addresses that without shipping something that we then recommend 
nobody actually use in production.


> Deprecate Schemaless Mode (Discussion)
> --------------------------------------
>
>                 Key: SOLR-14701
>                 URL: https://issues.apache.org/jira/browse/SOLR-14701
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: Schema and Analysis
>            Reporter: Marcus Eagan
>            Priority: Major
>
> I know this won't be the most popular ticket out there, but I am growing more 
> and more sympathetic to the idea that we should rip many of the freedoms out 
> that cause users more harm than not. One of the freedoms I saw time and time 
> again to cause issues was schemaless mode. It doesn't work as named or 
> documented, so I think it should be deprecated. 
> If you use it in production reliably and in a way that cannot be accomplished 
> another way, I am happy to hear from more knowledgeable folks as to why 
> deprecation is a bad idea. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
