[ 
https://issues.apache.org/jira/browse/SOLR-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192924#comment-17192924
 ] 

Alexandre Rafalovitch commented on SOLR-14701:
----------------------------------------------

Thanks Jan, these are good questions. To address them from the middle out, I 
guess the main question is whether this should end up in the default schema or 
not.

If it ends up in the default schema as a non-default URP chain, then the usage 
would be:
 * bin/post -params "update.chain=guess-schema" ....documents (to update the 
schema; may require a commit command to support that)
 * bin/post ..documents (to actually index)

In that case, it makes sense to create the schema at the end and to have 
copyField commands and so on.

 

On the other hand, I was envisaging (in general) having a learning schema and 
a bunch of learning examples layered on top of it. Schemaless mode could be 
one of those examples. You would then create a separate core, and the guessing 
chain could be the default chain there. At the end, it would echo its 
recommendations back to the user. Thinking about it, this would be a bit of a 
demotion for schemaless mode. Perhaps too much of a demotion. And perhaps too 
much core handling. So, maybe it should be a learning/guessing URP chain, not 
a learning schema after all.

 

Or maybe the two can be combined somehow (still in the main config), with some 
advice given in the URP's finish() and the schema created in commit(). So, a 
user could run guess-schema several times, accumulating the changes (with 
commit off). If they are happy with them, they run commit. Otherwise, they 
run rollback. And hope that no autoCommit is configured... Also, I am actually 
not sure how the output/advice gets back to the user. So, this may belong in 
the "too hard" category, but I am capturing it as a discussion point.
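
To make the accumulate-then-commit idea concrete, here is a toy in-memory 
model in Python. It is only a sketch of the workflow described above, not 
Solr code: the class name PendingSchemaChanges and its methods are 
hypothetical, and nothing here touches a real URP, finish() or commit().

```python
from typing import Dict, List


class PendingSchemaChanges:
    """Toy model: guesses pile up across runs until commit or rollback."""

    def __init__(self) -> None:
        self._pending: Dict[str, str] = {}   # field name -> guessed type
        self.committed: Dict[str, str] = {}  # what the "schema" contains

    def observe(self, guesses: Dict[str, str]) -> None:
        # One guess-schema run: later runs overwrite earlier guesses per field.
        self._pending.update(guesses)

    def advice(self) -> List[str]:
        # What could be echoed back to the user at the end of a run.
        return [f"{name}: {ftype}" for name, ftype in sorted(self._pending.items())]

    def commit(self) -> None:
        # The user is happy: apply the accumulated guesses to the schema.
        self.committed.update(self._pending)
        self._pending.clear()

    def rollback(self) -> None:
        # The user is not happy: drop the accumulated guesses.
        self._pending.clear()
```

In this model an autoCommit would amount to calling commit() behind the 
user's back, which is why it would defeat the review step.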

 

As to text vs. string, I think if we see text entries longer than 256 
characters, we can be fairly sure they make no sense as indexed strings. If we 
see much longer values, that could trigger a warning about marking fields as 
long. But that's not something that could be created automatically.
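
A minimal sketch of that heuristic in Python, under stated assumptions: the 
256-character threshold comes from the paragraph above, while the function 
name and the cut-off for "much longer" values (32000 here) are hypothetical, 
since no number is given for the latter.

```python
from typing import Iterable, Tuple

# Threshold from the discussion: longer values make no sense as indexed strings.
TEXT_LENGTH_THRESHOLD = 256
# Hypothetical cut-off for "much longer" values; the discussion gives no number.
WARN_LENGTH_THRESHOLD = 32000


def guess_string_or_text(values: Iterable[str]) -> Tuple[str, bool]:
    """Return (guessed_type, warn) based on the longest value seen in a field."""
    longest = max((len(v) for v in values), default=0)
    guessed = "text" if longest > TEXT_LENGTH_THRESHOLD else "string"
    # The warning is advice only; nothing is changed automatically.
    warn = longest > WARN_LENGTH_THRESHOLD
    return guessed, warn
```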

 

*int->float->string* requires configuration to recognize that widening and a 
decision on whether we support just that one level of extra mapping or have to 
do a full tree-walking implementation. Or we hardcode the knowledge that 
Int/Long can widen to double.
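
For illustration, the tree-walking option could look roughly like the Python 
sketch below. The PARENT table is an assumed widening tree (int -> long -> 
double -> string, with float also widening to double), not an existing Solr 
configuration, and the function names are hypothetical.

```python
from typing import Dict, List, Optional

# Assumed widening tree: each type points at the next wider type.
PARENT: Dict[str, Optional[str]] = {
    "int": "long",
    "long": "double",
    "float": "double",
    "double": "string",
    "boolean": "string",
    "string": None,
}


def ancestors(field_type: str) -> List[str]:
    """Return the type itself followed by every wider type up to the root."""
    chain: List[str] = []
    current: Optional[str] = field_type
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def widen(current: str, observed: str) -> str:
    """Return the narrowest type that can hold both the current and observed types."""
    current_chain = ancestors(current)
    for candidate in ancestors(observed):
        if candidate in current_chain:
            return candidate
    raise ValueError(f"no common type for {current} and {observed}")
```

Hardcoding "Int/Long can widen to double" corresponds to keeping only the 
int, long and double entries of that table.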

 

> Deprecate Schemaless Mode (Discussion)
> --------------------------------------
>
>                 Key: SOLR-14701
>                 URL: https://issues.apache.org/jira/browse/SOLR-14701
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: Marcus Eagan
>            Priority: Major
>         Attachments: image-2020-08-04-01-35-03-075.png
>
>
> I know this won't be the most popular ticket out there, but I am growing more 
> and more sympathetic to the idea that we should rip many of the freedoms out 
> that cause users more harm than not. One of the freedoms I saw time and time 
> again to cause issues was schemaless mode. It doesn't work as named or 
> documented, so I think it should be deprecated. 
> If you use it in production reliably and in a way that cannot be accomplished 
> another way, I am happy to hear from more knowledgeable folks as to why 
> deprecation is a bad idea. 



