[ https://issues.apache.org/jira/browse/SOLR-14701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17192924#comment-17192924 ]
Alexandre Rafalovitch commented on SOLR-14701:
----------------------------------------------

Thanks Jan, these are good questions. To address them from the middle out, I guess the main question is whether this should end up in the default schema or not.

If it ends up in the default schema as a non-default URP chain, then the usage would be:
* bin/post -params "update-chain=guess-schema" ....documents (to update the schema; may require a commit command to support that)
* bin/post ..documents (to actually index)

In that case, it makes sense to create the schema at the end and to have copyField commands and so on.

On the other hand, I was envisaging (in general) having a learning schema and a bunch of learning examples that layer on top of that. Schemaless mode could be one of the examples. Then you would create a separate core, and it could be the default chain there. And then it would echo recommendations back to the user at the end.

Thinking about it, this would be a bit of a demotion for schemaless mode. Perhaps too much of a demotion. And perhaps too much core handling. So maybe it should be a learning/guessing URP chain, not a learning schema after all.

Or maybe it can be combined somehow (still in the main config), with some advice given in the URP's finish() and the schema created in commit(). So a user could run guess-schema several times, accumulating the changes (with commit off). And if they are happy with them, they run commit. Or rollback. And hope for no autoCommit configured in schema... And actually, I am not sure how the output/advice actually gets back to the user. So this may also be in the "too hard" category, but I capture it as a thought point.

As to text vs. string, I think if we see text entries longer than 256 characters, we kind of know they make no sense as indexed strings. If we see much longer strings, that could trigger a warning about marking fields as long. But that's not something that could be created automatically.
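
To make the text vs. string idea concrete, here is a minimal sketch of the length check, not actual Solr code. The function name, the "text"/"string" labels, and the 10,000-character warning cut-off are all made up for illustration; only the 256-character threshold comes from the comment above.

```python
from typing import Iterable, List, Tuple


def guess_text_or_string(
    values: Iterable[str],
    string_max: int = 256,      # threshold from the comment above
    warn_above: int = 10_000,   # hypothetical "much longer" cut-off
) -> Tuple[str, List[str]]:
    """Guess a field type from observed values and collect advice.

    Returns a (field_type, warnings) pair. The type labels are
    placeholders, not the names of real Solr fieldTypes.
    """
    # Length of the longest value seen; 0 if no values were seen.
    longest = max((len(v) for v in values), default=0)

    # Values longer than string_max make little sense as indexed strings.
    field_type = "text" if longest > string_max else "string"

    # Very long values only produce advice; nothing is changed automatically.
    warnings: List[str] = []
    if longest > warn_above:
        warnings.append(
            f"longest value is {longest} characters; review this field by hand"
        )
    return field_type, warnings
```

The warning is returned rather than acted on, which matches the point that this part cannot be created automatically and has to go back to the user as advice.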

*int->float->string* requires configuration to recognize that and to decide on whether we support just that one level of extra mapping or have to do a full tree-walking implementation. Or hardcode the knowledge that Int/Long can widen to double.

> Deprecate Schemaless Mode (Discussion)
> --------------------------------------
>
>                 Key: SOLR-14701
>                 URL: https://issues.apache.org/jira/browse/SOLR-14701
>             Project: Solr
>          Issue Type: Improvement
>          Components: Schema and Analysis
>            Reporter: Marcus Eagan
>            Priority: Major
>         Attachments: image-2020-08-04-01-35-03-075.png
>
>
> I know this won't be the most popular ticket out there, but I am growing more
> and more sympathetic to the idea that we should rip many of the freedoms out
> that cause users more harm than not. One of the freedoms I saw time and time
> again to cause issues was schemaless mode. It doesn't work as named or
> documented, so I think it should be deprecated.
> If you use it in production reliably and in a way that cannot be accomplished
> another way, I am happy to hear from more knowledgeable folks as to why
> deprecation is a bad idea.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)