[GitHub] [lucene-solr] arafalov commented on pull request #1863: SOLR-14701: GuessSchemaFields URP to replace AddSchemaFields URP in schemaless mode

GitBox Wed, 23 Sep 2020 00:12:12 -0700


arafalov commented on pull request #1863:
URL: https://github.com/apache/lucene-solr/pull/1863#issuecomment-697179182



   Those are strong words, "worse than useless", especially considering that
this, to me, seems a strong improvement on the current schemaless mode: it
looks at more values and actually supports both single-valued and multivalued
fields.
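   To illustrate why looking at more values matters for the single/multivalued
decision, here is a minimal Python sketch. It is not the code in this PR; the
coarse type names ("boolean", "long", "double", "string"), the widen() rule and
the {name: {type, multiValued}} result shape are all hypothetical, chosen only
to show the idea of scanning every value before deciding.

```python
from typing import Any, Dict, Iterable, List, Optional

# Hypothetical coarse type names for illustration only; they are not the
# Solr field type names used by the PR.


def guess_value_type(value: Any) -> str:
    """Guess a coarse type for one value (bool is checked before int on purpose)."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    return "string"


def widen(current: Optional[str], new: str) -> str:
    """Merge two guesses: long+double widens to double, any other conflict to string."""
    if current is None or current == new:
        return new
    if {current, new} == {"long", "double"}:
        return "double"
    return "string"


def guess_fields(docs: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Scan every value of every document and return {field: {type, multiValued}}."""
    guesses: Dict[str, Dict[str, Any]] = {}
    for doc in docs:
        for name, raw in doc.items():
            values: List[Any] = raw if isinstance(raw, list) else [raw]
            if not values:
                continue  # an empty list tells us nothing about the type
            g = guesses.setdefault(name, {"type": None, "multiValued": False})
            # One document with several values is enough to make the field multivalued.
            if len(values) > 1:
                g["multiValued"] = True
            for v in values:
                g["type"] = widen(g["type"], guess_value_type(v))
    return guesses
```

   With docs = [{"tags": ["a"]}, {"tags": ["b", "c"]}], looking only at the
first document would make "tags" single-valued, while scanning both marks it
multivalued before anything is committed to the index.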
   
   In general, I was trying to implement Hoss's proposal, but I am open to
other ideas if we can clarify the use case.
   
   My understanding is that the use case is having a lot of data whose shape
one does not quite know. So, they want to index it quickly, explore it, and
then make some manual adjustments. I am not expecting this to be anywhere near
production. Schemaless mode should not have been either.
   
   I am not sure how many people will know how to do step 6, but currently
they don't even have that option. Switching a field from single-valued to
multivalued is impossible (or at least very hard?) once actual values are in
the index; one basically has to delete everything and start again, as happens
in the films example if one misses the README. With this approach, they can
look at the field definitions in the Admin UI and remove or add fields as
required without the underlying Lucene indexes throwing complaints.
   
   The way I see this (as with the other examples) is as a super minimal
learning configuration where every additional field is quite obvious. That
learning schema would clearly not need step 2, as it would already be set up.
I thought your question was about how you would test the code yourself.
   
   Additionally, to help see what was changed, I think the tag JIRA could be
helpful. And frankly, as I imagine it, this is not a cloud setup but a simple
learning one. Whether that by itself is a deal-breaker for you, we shall have
to see.
   
   Generating schema JSON raises its own questions, such as the shape of the
schema it will be applied to, because guessing currently happens as a
differential against the existing schema. Also, this does not seem like code
that belongs in this particular URP; it is more of a general utility. If one
existed, it might make sense to build on top of it.
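   The "differential" point can be sketched in Python. Given the field names
already in a schema and a set of guessed definitions (a hypothetical
{name: {type, multiValued}} shape, not the PR's internal representation), it
builds a body for the Solr Schema API "add-field" command, which would be
POSTed to /solr/<collection>/schema, containing only fields the schema does not
have yet. The type names in the example ("string", "pdouble") are assumptions
that happen to exist in the _default configset and may not exist elsewhere.
This is a sketch of the idea, not an existing Solr utility.

```python
import json
from typing import Any, Dict, Iterable, List


def add_field_commands(existing_fields: Iterable[str],
                       guessed: Dict[str, Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Build a Schema API body that only adds fields missing from the existing schema."""
    existing = set(existing_fields)
    new_fields = [
        {"name": name, "type": g["type"], "multiValued": g["multiValued"]}
        for name, g in sorted(guessed.items())
        if name not in existing  # the differential: existing fields are left alone
    ]
    # Solr accepts a list of field definitions under a single "add-field" key.
    return {"add-field": new_fields} if new_fields else {}


if __name__ == "__main__":
    guessed = {
        "id": {"type": "string", "multiValued": False},
        "price": {"type": "pdouble", "multiValued": False},
        "tags": {"type": "string", "multiValued": True},
    }
    # Against a schema that already has "id", only price and tags are added.
    print(json.dumps(add_field_commands({"id"}, guessed), indent=2))
```

   The same guesses produce different JSON depending on which schema they are
diffed against, which is exactly why the target schema's shape has to be
settled before such JSON can be generated.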
   
   In general, I am open to implementing it in whatever way seems most useful.
I will wait for a couple more opinions rather than chase one very strong one.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
