Re: Feature: Solr implicitly defined field types?

Alexandre Rafalovitch Fri, 04 Jan 2019 08:53:17 -0800

What if a system schema were loaded implicitly at startup?
Then, if a new schema is loaded and a type definition is missing, it is
copied - at that time - into the specific schema. So, on the first
rewrite, those types - and only the ones actually used - will be written out.
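
A rough sketch of what I mean, in Python rather than Solr's Java. Everything
named here (SYSTEM_TYPES, resolve_missing_types, the dict layout) is made up
for illustration and is not an existing Solr API; only the field type class
names are real.

```python
import copy

# Hypothetical "system schema": type name -> definition. This registry is an
# assumption for illustration; Solr has no such structure under this name.
SYSTEM_TYPES = {
    "int": {"name": "int", "class": "solr.IntPointField"},
    "string": {"name": "string", "class": "solr.StrField"},
    "boolean": {"name": "boolean", "class": "solr.BoolField"},
}


def resolve_missing_types(schema, system_types=SYSTEM_TYPES):
    """Copy into `schema` the system types its fields use but do not define.

    Returns the names of the types that were copied in, in first-use order.
    Raises KeyError if a field refers to a type that is defined nowhere.
    """
    defined = {ft["name"] for ft in schema["fieldTypes"]}
    copied = []
    for field in schema["fields"]:
        type_name = field["type"]
        if type_name in defined:
            continue  # an explicit definition in the schema always wins
        if type_name not in system_types:
            raise KeyError(f"unknown field type: {type_name}")
        # Copy at load time, so the next rewrite of the schema persists it.
        schema["fieldTypes"].append(copy.deepcopy(system_types[type_name]))
        defined.add(type_name)
        copied.append(type_name)
    return copied


schema = {
    "fieldTypes": [{"name": "string", "class": "solr.StrField"}],
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "count", "type": "int"},
    ],
}
print(resolve_missing_types(schema))  # ['int']
```

Note that "boolean" never lands in the schema because no field uses it.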


This allows us to version the system types the same way as we version
a normal schema. I agree with Gus that hidden configuration causes all
sorts of challenges.

And - for tooling purposes - there definitely needs to be a way to get
all definitions, explicit and implicit, used and merely available.
That also points towards something that already has a self-describing
mechanism available (like the Schema API).
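
For the tooling side, a minimal sketch of the "used vs. just available"
split, assuming the JSON shapes the Schema API returns today for
/schema/fieldtypes, /schema/fields and /schema/dynamicfields. The function
names and the base URL argument are hypothetical. The API has no flag for
explicit vs. implicit, so this sketch cannot tell those apart.

```python
import json
from urllib.request import urlopen


def fetch(base_url, path):
    """GET <base_url>/schema/<path> and parse the JSON body.

    base_url is an assumption, e.g. "http://localhost:8983/solr/mycollection".
    """
    with urlopen(f"{base_url}/schema/{path}") as resp:
        return json.load(resp)


def classify_types(fieldtypes_resp, fields_resp, dynamic_resp=None):
    """Split field types into used, merely available, and undefined."""
    defined = {ft["name"] for ft in fieldtypes_resp["fieldTypes"]}
    referenced = {f["type"] for f in fields_resp["fields"]}
    if dynamic_resp:
        referenced |= {f["type"] for f in dynamic_resp["dynamicFields"]}
    return {
        "used": sorted(defined & referenced),
        "available_only": sorted(defined - referenced),
        # Referenced but not defined: candidates for implicit types.
        "undefined": sorted(referenced - defined),
    }


# Hand-written sample responses, trimmed to the keys the function reads.
report = classify_types(
    {"fieldTypes": [{"name": "string"}, {"name": "int"}]},
    {"fields": [{"name": "id", "type": "string"}]},
)
print(report["available_only"])  # ['int']
```

Against a live node the three arguments would come from fetch(base_url,
"fieldtypes"), fetch(base_url, "fields") and fetch(base_url, "dynamicfields").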

Regards,
   Alex.


On Fri, 4 Jan 2019 at 10:45, David Smiley <david.w.smi...@gmail.com> wrote:
>
> I'm thinking this feature would be used conservatively -- and thus just 
> primitive types that wouldn't have an interesting configuration to them, or 
> for something you are really not expected to change (the nest path of nested 
> docs).  So you wouldn't feel you had to go read the docs.  The schema might 
> even have a comment to mention a list of implicit field types (a one-liner 
> comma delimited list).
>
> On Fri, Jan 4, 2019 at 10:34 AM Gus Heck <gus.h...@gmail.com> wrote:
>>
>> I'm perhaps slightly conservative with respect to configuration, but I'm not 
>> fond of hidden configuration that I can't see. What I don't like is looking 
>> at a config file and not seeing the full story. That means I have to read 
>> the config and ALSO go read some part of the documentation that I've failed 
>> to memorize, and probably need to google to find to be fully aware of what's 
>> going on....  (and no I didn't like it when some standard stuff disappeared 
>> from solrconfig.xml a while back either). Small changes of course seem 
>> reasonable, but the further we drift into implicit things, especially if we 
>> get a collection of several implicit things described in various disparate 
>> parts of the manual the more cryptic the system becomes. That's my opinion, 
>> YMMV.
>>
>> -Gus
>>
>> On Thu, Jan 3, 2019 at 2:57 PM David Smiley <david.w.smi...@gmail.com> wrote:
>>>
>>> Broadly, you refer to "locale" issues.  Solr's way of dealing with this 
>>> today is with optional & configurable use of URPs.  The schema-less / 
>>> data-driven mode has some of these enabled; you can see it in the 
>>> solrconfig.xml including many date formats.  You can look into that for 
>>> further info if you like.  The primitive field types are not locale 
>>> sensitive.
>>>
>>> Update: It's looking like 8.0 will only employ this implicit field type 
>>> mechanism for _nest_path_ which probably won't be in the default schema.  
>>> Assuming it isn't, then it'll only be documented in the context of this 
>>> particular feature.  It'd be nice to see the scope of fields expanded and 
>>> at that juncture it could/should be more broadly documented.  That can wait 
>>> until people have the energy to do it.
>>>
>>> On Sun, Dec 30, 2018 at 4:54 AM Jörn Franke <jornfra...@gmail.com> wrote:
>>>>
>>>> Hi David,
>>>>
>>>> I now get the idea, and yes, this makes sense. It would, though, require a 
>>>> tutorial or some best practices; e.g. overriding a platform data type may 
>>>> not make much sense - it may confuse developers new to an existing project 
>>>> who know Solr but then encounter a platform type that does not have the 
>>>> default behavior.
>>>>
>>>> Could you deal with different languages in platform types? E.g. for dates it 
>>>> does not seem to be a problem, because Solr expects only one specific date 
>>>> format, so values need to be converted beforehand (maybe that conversion 
>>>> could also be part of a platform type), but decimals and Boolean values are 
>>>> written differently in some languages.
>>>>
>>>> On 30.12.2018 at 07:01, David Smiley <david.w.smi...@gmail.com> wrote:
>>>>
>>>> Thanks for your thoughtful response Jörn!
>>>> ...
>>>> On Sat, Dec 29, 2018 at 4:14 AM Jörn Franke <jornfra...@gmail.com> wrote:
>>>>>
>>>>> I think it is a good idea, but I see some potential complexity for 
>>>>> “deployment” of collections. For instance, in environments where Solr is 
>>>>> used as a shared platform amongst several stakeholders, every time you 
>>>>> deploy/modify a collection you need to take care that the platform types 
>>>>> exist. If one exists in the Test environment then I need to make sure that 
>>>>> it exists as well in acceptance/production. The problem is that the 
>>>>> platform type could have been defined by somebody else who has not yet 
>>>>> (e.g. due to project/sprint delays) updated the other environments. 
>>>>> Another issue is if I move to another Solr cluster in the same 
>>>>> environment. Then, I have to make sure that all platform types move with 
>>>>> me.
>>>>
>>>>
>>>> RE "the platform type could have been defined by somebody else":  I'm not 
>>>> imagining it'd be configurable, thus the "somebody else" is the Solr 
>>>> project/committers.
>>>>
>>>> Otherwise, I think I get your point, but perhaps I don't.  It's the same 
>>>> point for any use of some new feature of Solr.  If you use some new 
>>>> feature, you have to take care that all Solr instances you deploy your 
>>>> configuration to can handle that new feature.  That's a fairly generic 
>>>> point that would apply to just about anything in Solr.
>>>>
>>>>>
>>>>> A (minor) issue is that platform types may change (for whatever reason), 
>>>>> and then potentially all collections have to be reindexed, or we end up 
>>>>> with different versions of the same platform type, which does not make 
>>>>> things easier.
>>>>
>>>>
>>>> Yes it's possible.  Though I think that point is apart from the feature I 
>>>> propose.  You're saying that you might want to use an "int" field and then 
>>>> one day realize you want some newer/better definition of what an "int" is 
>>>> (e.g. trie -> points).  Sure.  That's true whether the field type is 
>>>> explicit or implicit.  There's nothing stopping you from explicitly 
>>>> defining the field type if you want to; the names would not be reserved. 
>>>> If you want to stick with your current index running the new Solr version, 
>>>> then you would keep luceneMatchVersion as it was, which would 
>>>> effectively retain the interpretation of the implicit field types.
>>>>
>>>>>
>>>>> Currently we have all our Schema definitions in a version management 
>>>>> system (we use the Schema API but the JSON requests are out there) so 
>>>>> that projects can take inspiration from each other. Needless to say, careful 
>>>>> type engineering also requires some documentation of the technical design 
>>>>> and may indeed be very Collection-specific.
>>>>>
>>>>> Another issue could be that a platform type may also imply a certain 
>>>>> platform solrconfig.xml (eg lib directive etc).
>>>>
>>>>
>>>> I'm imagining platform types would be basic primitive types (int, boolean, 
>>>> etc. and some special situations like in the issue I referenced).  They 
>>>> would not depend on contrib libs... though I could imagine one day an 
>>>> evolution of this in which a contrib could somehow auto-add implicit field 
>>>> types.
>>>>
>>>>>
>>>>> I am not sure yet what the exact benefits are of referring to types of 
>>>>> other collections in the Solr runtime itself instead of having a version 
>>>>> system and letting projects decide if they want to adapt types of other 
>>>>> collections, but maybe I am overlooking something here.
>>>>
>>>>
>>>> The notion of implicit field types is not a cross-config 
>>>> (cross-collection) thing.  Implicit field types are nothing more than 
>>>> built-in shortcuts.
>>>>
>>>> I recall one of my very early observations of Solr's schema was of 
>>>> surprise to see primitive types defined in the schema.  Consider in SQL 
>>>> DDL statements that refer to varchar and such.  Your DDL doesn't need to 
>>>> define what a varchar is!
>>>>
>>>> Happy New Year,
>>>> ~ David
>>>>
>>>>> On 28.12.2018 at 17:36, David Smiley <david.w.smi...@gmail.com> wrote:
>>>>>
>>>>> While working on https://issues.apache.org/jira/browse/SOLR-12768 it 
>>>>> occurred to me that it would be nice if Solr had implicitly defined field 
>>>>> types.  This would allow you to define a field in your schema that refers 
>>>>> to a type that is not also in your schema -- at least not explicitly 
>>>>> (need not explicitly be put in your schema.xml if classic, or need not be 
>>>>> passed to schema manipulation API if you use that).  The idea would be 
>>>>> that these types would be Solr platform provided field types that need 
>>>>> not be defined by you.
>>>>>
>>>>> There are multiple ways this loose idea might be conceived / imagined 
>>>>> into a concrete proposal.
>>>>>
>>>>> (A) The main idea I'm kicking around right now is that Solr would _not_ 
>>>>> throw an error at the moment of reading your field definition that it 
>>>>> doesn't see your type... instead it would see it's a platform type (via 
>>>>> some built-in hard-coded registry) and then register that type on the 
>>>>> fly.  So if you were to read the schema then you'd see it.  In this way, 
>>>>> it's kind of a shortcut.  Platform field types that you don't actually 
>>>>> refer to will never end up being put into your schema.
>>>>>
>>>>> (B) A schema could pre-initialize with the platform/implicit types.  This 
>>>>> is the simplest idea but I don't like it because you may not even need 
>>>>> some of these types.  I'm not going to go down this path now but wanted 
>>>>> to mention it.
>>>>>
>>>>> I'm exploring (A) right now... I'm hoping to do this for at least a 
>>>>> "_nest_path_"  field in support of nested documents in 8.0, but 
>>>>> conceivably the idea would be expanded to lots of things in our base 
>>>>> schema right now (int, str, etc.)
>>>>> --
>>>>> Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
>>>>> LinkedIn: http://linkedin.com/in/davidwsmiley | Book: 
>>>>> http://www.solrenterprisesearchserver.com
>>
>>
>>
>> --
>> http://www.the111shift.com
