I'd say ArrayField is a straight up data field at the moment. It stores 0-1
lists of data. It's no different to CommaSeparatedIntegerField (seriously,
why does that exist...)

*If* PG gets the relevant update that will allow `integer[] references`
(i.e. ArrayField(ForeignKey)) then this would be different, and would be
more like a m2m field.

There is an argument that it's 0-N anyway, but in the implementation, both
within Django and in the database, I don't think the distinction is useful
at this point, from an ORM point of view in any case. From a forms point of
view it's quite different.


On 20 August 2014 09:19, Russell Keith-Magee <[email protected]>
wrote:

>
> On Mon, Aug 18, 2014 at 6:03 PM, Anssi Kääriäinen <[email protected]>
> wrote:
>
>> On Monday, August 18, 2014 7:45:17 AM UTC+3, Russell Keith-Magee wrote:
>>>
>>> I understand what you're driving at here, and I've had similar thoughts
>>> over the course of the SoC. The catch is that this makes the API for
>>> get_fields() fairly complicated.
>>>
>>> If every field fits into one specific type, then get_fields() just
>>> requires a single boolean flag (do I include fields of type X) for each
>>> field type. We can also easily add new field types by adding new booleans
>>> to the API.
>>>
>>> However, if a field fits into multiple categories, then it's impossible
>>> (or, at least, exceedingly complicated) to make a single call to
>>> get_fields() that will specify all your field requirements. "Get me all
>>> non-virtual data fields" requires "virtual=False, data=True, m2m=False",
>>> but "Get all virtual data fields that represent m2ms" requires
>>> "virtual=True, data=False, m2m=True". You can't pass in both sets of
>>> arguments at the same time, so you either have to make multiple calls to
>>> get_fields(), or you have to invent some sort of query syntax for
>>> get_fields() that allows union queries.
>>>
>>> Plus, at the end of the day, get_fields() is abstracted behind highly
>>> cached and optimised properties for key lookups. These properties are
>>> effectively a cached call to get_fields() with a specific set of arguments
>>> - so even if get_fields() doesn't expose a "one category per field"
>>> requirement, the API will require, at some level, names that have clear
>>> (and preferably non-overlapping) membership.
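A minimal sketch of the union problem described above. It assumes a hypothetical `get_fields()` that treats every keyword as a flag the field must match exactly. This is not Django's actual _meta API, and the `Field` class and flag names are illustrative only:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    # Hypothetical field record; the flag names mirror the categories above.
    name: str
    virtual: bool
    data: bool
    m2m: bool


FIELDS = [
    Field("title", virtual=False, data=True, m2m=False),
    Field("tags", virtual=True, data=False, m2m=True),
    Field("author", virtual=False, data=True, m2m=False),
]


def get_fields(**flags):
    """Return the fields whose attributes match every keyword flag given."""
    return [
        f for f in FIELDS
        if all(getattr(f, key) == value for key, value in flags.items())
    ]


# Each call can express exactly one combination of flags...
non_virtual_data = get_fields(virtual=False, data=True, m2m=False)
virtual_m2m = get_fields(virtual=True, data=False, m2m=True)

# ...so the union of the two groups needs two calls (or a query syntax).
both = non_virtual_data + virtual_m2m
```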
>>>
>>
>> If fields are in multiple categories then users will want to do the full
>> range of set operations on the categories. Encoding that into the API
>> doesn't sound promising.
>>
>>
>>>> I don't think users actually want to get fields based on the suggested
>>>> categorization. I feel we get an easier to use and more flexible API if we
>>>> have higher level categories and allow fields to match multiple categories.
>>>> As a practical example if I want all relation fields, that is going to be
>>>> hard using the suggested API. Getting all relation fields is a more
>>>> realistic use case than getting related virtual objects.
>>>>
>>>
>>> Quite probably true. As a point of interest, the current (as in, 1.6)
>>> API actually doesn't differentiate between category (a) "pure data" and
>>> category (b) "relating data (i.e., FK)" fields - if you ask for "data
>>> fields" you get pure data *and* foreign keys. So, at least as far as
>>> Django's own usage is concerned, you're correct in saying that the taxonomy
>>> I've described isn't fully required.
>>>
>>> Daniel's survey of internal usage reveals that there are three use cases
>>> for getting a list of fields in Django's internal API:
>>>
>>>  * Get all data and m2m fields (i.e., categories a, b, and d). This is
>>> effectively "all fields on *this* model"
>>>
>>>  * Get all data, m2m, related objects, related m2m, and virtual fields
>>> (i.e., categories a, b, d, f, g, h, i - excluding c and e because Django
>>> doesn't currently have any fields of this type). This is "all fields on
>>> this model, or related to this model"
>>>
>>>  * Get all m2m fields (i.e., category d)
>>>
>>> So - at the very least, we need names to describe those three groups. My
>>> intention with describing a richer taxonomy is to try and give names to
>>> other groupings of interest.
>>>
>>>> If we want all fields to match a single category, and only a single
>>>> category, then we need to redefine the categories to make sure ForeignKeys
>>>> as virtual fields are possible, and that more esoteric custom join-based
>>>> fields fit into the categorization.
>>>>
>>>
>>> Agreed - that's why I threw this out there for discussion :-)
>>>
>>> Properties like "data", "virtual", "external", "related", "relating" -
>>> these are high level concepts describing the way a field manifests.
>>> However, that doesn't mean we need to expose these properties as part of
>>> the formal API.
>>>
>>> Part of the underlying problem here -- let's say we roll out Django 1.7
>>> with some version of this API, and in 1.8, foreign key fields change to
>>> become virtual. That effectively becomes backwards incompatible for queries
>>> that are sensitive to a "virtual" flag; but it doesn't change the
>>> underlying need to identify that a field is a foreign key. We need to
>>> capture the latter use case, but not necessarily the former.
>>>
>>
>> Could we go with a minimal API for get_fields()? Instead of having
>> categorization on the get_fields() API, we could provide field flags for
>> the categories. With field flags it is straightforward to filter the list
>> returned by get_fields(). As an example, fetching those fields which are
>> relations but which aren't virtual: [f for f in get_fields() if
>> f.relational and not f.virtual]. If this path is taken, then I am not sure
>> how minimal the get_fields() API should be. We likely need flags at least
>> for whether the field is defined on the local model, a parent, or some
>> remote model.
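A sketch of the flag-based approach, assuming hypothetical `relational`, `virtual` and `origin` attributes on each field. The `origin` values ("local", "parent", "remote") are an assumed encoding of the last point, and none of this is an existing Django API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    # Hypothetical flags; "origin" records where the field is defined.
    name: str
    relational: bool = False
    virtual: bool = False
    origin: str = "local"  # assumed values: "local", "parent" or "remote"


def get_fields():
    # Stand-in for a minimal get_fields() that simply returns everything.
    return [
        Field("title"),
        Field("author", relational=True),
        Field("content_object", relational=True, virtual=True),
        Field("created", origin="parent"),
        Field("comment_set", relational=True, origin="remote"),
    ]


# Relations that aren't virtual, as in the list comprehension above.
concrete_relations = [f for f in get_fields() if f.relational and not f.virtual]

# Fields defined on this model itself (not inherited, not remote).
local_fields = [f for f in get_fields() if f.origin == "local"]
```

The set logic lives entirely in the caller's comprehension, so get_fields() itself needs no category arguments.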
>>
>> As for changing ForeignKey to virtual field plus concrete field
>> representation - I just realized this will be backwards incompatible no
>> matter what we do regarding categorization. A get_fields() call that
>> includes all fields will return separate author (virtual) and author_id
>> (concrete) fields after the split. I am not sure what we can do about this.
>> It would be very unfortunate if we can't refactor the way ForeignKeys work
>> due to the meta API. Any ideas how we can avoid the backwards compatibility
>> trap?
>>
>
> I think Daniel and I might have come up with a way to meet both these
> requirements - a minimalist API for get_fields, with at least some
> protection against the known incoming backwards compatibility issue.
>
> The summary so far: it appears that a complex taxonomy isn't especially
> helpful - firstly, because any complex taxonomy is going to have edge cases
> that are hard to categorize, but also because a complex taxonomy leads to a
> much more complex internal API that is going to be prone to backwards
> compatibility problems.
>
> So - instead of worrying about 'virtual' and other properties like that,
> let's look at why the _meta API is fundamentally used - to get a list of
> fields that need to be handled in data processing. This primarily means
> forms, but other forms of serialisation are also included. In these use
> cases, there are always going to be per-field differences (even a CharField
> and an IntegerField require *slightly* different handling), so we won't
> focus on internal representations, storage mechanisms, or anything like
> that. Instead, let's focus on cardinality - a field represents some sort of
> data that has a cardinality with the object on which it is stored. If
> something has cardinality 1, you can display a single field. If it's
> cardinality N, you need to display a list, or some sort of inline.
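To illustrate why cardinality is the useful axis for form handling, here is a sketch of a hypothetical rendering helper that chooses between a single widget and a list purely on cardinality. The `render` function and its HTML output are illustrative only, not Django's forms API:

```python
def render(field_name, value, many):
    """Hypothetical form-rendering helper that only looks at cardinality."""
    if many:
        # Cardinality N: render a list (or an inline) of values.
        items = "".join(f"<li>{v}</li>" for v in value)
        return f"<ul id='{field_name}'>{items}</ul>"
    # Cardinality 0-1: a single widget; None covers the "0" case.
    shown = "" if value is None else value
    return f"<input name='{field_name}' value='{shown}'>"
```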
>
> This results in 3 categories that are mutually exclusive:
>
> a) "Data fields": Fields of cardinality 0-1:
>
>  * A CharField stores 0 or 1 strings (0 is the case of a nullable field).
>
>  * An IntegerField stores 0 or 1 integers.
>
>  * A FileField stores 0 or 1 file paths.
>
>  * An ImageField stores 0 or 1 file paths - although modifying it might
> also modify some other fields.
>
>  * A ForeignKey stores 0 or 1 references to another object.
>
>  * A GenericForeignKey stores 0 or 1 references to another object.
>
>  * A notional "DocumentField" on a NoSQL store references 0 or 1 external
> documents.
>
> b) "ManyToMany Fields": Fields that are locally defined that represent a
> cardinality 0-N relationship with another object:
>
>  * Many to Many fields store 0-N references to a second model.
>
> c) "Related Objects": Fields that represent a cardinality 0-N relationship
> with this object, but aren't locally defined:
>
>  * The 'related' side of a ForeignKey
>
>  * The 'related' side of a ManyToMany
>
>  * A GenericRelation representing the reverse side of a GenericForeignKey
>
> These three types are mutually exclusive - you either have cardinality 1
> *or* cardinality N, not both; and you're either locally defined on this
> object or you're not. I can't think of an example of "cardinality 1 data
> that isn't defined on this object", but it would fit into this taxonomy if
> it were needed; I also can't think of a field definition that would span
> models.
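The two axes above (cardinality and whether the field is locally defined) can be sketched as a small classifier. The `Category` enum and `categorize` function are hypothetical, and the fourth combination is deliberately left undefined, mirroring the text:

```python
from enum import Enum


class Category(Enum):
    DATA = "data"
    MANY_TO_MANY = "many_to_many"
    RELATED_OBJECTS = "related_objects"


def categorize(many, local):
    """Map the two axes onto the three mutually exclusive groups.

    `many` is True for cardinality 0-N and False for cardinality 0-1;
    `local` is True if the field is defined on this model.
    """
    if not many and local:
        return Category.DATA
    if many and local:
        return Category.MANY_TO_MANY
    if many and not local:
        return Category.RELATED_OBJECTS
    # Cardinality-1 data not defined on this object: no known example yet.
    raise ValueError("no category defined for non-local cardinality-1 fields")
```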
>
> In addition to this basic classification, a field can be marked as
> "hidden". The immediate use for this is to hide the related_name='+' case
> of a FK or M2M. Looking forward, it would be used to mask fields that
> exist, but aren't intended to be user visible - for example, in the
> potential future case where a ForeignKey is split in two, or a Composite
> Key, there would be a "hidden" integer field (or fields) storing the actual
> data, and a virtual (but non-hidden) field that is the public API for
> manipulating the relationship. This would also be backwards compatible,
> because the "visible" field list hasn't changed.
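A sketch of how a "hidden" flag would keep the visible field list stable across the hypothetical ForeignKey split described above. The `Field` class and this `get_fields` signature are assumptions for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    hidden: bool = False


def get_fields(fields, include_hidden=False):
    # Hidden fields are skipped unless explicitly requested.
    return [f.name for f in fields if include_hidden or not f.hidden]


# Today: a single ForeignKey field called "author".
before = [Field("title"), Field("author")]

# Hypothetical future split: a hidden concrete column plus a public
# virtual field that keeps the old name.
after = [Field("title"), Field("author_id", hidden=True), Field("author")]
```

Code that only asks for visible fields sees the same list before and after the split; only callers passing include_hidden=True see the new column.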
>
> Fields are also tracked according to their parentage; this is used by
> tools interacting with inheritance relationships to know which fields are
> actually on this model, and which are inherited from a base class.
>
> This yields the following formal API for _meta:
>
>  * get_fields(data, many_to_many, related, include_hidden, include_parents)
>
>  * @property data_fields (=> get_fields(data=True, many_to_many=False,
> related=False, include_hidden=False, include_parents=True))
>
>  * @property many_to_many_fields (=> get_fields(data=False,
> many_to_many=True, related=False, include_hidden=False,
> include_parents=True))
>
>  * @property related_objects (=> get_fields(data=False,
> many_to_many=False, related=True, include_hidden=False,
> include_parents=True))
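A rough sketch of how the proposed _meta API could behave, using a plain Python stand-in rather than real Django models. The `Field` and `Options` classes, the `kind`/`from_parent` attributes and the default argument values of `get_fields()` are all assumptions; the proposal only fixes the parameter names and the three properties:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    kind: str                  # assumed encoding: "data", "many_to_many" or "related"
    hidden: bool = False       # e.g. a related_name='+' relation or a split-out column
    from_parent: bool = False  # inherited from a base class


class Options:
    """Hypothetical stand-in for Model._meta following the proposal."""

    def __init__(self, fields):
        self.fields = list(fields)

    def get_fields(self, data=True, many_to_many=True, related=True,
                   include_hidden=False, include_parents=True):
        wanted = {"data": data, "many_to_many": many_to_many, "related": related}
        return [
            f for f in self.fields
            if wanted[f.kind]
            and (include_hidden or not f.hidden)
            and (include_parents or not f.from_parent)
        ]

    @property
    def data_fields(self):
        return self.get_fields(data=True, many_to_many=False, related=False,
                               include_hidden=False, include_parents=True)

    @property
    def many_to_many_fields(self):
        return self.get_fields(data=False, many_to_many=True, related=False,
                               include_hidden=False, include_parents=True)

    @property
    def related_objects(self):
        return self.get_fields(data=False, many_to_many=False, related=True,
                               include_hidden=False, include_parents=True)


meta = Options([
    Field("title", "data"),
    Field("created", "data", from_parent=True),
    Field("author_id", "data", hidden=True),
    Field("tags", "many_to_many"),
    Field("comment_set", "related"),
])
```

In real code the three properties would presumably be cached, since they are called far more often than the field list changes.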
>
> Does this sound any more sane as an API?
>
> My one lingering question is whether the "many_to_many" name/category is
> too explicit. I can see how an ArrayField could be considered a data
> field (it stores 0-1 arrays of data), or a "many_to_many" field (because it
> stores 0-N instances of some data). This all hinges on whether the
> definition for that field category is that it is a relationship with
> another *model*, or if it's just cardinality N data. It's trivial to call
> it a Data field and just leave it at that, but I'm wondering if there might
> be benefit in broadening the definition of "many_to_many".
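The two candidate definitions can be written as predicates. This is a sketch under assumed attribute names (`many`, `related_model`), with a hypothetical ArrayField-like `scores` field standing in for the case in question:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    name: str
    many: bool                           # stores 0-N values per object
    related_model: Optional[str] = None  # target model, if any


def is_m2m_strict(f):
    # Narrow reading: cardinality N *and* a relationship to another model.
    return f.many and f.related_model is not None


def is_m2m_broad(f):
    # Broad reading: any cardinality-N data, relational or not.
    return f.many


tags = Field("tags", many=True, related_model="Tag")
scores = Field("scores", many=True)  # e.g. an ArrayField of integers
```

Under the strict reading `scores` stays a data field; under the broad reading it joins `tags` in the many_to_many group.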
>
> Russ %-)
>
>  --
> You received this message because you are subscribed to the Google Groups
> "Django developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/django-developers.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/CAJxq84_OcibE72RKB9T60BJW9AtY8_YYhmhM5dXH36TtW3KsYw%40mail.gmail.com
> <https://groups.google.com/d/msgid/django-developers/CAJxq84_OcibE72RKB9T60BJW9AtY8_YYhmhM5dXH36TtW3KsYw%40mail.gmail.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>
