Re: GSoC Meta refactor: Bikeshedding time!!

Russell Keith-Magee Wed, 20 Aug 2014 20:08:03 -0700

On Thu, Aug 21, 2014 at 1:29 AM, Anssi Kääriäinen <[email protected]>
wrote:

> On Wednesday, August 20, 2014 11:19:33 AM UTC+3, Russell Keith-Magee wrote:
>
>> I think Daniel and I might have come up with a way to meet both these
>> requirements - a minimalist API for get_fields, with at least some
>> protection against the known incoming backwards compatibility issue.
>>
>> The summary so far: it appears that a complex taxonomy isn't especially
>> helpful - firstly, because any complex taxonomy is going to have edge cases
>> that are hard to categorize, but also because a complex taxonomy leads to a
>> much more complex internal API that is going to be prone to backwards
>> compatibility problems.
>>
>> So - instead of worrying about 'virtual' and other properties like that,
>> let's look at why the _meta API is fundamentally used - to get a list of
>> fields that need to be handled in data processing. This primarily means
>> forms, but other forms of serialisation are also included. In these use
>> cases, there are always going to be per-field differences (even a CharField
>> and an IntegerField require *slightly* different handling), so we won't
>> focus on internal representations, storage mechanisms, or anything like
>> that. Instead, let's focus on cardinality - a field represents some sort of
>> data that has a cardinality with the object on which it is stored. If
>> something has cardinality 1, you can display a single field. If it's
>> cardinality N, you need to display a list, or some sort of inline.
>>
>> This results in 3 categories that are mutually exclusive:
>>
>> a) "Data fields": Fields of cardinality 0-1:
>> <SNIP>
>>
>> b) "ManyToMany Fields": Fields that are locally defined that represent a
>> cardinality 0-N relationship with another object:
>> <SNIP>
>>
>> c) "Related Objects": Fields that represent a cardinality 0-N
>> relationship with this object, but aren't locally defined:
>> <SNIP>
>>
>
>
>> These three types are mutually exclusive - you either have cardinality 1
>> *or* cardinality N, not both; and you're either locally defined on this
>> object or you're not. I can't think of an example of "cardinality 1 data
>> that isn't defined on this object", but it would fit into this taxonomy if
>> it were needed; I also can't think of a field definition that would span
>> models.
>>
>
> The reverse of OneToOneField is a cardinality 1 data that isn't defined on
> this object.
>

And the obvious answer was looking right at me :-). I had that mentally
wrapped into (c) (because historically O2O is handled as a redundant case
of FK). This suggests to me that either (a) the "related" flag is more
about "objects that have a relationship with this one", rather than being
specifically about cardinality, or (b) there's another group for 0-1
cardinality reverse relationships.
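
To make option (b) concrete, here is a minimal sketch in plain Python (not
Django code) that buckets fields on the two axes discussed above:
cardinality and whether the field is locally defined. FieldInfo, bucket and
the bucket labels are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldInfo:
    """Hypothetical stand-in for a model field; not a Django class."""
    name: str
    many: bool   # True for cardinality 0-N, False for cardinality 0-1
    local: bool  # True if the field is declared on this model


def bucket(field):
    """Classify a field by locality first, then by cardinality."""
    if field.local:
        return "many_to_many" if field.many else "data"
    # Not locally defined: the reverse side of a relationship.
    return "related (0-N)" if field.many else "related (0-1)"


fields = [
    FieldInfo("title", many=False, local=True),
    FieldInfo("tags", many=True, local=True),
    FieldInfo("comment_set", many=True, local=False),  # reverse ForeignKey
    FieldInfo("profile", many=False, local=False),     # reverse OneToOneField
]
buckets = {f.name: bucket(f) for f in fields}
```

Under this assumed split, the reverse OneToOneField is the only entry that
lands in the fourth bucket; folding it into (c) instead corresponds to
option (a).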

>> In addition to this basic classification, a field can be marked as
>> "hidden". The immediate use for this is to hide the related_name='+' case
>> of a FK or M2M. Looking forward, it would be used to mask fields that
>> exist, but aren't intended to be user visible - for example, in the
>> potential future case where a ForeignKey is split in two, or a Composite
>> Key, there would be a "hidden" integer field (or fields) storing the actual
>> data, and a virtual (but non-hidden) field that is the public API for
>> manipulating the relationship. This would also be backwards compatible,
>> because the "visible" field list hasn't changed.
>>
>
> There are use cases that do not fit this categorization. For example when
> instantiating a model from database you will need to supply the hidden
> integer field data for a foreign key, but you must skip the foreign key
> field itself. That is, a model with relation to author is initialized as
> MyModel(pk=1, author_first_name='foo', author_last_name='bar') (technically
> this is done through *args for performance reasons), not with MyModel(pk=1,
> author=author_instance). Similar considerations likely apply to
> serialization of models.
>

> Form fields for a model is another consideration. If one wants those
> fields that should have a field in a form, that is currently defined as [f
> for f in model._meta.fields if f.editable]. The editable fields set doesn't
> necessarily match the above categorization. In fact, I believe if we
> inspect Django's code base it will be clear there can't be any
> categorization where fields belong to only one category, but which fulfills
> all use cases in Django. It is like trying to categorize animals for every
> use case. If you want mammals, then categorization to sea and land
> creatures will not work. If you want sea creatures, then categorization to
> mammals and fish is useless.
>
Sure - but let me say in advance that the API I've proposed here isn't a
purely theoretical exercise. Daniel has implemented this API to prove that
it's sufficient to meet all Django's existing use cases. The three
categories (plus the two include qualifiers) I've described meet that
criterion. However, it might be missing potential future use cases, and
that's really what we're trying to flesh out here.

The "editable" thing doesn't especially concern me, for exactly the reason
your example demonstrates. The role of the meta API (to me, at least) is to
provide a candidate list of fields that need to be dealt with as part of
introspection. The only reason the flags/categories in Meta matter is the
extent to which they represent the need for fundamentally different classes
of data handling. If you're building a form, that means you're going to
need to check the editable flag on each field. It also means deferring some
behaviour to the field itself (to_python calls, calls to persist files to
storage, and so on). I don't believe this means we need to embed the
concept of "editability" into the Meta API.
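
As a rough illustration of that division of labour, here is a minimal
sketch in plain Python (not Django code). The meta API hands back candidate
fields, and the form-building code applies the per-field editable check
itself. Field and form_field_names are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical stand-in for a model field carrying an editable flag."""
    name: str
    editable: bool = True


def form_field_names(candidate_fields):
    """Keep only the candidates that the form should expose."""
    return [f.name for f in candidate_fields if f.editable]


# The candidate list is what an introspection API would hand back.
candidates = [
    Field("id", editable=False),
    Field("title"),
    Field("created", editable=False),
    Field("body"),
]
names = form_field_names(candidates)
```

The filter lives in the caller, so the candidate list itself never needs to
know about editability.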

> The point is that I am convinced we will need to provide field flags to
> complement the get_fields() API no matter what API we choose for
> get_fields(). In fact, if we define and document a sane set of field flags,
> then the get_fields() API isn't that important, it just needs to be useful
> for the most common use cases.
>

Well, no - it needs to be useful for *all* the use cases *in Django's
codebase*. The end goal here is to provide a formal API definition so that
someone else can take the specification, make a duck that quacks exactly
like it, and use it because it is compatible with Django's internals.

As an indicative goal - I'm thinking a good GSoC project for next year
would be to implement a Django-compatible model layer for SQLAlchemy. That
means the student will need to implement get_fields() (or whatever API we
end up with) to sufficient depth that they can expose a SQLAlchemy model in
Django's Admin, using Django's forms. Daniel's proof-of-concept project
wrapping an email API demonstrates that this isn't a theoretical goal -
it has the potential to be real.

>> Fields are also tracked according to their parentage; this is used by
>> tools interacting with inheritance relationships to know which fields are
>> actually on this model, and which are inherited from a base class.
>>
>> This yields the following formal API for _meta:
>>
>>  * get_fields(data, many_to_many, related, include_hidden,
>> include_parents)
>>
>>  * @property data_fields (=> get_fields(data=True, many_to_many=False,
>> related=False, include_hidden=False, include_parents=True))
>>
>>  * @property many_to_many_fields (=> get_fields(data=False,
>> many_to_many=True, related=False, include_hidden=False,
>> include_parents=True))
>>
>>  * @property related_objects (=> get_fields(data=False,
>> many_to_many=False, related=True, include_hidden=False,
>> include_parents=True))
>>
>> Does this sound any more sane as an API?
>>
>
> Yes, with the caveat that for example model initialization fields
> through *args do not map to this API, at least not after foreign key split
> to virtual field + concrete fields. Similar for editable fields. So, +1 if
> we also consider defining and documenting a useful set of field flags.
>

Sure - and the purpose of this thread is to tease out what those "useful"
flags are. At the moment, it's not clear to me where exactly the conceptual
holes lie from your perspective. As I said, the API I've proposed here is
sufficient to meet all *current* use cases in the code base.
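
For anyone who wants something concrete to poke at, the get_fields()
signature and the three properties quoted above can be sketched in plain
Python as follows. This is not Daniel's implementation. The Field and
Options classes, the keyword defaults, and the way hidden and inherited
fields are flagged are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """Hypothetical field record; not a Django class."""
    name: str
    category: str            # "data", "many_to_many" or "related"
    hidden: bool = False
    inherited: bool = False  # True if the field comes from a parent model


class Options:
    """Hypothetical stand-in for a model's _meta object."""

    def __init__(self, fields):
        self._fields = tuple(fields)

    def get_fields(self, *, data=False, many_to_many=False, related=False,
                   include_hidden=False, include_parents=True):
        # The defaults above are assumed; the proposal does not state them.
        flags = {"data": data, "many_to_many": many_to_many,
                 "related": related}
        wanted = {category for category, on in flags.items() if on}
        return tuple(
            f for f in self._fields
            if f.category in wanted
            and (include_hidden or not f.hidden)
            and (include_parents or not f.inherited)
        )

    @property
    def data_fields(self):
        return self.get_fields(data=True)

    @property
    def many_to_many_fields(self):
        return self.get_fields(many_to_many=True)

    @property
    def related_objects(self):
        return self.get_fields(related=True)


meta = Options([
    Field("id", "data", inherited=True),
    Field("title", "data"),
    Field("author_id", "data", hidden=True),
    Field("tags", "many_to_many"),
    Field("comment_set", "related"),
])
data_names = [f.name for f in meta.data_fields]
local_names = [f.name for f in meta.get_fields(data=True,
                                               include_parents=False)]
```

Each property is just get_fields() with one category switched on, so the
three views can never overlap as long as a field carries exactly one
category.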

As best as I can make out, it appears you see a problem with the concept of
"hidden" - because in various circumstances, different fields will be
"hidden" in different ways (especially in the composite/virtual foreign key
future). Taking that "future virtualised foreign key" case - if you're
dealing with the database, it's the virtual field that needs to be hidden,
because the database only cares about fields with an actual column/table
underneath them; but if you're dealing with a form, you don't want a form
field for the underlying field, you want the virtual field. However, given a
virtual field representation, I imagine it is possible to get back to the
field (or fields) that hold the underlying representation; all that is
important is that you can iterate over a list of "fields", and from there,
determine a list of column names. The fact that the column name comes from
a different underlying column isn't important; what's important is that the
"foreign key" is only counted once in the introspection process.
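
A minimal sketch of that idea, in plain Python rather than Django code:
iterate over the visible fields only, and let a virtual field resolve to
the columns of the concrete fields behind it. ConcreteField, VirtualField,
the targets attribute and column_names are hypothetical names, and the
author columns are borrowed from Anssi's example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConcreteField:
    """Hypothetical field backed by a real database column."""
    name: str
    column: str
    hidden: bool = False


@dataclass(frozen=True)
class VirtualField:
    """Hypothetical public-facing field with no column of its own."""
    name: str
    targets: tuple  # the concrete fields that actually store the data


def column_names(visible_fields):
    """Collect column names, following virtual fields to their targets."""
    columns = []
    for field in visible_fields:
        if isinstance(field, VirtualField):
            columns.extend(target.column for target in field.targets)
        else:
            columns.append(field.column)
    return columns


first = ConcreteField("author_first_name", "author_first_name", hidden=True)
last = ConcreteField("author_last_name", "author_last_name", hidden=True)

# Only visible fields are iterated; the hidden ones are reached via "author".
visible = [ConcreteField("id", "id"), VirtualField("author", (first, last))]
cols = column_names(visible)
```

Because the hidden concrete fields never appear in the iterated list, the
foreign key contributes its columns exactly once.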

So - what I really need here is a counterproposal from someone familiar
with the composite key work. I'm not bound to any of the details of the
proposal I've given here - I'm just relating the end point of work from
SoC. It works with the current use cases exposed by Django, but when this
hits master, we're going to need to live with it long term, so I want to
make sure we're not boxing ourselves into a corner, or introducing
categorisations that aren't representative.

> I wonder if a better name for the related category exists. My first
> instinct is that foreign key fields should match the related flag. Could it
> be made clearer that these are relations defined on the remote model? Maybe
> just remote_relations could work?
>

I think the first step is to work out what the buckets/flags are - once
we've got a clear picture of what they represent, naming discussions will
make a lot more sense, since we will know what it is we're actually trying
to name.

Russ %-)
