GSoC Meta refactor: Bikeshedding time!!

Russell Keith-Magee Fri, 15 Aug 2014 18:38:27 -0700

Hi all,

tl;dr - Daniel's GSoC is coming to a close; we need some help verifying
that we've got our taxonomy correct and API names that make sense.

The long version:

Daniel Pyrathon has been making great progress with his GSoC project to
refactor and formalise the _meta object on Django models. [1]

For those who haven't been following along, the project aims to finally
document the interface provided by _meta - in particular, the API methods
that let you introspect the fields and relations that exist on a model.
Despite its importance to tools like admin, it's never been formally
listed as a stable API. And over the last 8 years, _meta has also picked up
lots of cruft, so there are a few messy internal pieces, duplicated
functionality, and so on.

Aside from the benefit of cleaning up and documenting a core piece of Django,
this project has the side effect of providing an interface against which
others can develop - which means it is possible to develop backends for
other data stores that are "Django compliant". Daniel has already
demonstrated this with a theoretical "Email Model" wrapper around Google's
Gmail API [2]. This is a model that has no connection to Django's Model base
class, but quacks enough like a Django model that it can be used in
Django's admin, with Django ModelForms, etc. With a little bit of effort,
it is now conceivable that SQLAlchemy, MongoDB, and many other data stores
could be exposed in a way that they can be viewed in Django's admin, and
modified using Django's forms.

We're probably not going to get his refactor committed by the formal end of
the GSoC, but we're getting close, and Daniel has said he's interested in
continuing beyond the end of the formal GSoC period to get the PR to
completion. A huge Thank You goes to Daniel for his excellent efforts over
the last few months.

However, now we're at the pointy end, and that means some bike shedding.

At the core of the _meta API is a set of methods to retrieve the fields on
a model - given a model, _meta allows us to query that model and discover
all the fields and relations that are associated with that model. Django
has lots of different field types, but we've never really formalised
nomenclature for some of them.

After a SoC's worth of discussion, Daniel, various contributors on IRC and
GitHub, and I have ended up with the following taxonomy:

a) "Pure" data fields - things like CharField, IntegerField, etc. Fields
that manifest as a single column on the model's underlying table.

b) "Relating" data fields - This means ForeignKey. Fields that manifest as
a single column, but represent a relation to another model.

c) "Pure" external fields - Fields that manifest as an external table.
Django doesn't really have any examples of these at present. Conceptually,
something like a "document" field type in a document-based store might fall
into this category.

d) "Relating" external fields - This means ManyToMany fields. Fields that
are manifested as an external table, but represent a relation to a
different model.

e) "Pure" virtual fields - Fields that are conceptual wrappers around other
fields on the same model. Virtual fields don't have column representations
themselves; they are a wrapper around columns provided by other fields.
Again, Django doesn't have any of these at present, but it's easy to think
of examples of "virtual" fields like Point (a wrapper around an X and Y
field). Composite fields would probably fall into this group.

f) "Relating" virtual fields - Fields that are conceptual wrappers around
other fields on the same model that represent a relation to another model.
Generic Foreign Key is the example here.

g) Related objects - The reverse side of (b) - a field representing all the
objects that are related to this model in a singular relation (i.e., the
reverse of a FK)

h) Related ManyToMany - The reverse side of (d) - a field representing all
the objects that are related to this model in a multiple relation. (i.e.,
the reverse of a M2M)

i) Related Virtual - The reverse side of (f) - a field representing all the
objects that are related to this model through a virtual relation (i.e., a
GenericRelation)
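
One way to check the taxonomy for internal consistency is to write it down
as data. The following is only a sketch in plain Python: the Category tuple,
its field names, and the storage labels are hypothetical and chosen for
illustration; nothing here comes from Daniel's branch.

```python
from collections import namedtuple

# storage: how the field manifests; relating: does it point at another
# model; reverse_of: the forward category this is the reverse side of.
Category = namedtuple("Category", "storage relating reverse_of example")

TAXONOMY = {
    "a": Category("column", False, None, "CharField"),
    "b": Category("column", True, None, "ForeignKey"),
    "c": Category("external", False, None, None),  # no Django example yet
    "d": Category("external", True, None, "ManyToManyField"),
    "e": Category("virtual", False, None, None),  # no Django example yet
    "f": Category("virtual", True, None, "GenericForeignKey"),
    "g": Category("reverse", True, "b", None),
    "h": Category("reverse", True, "d", None),
    "i": Category("reverse", True, "f", "GenericRelation"),
}


def reverse_sides_are_consistent(taxonomy):
    """Every reverse category must point at a forward, relating category."""
    for category in taxonomy.values():
        if category.reverse_of is None:
            continue
        forward = taxonomy[category.reverse_of]
        if not forward.relating or forward.reverse_of is not None:
            return False
    return True
```

Written this way, the forward categories form a grid of three storage kinds
by pure/relating, and each relating category has exactly one reverse side.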

So - firstly, we need a sanity check. Does this taxonomy capture all field
types that you can think of? Are there any interpretations of composite
fields, or any other esoteric field type (existing or imagined) that don't
fit in this taxonomy?

Secondly - the hard part - naming. The current API (i.e., Django 1.6 API)
is a little confused in relation to this nomenclature:

a) fields
b) No specific name; included in "fields"
c) No matching field type
d) many_to_many
e) No matching field type
f) virtual_fields (even though it's a relating type)
g) get_all_related_objects()
h) get_all_related_many_to_many_objects()
i) Included in virtual_fields (even though it's a reverse type)

So there's variation on whether to have a _fields suffix, whether to use
m2m or many_to_many, whether to use attributes or methods, and on basic
classification of some field types.

Here's my suggestion for a normalised set of names that match the taxonomy:

a) data_fields
b) foreign_key_fields
c) external_data_fields
d) many_to_many_fields
e) virtual_data_fields
f) virtual_foreign_key_fields
g) related_data_objects
h) related_many_to_many_objects
i) related_virtual_objects

These would all be exposed in two ways:

* As properties (so, MyModel._meta.data_fields would be a list of type (a)
fields on a model), and

* As keyword arguments to get_fields(), by dropping the last part of the
name (i.e., MyModel._meta.get_fields(data=True), which would return the
same result as MyModel._meta.data_fields).
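
To make the "dropping the last part of the name" rule concrete, here is a
minimal sketch in plain Python. The helper name flag_for is hypothetical and
exists only for illustration; it shows how each proposed property name would
map to a get_fields() keyword.

```python
# Proposed property names, in taxonomy order (a) through (i).
PROPOSED_PROPERTIES = [
    "data_fields",
    "foreign_key_fields",
    "external_data_fields",
    "many_to_many_fields",
    "virtual_data_fields",
    "virtual_foreign_key_fields",
    "related_data_objects",
    "related_many_to_many_objects",
    "related_virtual_objects",
]


def flag_for(property_name):
    """Return the get_fields() keyword for a proposed property name.

    Hypothetical helper: strips the trailing "_fields" or "_objects".
    """
    prefix, _, suffix = property_name.rpartition("_")
    if suffix not in ("fields", "objects"):
        raise ValueError("unexpected property name: %r" % property_name)
    return prefix


FLAGS = [flag_for(name) for name in PROPOSED_PROPERTIES]
```

With this rule, many_to_many_fields maps to the keyword many_to_many, and
related_virtual_objects maps to related_virtual.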

I'm obviously preferring the long-form names here (many_to_many vs m2m,
etc). This is one point for discussion where we're seeking opinions.
Any other naming suggestions are also welcome.

The final form of the formal API would then be:

* get_field(name) returns the field with a given name.

* get_fields(data=True, foreign_key=True, ...) returns all the fields
that match the given flags.

* A set of optimised and cached properties - data_fields,
foreign_key_fields, etc; essentially cached wrappers around calls to
get_fields() with specific flags enabled.
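
The shape of that API can be sketched without Django at all. Everything
below is a toy stand-in: SketchField, SketchOptions, and the "kind" attribute
are hypothetical names, the behaviour with no flags (an empty list) is my
assumption, and functools.cached_property is used only to keep the example
self-contained.

```python
from functools import cached_property


class SketchField:
    """Stand-in for a field; `kind` is one of the proposed flag names."""

    def __init__(self, name, kind):
        self.name = name
        self.kind = kind


class SketchOptions:
    """Hypothetical stand-in for a model's _meta object."""

    def __init__(self, fields):
        self._fields = list(fields)

    def get_field(self, name):
        # Return the single field with the given name.
        for field in self._fields:
            if field.name == name:
                return field
        raise LookupError("no field named %r" % name)

    def get_fields(self, **flags):
        # Keep only the kinds whose flag was passed as True.
        wanted = {kind for kind, enabled in flags.items() if enabled}
        return [f for f in self._fields if f.kind in wanted]

    @cached_property
    def data_fields(self):
        # Computed once per instance, then served from the cache.
        return self.get_fields(data=True)

    @cached_property
    def foreign_key_fields(self):
        return self.get_fields(foreign_key=True)


meta = SketchOptions([
    SketchField("title", "data"),
    SketchField("author", "foreign_key"),
    SketchField("tags", "many_to_many"),
])
```

Here meta.data_fields and meta.get_fields(data=True) return the same fields;
the property just avoids repeating the filtering on every access.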

This API replaces all the other field-related methods on _meta, including
get_all_related_objects(), get_concrete_fields_with_model(),
get_m2m_with_model(), and so on. Daniel's branch demonstrates that most
of these extra methods don't provide any performance benefit; they just
complicate the internals.

The "old" API methods can also be entirely implemented using calls to the
"new" API; so there's a fully backwards-compatible path for introducing the
new API.
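
As an illustration of that path, the old method names can be thin wrappers
over get_fields(). This is a sketch only: CompatOptions and its constructor
are hypothetical, and the pairing of old names to flags simply follows the
(g) and (h) rows of the two lists above rather than the actual code in the
branch.

```python
class CompatOptions:
    """Hypothetical _meta stand-in exposing both old and new names."""

    def __init__(self, objects_by_kind):
        # Maps a flag name such as "related_data" to a list of objects.
        self._objects_by_kind = objects_by_kind

    def get_fields(self, **flags):
        # New API: return everything whose flag was passed as True.
        result = []
        for kind, objects in self._objects_by_kind.items():
            if flags.get(kind):
                result.extend(objects)
        return result

    def get_all_related_objects(self):
        # Old name for (g), expressed through the new API.
        return self.get_fields(related_data=True)

    def get_all_related_many_to_many_objects(self):
        # Old name for (h), expressed through the new API.
        return self.get_fields(related_many_to_many=True)


compat_meta = CompatOptions({
    "data": ["title"],
    "related_data": ["comment_set"],
    "related_many_to_many": ["reader_set"],
})
```

Code that calls the old methods keeps working, while new code can call
get_fields() directly.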

Comments welcome. Obviously, this has enormous potential to devolve into
bike shedding, so I'd appreciate it if people kept that in mind. If you
have a preference for something like short vs long form names, feel free to
state it, but please don't let this devolve into arguments over the
relative merits of pith over verbosity in API naming. It's much more
important that we clarify the matters of substance - i.e., that we have a
complete and correct taxonomy - not that we fixate on the names themselves.

[1] https://github.com/django/django/pull/2894
[2] https://github.com/PirosB3/django-mailer/

Yours,
Russ Magee %-)
