Re: GSoC Proposal: Serialization Enhancements

Russ Amos Sun, 29 Mar 2009 16:34:03 -0700

Oliver, examining unique and unique_together attributes is exactly what I
have in mind for attempting to follow relationships when deserializing. I
realize it needs to be dealt with carefully in the inner workings of Django,
to provide consistency, but anything you're willing to provide, I'm willing
to read through and maybe refactor.
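
To make the idea concrete, here is a rough sketch of the lookup I'm picturing. It is not Oliver's jsonfk code (which I haven't seen) and not anything in Django today. Plain dicts stand in for table rows, and resolve_reference and unique_sets are names made up for illustration; unique_sets plays the role of a model's unique fields plus its unique_together tuples.

```python
# Hypothetical sketch: plain dicts stand in for database rows, so nothing
# here is Django API.

def resolve_reference(ref, rows, unique_sets):
    """Return the pk of the row that `ref` points at.

    ref         -- dict of serialized values, possibly with a stale "pk"
    rows        -- list of dicts already present in the target table
    unique_sets -- list of tuples of field names that identify a row
    """
    for fields in unique_sets:
        # Skip a unique set the serialized data cannot fully supply.
        if not all(name in ref for name in fields):
            continue
        matches = [row for row in rows
                   if all(row[name] == ref[name] for name in fields)]
        if len(matches) == 1:
            return matches[0]["pk"]
    # No usable unique set, or the lookup failed: fall back to the pk.
    return ref.get("pk")


rows = [
    {"pk": 7, "sku": "A-100", "name": "Widget"},
    {"pk": 8, "sku": "B-200", "name": "Gadget"},
]
unique_sets = [("sku",)]

# The fixture was dumped from a database where Gadget had pk 2.
found = resolve_reference({"pk": 2, "sku": "B-200"}, rows, unique_sets)
fallback = resolve_reference({"pk": 2, "name": "Gadget"}, rows, unique_sets)
print(found, fallback)
```

The first call should find Gadget by its sku and return the local pk, while the second has no complete unique set to go on and falls back to the serialized pk.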


So, yeah, I'd like to take a look at what you've written. Pragmatism may not
make for the cleanest code, but I think it's a great way to find "gotchas"
that might otherwise be difficult to unearth.


Russ

On Sun, Mar 29, 2009 at 6:23 PM, Oliver Beattie <oli...@obeattie.com> wrote:

>
> I'll admit that I haven't read your whole post (sorry), but one part
> caught my eye… the bit about storing relationships not just as primary
> keys. If I am right in thinking that you are wanting to do some sort
> of "relationship following" I think I can probably help by providing
> some initial (but a bit messy) code. I've recently run into the issue
> of primary-key-dependent fixtures breaking in interesting ways on a
> real database, so had to implement the relationship following (I have
> a serializer I've named jsonfk). Basically, it will look at unique and
> unique_together fields in models and figure out what it can use,
> falling back to the primary key if either no unique fields exist or
> the lookup fails on deserialization.
>
> Sorry if I'm rambling a bit… it's late, but let me know if this is
> something that may be of interest to see. It's not something I want to
> release publicly since it's a bit of a mess and doesn't handle M2M
> relations (on the project I've used it on M2M relationships aren't
> really used).
>
> On 29 Mar, 22:57, Russ <taal...@gmail.com> wrote:
> > My apologies for the length!
> >
> > Concisely, I intend to provide the Django user with some granular
> > control of the data to be serialized without sacrificing backwards
> > compatibility for old code, or for users who need the straightforward,
> > current functionality.
> >
> > Any serialized Model contains only the data in the source row of the
> > database table.  Regarding inheritance, this is only an issue when the
> > model in question derives from one or more concrete models.  In that
> > case, however, any data contained on the far end of the one-to-one
> > relation is not serialized at all.  The overarching issue is rooted in
> > the way Django serializers treat relationships.  This is clearly a
> > complex issue, because both shallow and deep serialization are useful
> > in a large range of situations.
> >
> > For the purpose of argument and demonstration, please examine the
> > following use case.  A business wishes to begin providing their
> > products online, but they wish to continue using the inventory
> > management software with which they are familiar
> > (henceforth, ezInventory).  The models.py looks like:
> >
> > from django.db import models
> >
> > class Product(models.Model):
> >     name = models.CharField(max_length=200)
> >     description = models.TextField()
> >     # We're assuming all prices are in USD...
> >     price = models.DecimalField(max_digits=6, decimal_places=2)
> >
> > class Order(models.Model):
> >     products = models.ManyToManyField(Product)
> >     order_placed = models.DateTimeField()
> >
> >     def total_price(self):
> >         # aggregate() returns a dict such as {'price__sum': Decimal(...)}
> >         return self.products.aggregate(models.Sum('price'))['price__sum']
> >
> > Django's serialization facilities make importing and exporting
> > products and orders between their website and ezInventory a breeze.
> > Six months later, however, they have a feature request.
> >
> > They would like to begin providing support for some of their products
> > via subscription.  This is an easy addition:
> >
> > # imports...
> > # original models...
> >
> > class Subscription(Product):
> >     recurrence = models.PositiveIntegerField(
> >         choices=((12, 'annual'), (1, 'monthly')))
> >
> > Immediately, a problem is evident: the business relies on easy
> > communication of data between ezInventory and the website, but Django
> > serializes the class Subscription as follows:
> >
> > [{
> >     "pk": 13,
> >     "model": "app.subscription",
> >     "fields": {
> >         "recurrence": 12
> >     }
> >
> > }]
> >
> > Now, this is wonderful, as long as we know to pair this information
> > with the product with id 13.  However, ezInventory, though supportive
> > of subscriptions in general, merely overwrites the product with id 13
> > (assuming the products table was imported first) and complains about
> > the missing fields.
> >
> > Skipping merrily back to the real world, here is the issue, plain and
> > simple.  Sometimes, you need to include more information than is just
> > in the one model.  Sometimes, that's going to appear in the form of
> > inherited models, sometimes via one-to-many or many-to-many
> > relationships, and sometimes just little relevant bits of information
> > that may not actually be in the model at all.  There are numerous open
> > tickets about this issue [1][2][3][4].  Clearly, this needs some
> > thought.
> >
> > I like the idea [5] of providing a Serializer class, defined similarly
> > to the ModelAdmin class, to allow custom ways of serializing data,
> > something like:
> >
> > # in serializers.py (or something)
> > class ProductListing(serializers.Serializer):
> >     fields = {
> >         'Product': ['price', 'name', 'description'],
> >         'Subscription': ['price', 'name', 'description', 'recurrence']
> >     }
> >
> > serializers.register(ProductListing)
> >
> > $ python manage.py shell
> > >>> from project.app import models
> > >>> from django.core import serializers
> > >>> print serializers.serialize('json',
> > ...     list(models.Product.objects.all()) +
> > ...     list(models.Subscription.objects.all()),
> > ...     serializer='ProductListing', indent=4)
> > [{
> >     "pk": 1,
> >     "model": "app.product",
> >     "fields": {
> >         "price": 9.98,
> >         "name": "Product #1",
> >         "description": "Description for Product #1"
> >     }},
> >
> > # other products...
> > {
> >     "pk": 13,
> >     "model": "app.subscription",
> >     "fields": {
> >         "price": 14.98,
> >         "name": "Subscription #1, Product #13",
> >         "description": "My demonstrations aren't particularly creative.",
> >         "recurrence": 12
> >     }
> >
> > }]
> >
> > Not only is this backwards-compatible (no new serializers == no new
> > behavior), but it also continues to allow deserialization: The
> > deserialization process would see that app.subscription is a child of
> > product, and fill in the data appropriately.  This circumvents one of
> > the largest drawbacks, IMO, of DjangoFullSerializers [6]; being able to
> > deserialize this data is often just as important as serializing it
> > (lookin' at you, fixtures).  The observant reader will note that, as
> > thus demonstrated, it only solves this issue for models using
> > inheritance, not those using one-to-many or many-to-many fields.  So,
> > let's serialize some orders.
> >
> > # in serializers.py (or something)
> > class OrderSerializer(serializers.Serializer):
> >     fields = {
> >         # include values from the products, and the return value of
> >         # total_price
> >         'order': [{'products': ['price', 'name']}, 'total_price']
> >     }
> >
> > serializers.register(OrderSerializer)
> >
> > $ python manage.py shell
> > >>> from project.app import models
> > >>> from django.core import serializers
> > >>> print serializers.serialize('json', models.Order.objects.filter(pk=1),
> > ...     serializer='OrderSerializer', indent=4)
> > [{
> >     "pk": 1,
> >     "model": "app.order",
> >     "fields": {
> >         "products": [{
> >             "pk": 1,
> >             "model": "app.product",
> >             "fields": {
> >                 "price": 9.98,
> >                 "name": "Product #1"
> >             }
> >         },
> >         {
> >             "pk": 13,
> >             "model": "app.subscription",
> >             "fields": {
> >                 "price": 14.98,
> >                 "name": "Subscription #1, Product #13"
> >             }
> >         }],
> >         "total_price": 24.96
> >     }
> >
> > }]
> >
> > As long as enough data is presented, the deserializer stands no less
> > chance of accurately connecting models together than in its current
> > form.  Arguments could be provided to search for matching rows and
> > correct the primary keys from serialized data, or to update the fields
> > in the row with the given primary key.
> >
> > A couple of notes.  I think that providing 'excludes' behavior could
> > be done by prefixing a field name with a minus sign, i.e.:
> >
> >     fields = {'order': ['-order_placed']}
> >
> > If the only fields listed are prefixed with a minus sign, we can
> > assume that all remaining fields are to be included.  Mixing prefixed
> > and normal fields assumes implicit negations for any fields not
> > mentioned:
> >
> >     fields = {
> >         'order': ['-order_placed', 'total_price']
> >     }
> >
> > serializes only the total_price pseudo-field, ignoring both
> > order_placed, explicitly, and products, implicitly.  This is just
> > personal preference -- the decision needs to be made, and an 'exclude'
> > member to provide explicit field exclusions is just as sensible.
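
To pin down the semantics of the minus-sign rule described above, here is a small sketch. resolve_fields is a hypothetical helper, not existing Django API, and it only handles plain string names (nested dict entries such as {'products': [...]} are left out). Treating an empty list as "all fields" is my assumption.

```python
def resolve_fields(spec, all_fields):
    """Apply the minus-prefix rule to a list of field names.

    spec       -- names as written in the serializer, e.g. ['-order_placed']
    all_fields -- every serializable name on the model, in order
    """
    excluded = {name[1:] for name in spec if name.startswith("-")}
    included = [name for name in spec if not name.startswith("-")]
    if included:
        # Mixed (or purely positive) spec: anything not listed is dropped.
        return [name for name in included if name not in excluded]
    # Only exclusions (or nothing at all): keep everything else.
    return [name for name in all_fields if name not in excluded]


order_fields = ["products", "order_placed", "total_price"]
only_excludes = resolve_fields(["-order_placed"], order_fields)
mixed = resolve_fields(["-order_placed", "total_price"], order_fields)
print(only_excludes)
print(mixed)
```

The first call keeps products and total_price; the second keeps only total_price, matching the two examples above.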
> >
> > Providing a member to allow shortcut fields definition is a good
> > idea.  For example:
> >
> > class OrderSerializer(serializers.Serializer):
> >     # similar to QuerySet.select_related('products')...
> >     select_related = ['products']
> >     # or, to mirror QuerySet.select_related(depth=1)...
> >     select_related = 1
> >
> > This would use the default behavior, except the 'products' key in the
> > output would be a list of the serialized products, which are serialized
> > in the default way.
> >
> > More complex output can be produced by calling one serializer from
> > another:
> >
> > class OrderSerializer(serializers.Serializer):
> >     fields = {
> >         'order': [{'products__via': 'ProductListing'}, 'total_price']
> >     }
> >
> > The 'via' keyword would cause the serializer to use the serialized
> > output of ProductListing when called with the particular order's
> > products.all().
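
One way the "__via" dispatch could work, as a sketch only: a registry maps serializer names to callables, and a dict entry in the fields spec hands the related objects to the named serializer. REGISTRY, register and serialize_order are hypothetical names, and plain dicts stand in for model instances; none of this is existing Django API.

```python
# Hypothetical stand-ins: a registry of named serializer functions and
# plain dicts in place of model instances.
REGISTRY = {}


def register(name):
    """Decorator that files a serializer function under `name`."""
    def decorator(func):
        REGISTRY[name] = func
        return func
    return decorator


@register("ProductListing")
def product_listing(product):
    return {"pk": product["pk"],
            "fields": {"name": product["name"], "price": product["price"]}}


def serialize_order(order, spec):
    """Serialize `order` following a fields spec that may contain
    {'<relation>__via': '<registered serializer name>'} entries."""
    fields = {}
    for entry in spec:
        if isinstance(entry, dict):
            for key, serializer_name in entry.items():
                relation, _, suffix = key.rpartition("__")
                if suffix != "via" or not relation:
                    raise ValueError("unsupported key: %r" % key)
                nested = REGISTRY[serializer_name]
                # Delegate each related object to the named serializer.
                fields[relation] = [nested(obj) for obj in order[relation]]
        else:
            # Plain name: copy the value straight across.
            fields[entry] = order[entry]
    return {"pk": order["pk"], "fields": fields}


order = {
    "pk": 1,
    "products": [{"pk": 1, "name": "Product #1", "price": "9.98"}],
    "total_price": "9.98",
}
result = serialize_order(
    order, [{"products__via": "ProductListing"}, "total_price"])
print(result)
```

Looking the serializer up by name at call time, rather than importing the class, keeps the spec a plain data structure, which is roughly how ModelAdmin options read.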
> >
> > To be frank, deserialization is made much more complex with the last
> > few examples, and brings us back full circle to the data integrity
> > problems associated with deserializing relational database data from a
> > flat file.  But, in the context of fixtures, it's typical that the
> > database is empty, and so the deserializer can just dump all the data
> > in and maintain the relations (as is currently done).  In situations
> > [2] where the data is prepopulated, and pk values for relationships
> > may not be correct, perhaps a loaddata argument could be specified to
> > tell the deserializer to find the correct pk by searching for a row
> > that matches the serialized fields.  For ambiguous cases, it is
> > probably best to use human power instead of Django power.
> >
> > Additional issues (some solved above, coincidentally), including the
> > addition of arbitrary fields to serialized data [7], the json
> > serializer's handling of gettext_lazy [8][9], and fields to be ignored
> > by loaddata [10] will also be fixed.
> >
> > NB: Though all of my examples have used JSON, my proposal is of a
> > system that varies independently of serialization format.  I believe
> > that it's important to draw a line between format and the data to be
> > formatted, and Django already provides the proper facilities to allow
> > custom serialization formats.
> >
> > I believe that my proposal can be implemented in the 13-week GSoC time
> > frame as follows.  I like to work time-boxed, so everything is split
> > as such.  I would rather trim a task or two to be implemented later
> > than have all tasks for a phase leak over into the next week.
> >
> > Assume that every phase ends with documentation writing, regression
> > test creation/updates, and bug hunting.
> >
> > Prior to 05/23 Discuss and build use cases to demonstrate goals, flesh
> > out API, last-minute project scoping
> > 05/23 - 06/05 Code foundational serializers.Serializer base class,
> > where
> >
> > class ProductSerializer(serializers.Serializer):
> >     pass
> >
> > would act just as the current serializers.serialize(format,
> > Product.objects.all()), but with the new code structure.
> > 06/06 - 06/12 Implement fields attribute (and excludes attribute, or
> > '-field' functionality) in cases where relations are not followed.
> > Includes child models, as in the ProductListing example class above.
> > 06/13 - 06/26 Implement recursive serialization using the "__via"
> > syntax in fields.  Examine whether this format could be used
> > implicitly for deep serialization.
> > 06/27 - 07/04 My extended family will be visiting, so any work this
> > week will reflect that listed for next week.
> > 07/05 - 07/10 Bug hunting, documentation additions, finishing off
> > anything that didn't get finished in an earlier phase, regression
> > testing, write implementations of applicable use cases.
> > 07/11 - 07/24 Provide implicit relation selecting via the fields
> > attribute.
> > 07/25 - 07/31 Add agreed-upon flags to loaddata/dumpdata to allow
> > fixtures users to utilize new features.
> > 08/01 - 08/10 Create documentation patches, bug hunt, improve code
> > quality
> > [http://docs.djangoproject.com/en/dev/internals/contributing/#coding-style],
> > and begin community testing.
> > 08/11 - 08/17 Communicate with community about unearthed bugs and
> > gotchas, and eliminate or prevent those issues.
> >
> > References:
> > [1] => http://code.djangoproject.com/ticket/4656
> > [2] => http://code.djangoproject.com/ticket/7052
> > [3] => http://code.djangoproject.com/ticket/9422
> > [4] => http://code.djangoproject.com/ticket/10295
> > [5] => http://code.djangoproject.com/wiki/SummerOfCode2009#Ideas
> > [6] => http://code.google.com/p/wadofstuff/wiki/DjangoFullSerializers
> > [7] => http://code.djangoproject.com/ticket/5711
> > [8] => http://code.djangoproject.com/ticket/5590
> > [9] => http://docs.djangoproject.com/en/dev/topics/serialization/#id2
> > [10] => http://code.djangoproject.com/ticket/9279
> >
>
