On Sun, 2007-04-22 at 06:56 +0000, Collin Grady wrote:
> On Apr 21, 9:53 pm, Malcolm Tredinnick <[EMAIL PROTECTED]>
> wrote:
> > Your database may still have a serial value
> > on that field, for example, or there may be a trigger that does the
> > value computation. Both of these are a little unrealistic, but not out
> > of the bounds of possibility (because people do some pretty weird stuff
> > with databases and then complain when you take away their toys).
> 
> In those instances though, wouldn't you want to leave the primary key
> field out of the insert? Or do non-specified fields get left out no
> matter what since there's no data there?
> 
> > Validation is not the save() method's responsibility because it's not
> > always obvious what the right answer is. Took me a while to wrap my
> brain around that when I started working in that part of the code, but
> > it's an idea that works quite well when we apply it consistently.
> 
> I'm probably missing something simple, but I don't see how validation
> ties into this - I'm not suggesting it needs to make sure there are
> valid values set in anything, it just seems that it's using the wrong
> metric to determine if a field is a primary key or not :)

Ah, okay ... I thought you were worried that it wasn't checking for the
existence of something.

The code uses different checks because it's looking for different
things. In the second half of Model.save(), the code doesn't care
whether a field is a primary key or not. It has already established that
the record doesn't exist, so it needs to do an insert, and it simply
wants to insert every field that isn't the serial field we might be
using for the primary key (the latter being populated by the database
automatically). It's only in the first half of that function that we
care about the difference between primary and non-primary keys, because
there the code is trying to work out whether the record already exists
in the database.
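
To make that concrete, here's a rough sketch of the control flow
(simplified pseudo-Python; record_exists, do_update and do_insert are
made-up helpers standing in for the actual manager/backend calls):

    def save(self):
        # First half: the primary key matters here, because it is how
        # we work out whether the record already exists.
        pk_val = self._get_pk_val()
        if pk_val is not None and record_exists(self.__class__, pk_val):
            # The record exists, so update every field except the
            # primary key.
            do_update(self, [f for f in self._meta.fields
                             if not f.primary_key])
        else:
            # Second half: the record doesn't exist, so insert every
            # field except an auto-populated serial column (the
            # database fills that one in itself).
            do_insert(self, [f for f in self._meta.fields
                             if not isinstance(f, AutoField)])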

> > There's going to be some interesting validation issues in that case.
> > You'll be able to save a model with only some of the fields set
> > correctly. I'm not sure that's a good idea. It's also a mark against the
> > whole concept, in some respects.
> 
> Wouldn't it be fine though if the fields you leave out have valid
> defaults in the database?

Yes, it would work. Why introduce the complexity (and the risk of
screwing it up), though? We have the default attribute on model fields
for exactly this sort of thing (example below). You're really leaking
abstractions here. Conceptually, Django works with Python models that
just happen to be stored (sometimes) in an SQL database. You are trying
to bring the SQL database more into the foreground (which admittedly
happens elsewhere too -- not ideal in any of those cases). I'm not sure
you're solving a problem that needs solving here.
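
On the default attribute point, a made-up model just to show the idea:

    from django.db import models

    class Article(models.Model):
        title = models.CharField(maxlength=100)
        # The default lives on the Python side, rather than in a
        # DEFAULT clause in the database schema.
        status = models.CharField(maxlength=10, default='draft')
        rating = models.IntegerField(default=0)

Anybody reading the model can see what a missing value will become,
without having to go and inspect the database schema.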

> You'd have to be careful not to leave out a field you really want to
> save, but that's not much different than remembering to set a field in
> the first place :)
> 
> > It's not universally true that updating disjoint sets of fields
> > avoids problems, because the two functions could have overlapping
> > WHERE clause constraints (or the WHERE clause of one could overlap
> > the updated columns of the other). So that part of the justification
> > doesn't support the inclusion of the patch. None of the widely used
> > databases we support has column-level locking, as far as I'm aware
> > (it wouldn't make a lot of sense under the covers).
> 
> While not universal, I think there are enough cases where it does work
> to make it a valid consideration, even if it isn't a full
> justification on its own :)
> 
> > What speed-ups are you seeing on what sort of
> > use-cases with this change?
> 
> I haven't done any testing yet - the performance comment was based on
> things others have said to me about how SELECTs involving columns with a
> lot of data in them can be slower, so I figured that UPDATEs with
> large blocks of text could be similarly impaired - if that's
> incorrect, then scratch the performance idea :)

As an aside, the information about SELECT statements is not really
accurate. In a moderately complex query, filtering the data is a much
larger component of the time budget than reading the results from disk,
regardless of which columns are retrieved. One reason for this is that
the database doesn't just read the columns you want from disk. It has to
read in multiples of some contiguous block size (typically 8K or more),
so it will read a large portion of each row's contents whenever you
access any random selection of data in that row -- ergo the cost of
reading the data for a table is relatively constant, and working out
which rows to actually read is what matters (disk access is sloooowww).
You *can* blow this rule out of the water by using horribly complex
computed columns or trivial constraints, but the first case is rare,
particularly in Django scenarios, and the second case is so fast anyway
that the proportion of the time budget spent on each part is effectively
irrelevant.
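
If anybody wants to check this against their own data, PostgreSQL's
EXPLAIN ANALYZE makes it easy to compare the timings for narrow and wide
column lists. Something along these lines (table, column and connection
names are all invented; adjust to taste):

    import psycopg2

    conn = psycopg2.connect('dbname=mydb')   # made-up connection details
    cur = conn.cursor()

    # The measured runtimes should be dominated by the row filtering,
    # not by how many columns we pull back.
    for columns in ('id', 'id, title, body'):
        cur.execute("EXPLAIN ANALYZE SELECT %s FROM articles "
                    "WHERE pub_date > '2007-01-01'" % columns)
        for row in cur.fetchall():
            print row[0]    # plan lines, ending with the total runtime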

I don't have any similar information for updates, although I suspect
similar conditions will be at play. In particular, the Django usage
patterns involve updating only a single row in each table at a time --
so there aren't any complex constraints used to filter the rows -- which
again falls into the situation where the total elapsed time is so small
as not to be worth micro-optimising. Actual timing measurements will be
necessary.
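
A quick-and-dirty starting point for those measurements might look like
this (again, purely hypothetical table and column names):

    import time
    import psycopg2

    conn = psycopg2.connect('dbname=mydb')   # made-up connection details
    cur = conn.cursor()

    def time_query(sql, repeats=1000):
        # Run the statement repeatedly and report the elapsed time.
        start = time.time()
        for i in xrange(repeats):
            cur.execute(sql)
        conn.rollback()    # throw away the test changes
        return time.time() - start

    # Compare updating every column against updating only a small one,
    # on a single row.
    full = time_query("UPDATE articles SET title = 'x', body = 'y', "
                      "status = 'draft' WHERE id = 1")
    partial = time_query("UPDATE articles SET status = 'draft' "
                         "WHERE id = 1")
    print 'all columns: %.3fs, one column: %.3fs' % (full, partial)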

Regards,
Malcolm

