Re: GSoC: Data importation class

Malcolm Tredinnick Thu, 25 Mar 2010 09:33:25 -0700

Hi,

I see a few problems here. The gist of what follows is that it seems a
bit abstract and as one tries to nail down the specifics it either
devolves to a more-or-less already solved problem that doesn't require
Django core changes, or a problem that is so unconstrained as to not be
solvable by a framework (requiring, instead, the full power of Python,
which is already in the developer's hands).

On Thu, 2010-03-25 at 09:07 -0700, subs...@gmail.com wrote:
> If we could visualize the entirety of data within django-projects, we
> would probably see that this 'data economy' is growing exponentially
> year-over-year. However, I know of no guided way to actually get this
> data into a project that's been converted to Django. There are two
> methods I generally hear about when asking people how to move between
> schemas: purely SQL solutions and one-shot scripted solutions (or a
> mix). With talk of model-level validation, the first approach is
> becoming increasingly invalid,

That's not a correct statement, since Django models can often be used to
prescribe conditions on new data that is created via the web app, yet
those conditions might not be required for the universal set of data
that already exists. For example, webapp-generated data might always
require a particular field, such as the creating user, to be filled in,
whilst machine-generated data would not require that. Don't equate
validation conditions at the model level with constraints on the data at
the storage level.
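
A minimal sketch of that distinction, using only the standard-library
sqlite3 module rather than Django itself. The `entry` table, the
`created_by` column and the helper names are hypothetical, chosen purely
for illustration: the storage layer allows a missing creator, while the
web-facing validation does not.

```python
import sqlite3

# Hypothetical storage schema: created_by is nullable at the database level.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry ("
    "id INTEGER PRIMARY KEY, title TEXT NOT NULL, created_by TEXT)"
)


def validate_web_entry(data):
    """Application-level rule: web submissions must name a creating user."""
    errors = {}
    if not data.get("created_by"):
        errors["created_by"] = "This field is required."
    return errors


def insert_entry(connection, data):
    """Store a row; only the storage-level constraints apply here."""
    connection.execute(
        "INSERT INTO entry (title, created_by) VALUES (?, ?)",
        (data["title"], data.get("created_by")),
    )


# Machine-generated data skips the web validation and is still valid storage.
insert_entry(conn, {"title": "imported from legacy feed"})

# Web-generated data goes through validation first and is rejected here.
web_row = {"title": "posted via form"}
web_errors = validate_web_entry(web_row)
if not web_errors:
    insert_entry(conn, web_row)

row_count = conn.execute("SELECT COUNT(*) FROM entry").fetchone()[0]
```

Only the machine-generated row ends up stored; the web row is stopped by
a rule that lives above the storage layer.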

>  but I wonder if we could include some
> batteries for the second approach?
> 
> My proposal is a new django class which provides a mapping for how
> this data should move from its legacy schema to a django project. I've
> got a sort of proof-of-concept already working but it lacks the polish
> of a refined community contribution. Moreover, with multi-database
> support coming, I see this concept getting a shot in the arm,
> especially in cases where the legacy db is a currently supported one.
> 
> I imagine the usage going something like:
> 
> 1) User creates django project
> 
> 2) User runs a 'startconversion' app which creates a stage folder for
> holding an inspectdb of the legacy data, a default router for the
> legacy data, and some other empty files.

The last bit sounds a bit nebulous. You could optimise it by not
including any empty files, or be a bit more specific about what the
empty files are meant to represent. :)

> 
> 3) User defines the classes which define the map between the legacy
> and new schema, and defines clean functions according to their needs,
> 'foreignkeys' to other conversion classes, etc.

It seems that you are talking about the cases where, by default, a
different schema is required. The first approach is to make the models
match the existing schema, on the grounds that the existing schema is a
reasonable representation of the data. In the case where that isn't
true, a migration is required, but the possibilities for such migrations
are endless unless the original data can already be put into natural
Django models. If inspectdb can already be run on the existing data, why
not use that as the starting point and then the dev can use something
like South to migrate to their schema of choice? It seems that we
already have all the tools in that case.

If inspectdb cannot generate a useful schema that can be modelled by
Django, the user is going to have to write a generic Python script in any
case and the possibilities there are boundless and best left to the best
tool available for the job at hand: Python itself.
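
A rough sketch of what such a one-shot script might look like, again
with plain sqlite3 so it runs without a Django project. The `person` and
`author` tables, their columns and the name-splitting rule are all
assumptions made up for the example; a real conversion would encode
whatever the legacy data actually needs.

```python
import sqlite3


def migrate(legacy, target):
    """Copy legacy 'person' rows into the new 'author' table.

    The legacy schema stores one full_name column; the (hypothetical) new
    schema wants first and last names, so the script splits on the first
    space.
    """
    rows = legacy.execute("SELECT id, full_name FROM person ORDER BY id")
    for legacy_id, full_name in rows:
        first, _, last = full_name.strip().partition(" ")
        target.execute(
            "INSERT INTO author (legacy_id, first_name, last_name) "
            "VALUES (?, ?, ?)",
            (legacy_id, first, last),
        )
    target.commit()


# Stand-in legacy database.
legacy = sqlite3.connect(":memory:")
legacy.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, full_name TEXT)")
legacy.executemany(
    "INSERT INTO person (full_name) VALUES (?)",
    [("Ada Lovelace",), ("Plato",)],
)

# Stand-in target database with the new schema.
target = sqlite3.connect(":memory:")
target.execute(
    "CREATE TABLE author (legacy_id INTEGER, first_name TEXT, last_name TEXT)"
)

migrate(legacy, target)
migrated = target.execute(
    "SELECT first_name, last_name FROM author ORDER BY legacy_id"
).fetchall()
```

The cleaning logic here is ordinary Python written for one dataset, and
it is hard to see what a framework-level mapping class would add to it.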

> 4) User runs a command at the top branch of their schema (some distant
> relation) and the command inspects these classes and runs them from
> the ground up. As it does this, measures are taken (such as use of
> pagination) to avoid server CPU/memory thrashing, as well as model-
> level measures such as OneToOnes being respected, etc.

Adding system administration functionality to Django, which is what this
monitoring is, feels like the wrong approach. It's not intended to
replace everything else in your computing life. What is appropriate load
usage for one case will be highly inappropriate elsewhere. How will you
detect what you are labelling as "thrashing"?
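
For comparison, the "pagination" part is a few lines in a script. This
is a sketch under assumptions: the `legacy_item` table and the
`iter_batches` helper are hypothetical, and the batch size is simply a
number the developer picks for their own environment.

```python
import sqlite3


def iter_batches(cursor, batch_size):
    """Yield lists of at most batch_size rows until the cursor is exhausted."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield batch


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE legacy_item (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO legacy_item (id) VALUES (?)",
    [(i,) for i in range(1, 8)],
)

cursor = conn.execute("SELECT id FROM legacy_item ORDER BY id")
# batch_size=3 is arbitrary; the right value depends on the machine and data.
batch_sizes = [len(batch) for batch in iter_batches(cursor, batch_size=3)]
```

Nothing in it measures load; the caller decides the batch size, which is
the part that differs from one deployment to the next.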

Regards,
Malcolm
