On 14 Apr 2014, at 09:50, Dan Burzynski <[email protected]> wrote:

> Hi there. I'm just about to (hopefully) write an import script for MapIt to
> pull in a load of data from the ONSPD file

A generic import script for importing data linking postcodes with areas that don't have a defined boundary would be good, thanks :-) We have generic import scripts for Areas and Postcodes themselves, but we are missing one that can use the Postcode to Area many-to-many table that exists; only specific scripts like the one you mention currently exist.

> (Local Education Authorities being an example of one).

Just to point out here that there is no need to import Local Education Authorities if the MapIt installation has already imported local councils, because you already have the data. If your point lookup returns a London borough, Metropolitan borough, Unitary authority, County council, or the Isles of Scilly, that is also the appropriate LEA.

In Northern Ireland it's slightly different, in that there are five Education and Library Boards for the (currently) 26 councils, but the mapping is just a list of councils for each board. You'd be better off doing a simple lookup; I wouldn't think it worth importing a whole giant array of data to add that. (You could potentially use the mapit_import_area_unions script to create the Education and Library Boards out of the Northern Ireland councils, as an alternative, though I've not used that script myself, and again I'm not sure it's worth it.)

> Is there any instructions/tutorials/examples on 'how to create a new area
> type and an import script'?

To create a new area type, you can use the admin interface. (Or you could do it as part of an import script of course, but just to note that the admin interface lets you create new things directly also.) An area Type is a basic model with code and description, so in code you would create one with something like:

    Type.objects.get_or_create(code='CODE', description='Description of this type')

As for a tutorial on writing an import script, no, I'm afraid not, though they don't do anything non-standard, just normal Django/Python things to read in data and create objects in the database from that data.
So the Django documentation would be what I would point you at first.

> I've had a quick peek at mapit_UK_import_nspd_ni.py but I'm still none the
> wiser (I'm not a Python programmer which doesn't help) ;-)

It also doesn't help that the script uses a manual CSV file to look up the right areas for that specific data :)

The existing generic import scripts, as I mentioned above, are mapit_import and mapit_import_postal_codes, which are documented at http://code.mapit.mysociety.org/import/boundaries/ and http://code.mapit.mysociety.org/import/postal-codes/ respectively. http://code.mapit.mysociety.org/how-data-is-stored/ mentions the many-to-many table at the bottom, though that could do with some expanding.

You may want to look at the mapit_UK_import_nspd_national_parks management script, which doesn't have the Area-specific stuff the NI script does, and could be more easily generalised. Something that had a pre_row/post_row (like mapit_import_postal_codes does) so it can be subclassed, and had command line arguments to look up/possibly create the area, would be the most useful, I imagine. So you'd supply a CSV file of postcode/identifiers, options for the column numbers, what type the area is, and whether it should be created or not. The mapit_UK_import_nspd_national_parks script could then hopefully be a small subclass of that.

The national parks script is doing the following:

* handle_label loops through the provided file, and for each row:
  * It ignores rows we don't care about
  * It looks up the Postcode
  * It gets the identifier from column 37, looks up its name, gets or creates an Area, and then adds a link in the many-to-many table from the postcode to that area.

And for completeness, the NI script is doing:

* handle_label() first uses a manual CSV file to map NI area identifiers for wards, NI Assembly and UK Parliament constituencies to the existing Areas in the database for those areas, then calls process(). (Note that the areas already existed in the database.)
* process() opens the provided CSV file and loops through its rows, calling pre_row, handle_row, post_row on each one. (The default import script does nothing in pre_row or post_row, and in handle_row checks to see if the postcode already exists and creates/updates as necessary.)
* pre_row() ignores rows we don't care about, and gets the ONS and Parliament codes for the supplied postcode. It then creates self.areas, the six areas relevant to this postcode from those two codes.
* post_row() adds the six areas from pre_row() to the many-to-many table linking Postcodes and Areas.

So the NI script is actually importing the postcodes, which is why it's a subclass and why it's doing slightly more.

> It seems that (and I'm kind of guessing here) that there's a function called
> handle_label that is where you put the file parsing logic.

That script is a Django management command; details of how they're written can be found in the Django documentation:
https://docs.djangoproject.com/en/1.6/howto/custom-management-commands/

In this particular case, that script is a subclass of the postcode import management command, which is where the main control flow (process/pre_row/handle_row/post_row) is done, and which the subclass then overrides where necessary, as explained above.

> What's the best way to run the system in some kind of debug mode so I can at
> least trial and error it without screwing up my database? ;-)

What would you want a debug mode to do? Some of the existing management scripts (not the one you mention, though, I'm afraid) run in a "dry run" state by default, not committing to the database unless --commit is provided. That script, or any new script you write, could be altered to do the same. Or you could use a copy of your database for testing purposes.

Hope that's helpful.

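In case it helps to see the shape of it, here is a rough sketch of the process/pre_row/handle_row/post_row flow with a dry run by default. It is plain Python rather than MapIt's actual code: the class names PostcodeAreaImporter and NationalParkLinker, the commit argument, the column options and the sample postcodes are all made up for illustration, and an in-memory set stands in for the Postcode to Area many-to-many table.

```python
import csv
import io


class PostcodeAreaImporter:
    """Hypothetical generic postcode-to-area linker (not MapIt's real class)."""

    def __init__(self, postcode_column=0, area_column=1, commit=False):
        self.postcode_column = postcode_column
        self.area_column = area_column
        self.commit = commit
        # In-memory stand-in for the Postcode/Area many-to-many table.
        self.links = set()

    def pre_row(self, row):
        """Return False to skip a row. Does nothing by default."""
        return True

    def handle_row(self, row):
        """Pull a normalised postcode and an area identifier out of a row."""
        postcode = row[self.postcode_column].replace(" ", "").upper()
        area_code = row[self.area_column].strip()
        return postcode, area_code

    def post_row(self, postcode, area_code):
        """Hook run after each handled row. Does nothing by default."""

    def process(self, csv_file):
        """Loop over the rows; only store links if commit was requested."""
        pending = []
        for row in csv.reader(csv_file):
            if not self.pre_row(row):
                continue
            postcode, area_code = self.handle_row(row)
            pending.append((postcode, area_code))
            self.post_row(postcode, area_code)
        if self.commit:
            self.links.update(pending)
        return len(pending)


class NationalParkLinker(PostcodeAreaImporter):
    """Small subclass: ignore rows that have no area identifier."""

    def pre_row(self, row):
        return bool(row[self.area_column].strip())


# Made-up sample data: postcode, area identifier.
SAMPLE = "AB1 2CD,PARK1\nEF3 4GH,\nIJ5 6KL,PARK2\n"

dry_run = NationalParkLinker()  # commit defaults to False
handled = dry_run.process(io.StringIO(SAMPLE))
print(handled, "rows handled,", len(dry_run.links), "links stored")
```

In a real management command, handle_row would look up the Postcode and get or create the Area through the Django ORM, and the commit flag would come from a --commit command line option, with the database work rolled back when it is absent.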
ATB,
Matthew
