Brian Olson wrote:
I've updated my redistricting site ( http://bolson.org/dist/ ) to include the racial breakdown of all current congressional districts (sometimes interesting by itself) and that of the compactness based districts I have come up with. If you want you can jump directly to http://bolson.org/dist/ XX where XX is any US state abbreviation.
As I've mentioned before, I've been doing something like this, but out of no other purpose than abstract interest, and applied to the world at large.
Thus, I think I can give some ideas as to how to improve it. The first and most immediate one is to use K-means++ clustering. For my own simulator, this immediately gave an order of magnitude standard deviation improvement. See http://en.wikipedia.org/wiki/K-means%2B%2B for that. Basically, K-means++ picks the first center randomly. Then it picks the next center with a probability for each point proportional to the distance from the closest existing center, times a power (originally squared, but my program uses 0.25). Set the power to whatever gives the best results. The simplest way of picking a new center is to just record the distance to closest centers (you have to do that anyway to draw the map), then use roulette selection for the probabilities.
Second, by recording distances in such a manner, further optimization is possible. When adding a new district, you can simply grow outwards from its centroid (center point) without having to recalculate all the other districts. This kind of heap optimization enables my "global redistricter" to work with distances based on elevation deltas (a rough approximation to travel time). Of course, if you're using plain Euclidean distance, you can just use Fortune's plane sweep algorithm to calculate the Voronoi diagrams.
After all the district centers have been picked, you can't use K-means++ any more. Revert to ordinary K-means clustering at that point (or whatever other clustering you may want). For my program, this means that the K-means++ initial phase can be quicker than the subsequent move-centroid-to-population-center K-means phase, so you might also try running the initial phase for various different random seeds and then only picking the most promising to go onto the next phase.
Another refinement I've been considering, but I haven't implemented, is a way to split districts internally. Usually, most districts ("countries" in my case) are about even, but some have much lower population and some have much greater. Remove one of the lower population districts and use the spare point to split a high population district. One way of doing that would be to turn a point (x, y) (centroid of a populous district) into (x+r, y+r) and (x-r, y-r) for some small value r, then letting k-means clustering draw the centroids apart in the right direction. The split might affect other districts and the direction of the r component might be wrong, but it might also turn out well (and my manual emulation of this does simply because the large districts are so large that it distorts the others less than the split gains).
---- Election-Methods mailing list - see http://electorama.com/em for list info
