On Sat, Apr 20, 2013 at 3:02 AM, Mark Wynter <[email protected]> wrote:
> Thanks Markus.
> Upgraded to GRASS 7, and re-ran v.clean on same OSM Australia dataset.
> Substantially faster. The bulk of the time related to removal of duplicates,
> and it got exponentially slower as the process approached 100%. Overall it
> took 12 hours but I'm wondering how it would perform if we were to repeat
> v.clean for even larger road networks e.g. USA or Europe?
Something is wrong there. Your dataset has 971074 roads; I tested with
an OSM dataset with 2645287 roads, 2.7 times as many as in your
dataset. Cleaning these 2645287 lines took me less than 15 minutes.

I suspect a slow database backend (dbf). Try to use sqlite as database
backend:

db.connect driver=sqlite database=$GISDBASE/$LOCATION_NAME/$MAPSET/sqlite/sqlite.db

Do not substitute the variables.

HTH,

Markus M

> I'm tempted to try dividing the input dataset into say 4 smaller subregions
> (i.e. vector tiles), and then try patching them back.
> I read that we will still need to run v.clean over the patched datasets to
> remove duplicates.
> Since the only duplicates should be nodes along the common tile edges, is
> there a way to in effect constrain the v.clean process to slivers containing
> the common edges?
> I've had a quick go at g.region but to no avail.
>
> Thanks
>
> GRASS 7.0.svn (PERMANENT):/data/grassdata > v.clean input=osm_roads_split
> output=osm_roads_split_cleaned tool=break type=line -c
> --------------------------------------------------
> Tool: Threshold
> Break: 0
> --------------------------------------------------
> Copying vector features...
> Copying features...
> 100%
> Rebuilding parts of topology...
> Building topology for vector map <osm_roads_split_cleaned@PERMANENT>...
> Registering primitives...
> 971074 primitives registered
> 13142529 vertices registered
> Number of nodes: 1458192
> Number of primitives: 971074
> Number of points: 0
> Number of lines: 971074
> Number of boundaries: 0
> Number of centroids: 0
> Number of areas: -
> Number of isles: -
> --------------------------------------------------
> Tool: Break lines at intersections
> 100%
> Tool: Remove duplicates
> 100%
> --------------------------------------------------
> Rebuilding topology for output vector map...
> Building topology for vector map <osm_roads_split_cleaned@PERMANENT>...
> Registering primitives...
> 2462829 primitives registered
> 13322052 vertices registered
> Building areas...
> 100%
> 0 areas built
> 0 isles built
> Attaching islands...
> Attaching centroids...
> 100%
> Number of nodes: 1819237
> Number of primitives: 2462829
> Number of points: 0
> Number of lines: 2462829
> Number of boundaries: 0
> Number of centroids: 0
> Number of areas: 0
> Number of isles: 0
>
>
> On 19/04/2013, at 6:07 PM, Markus Metz wrote:
>
>> On Fri, Apr 19, 2013 at 9:06 AM, Mark Wynter <[email protected]>
>> wrote:
>>> Hi All, we're looking for ways to speed up the cleaning of a large OSM road
>>> network (relating to Australia). We're running on a large Amazon AWS EC2
>>> instance.
>>>
>>> What we've observed is exponential growth in the time taken as the number
>>> of linestrings increases.
>>>
>>> This means it's taking about 3 days to clean the entire network.
>>>
>>> We were wondering, if we were to split the dataset into say 4 subregions
>>> and clean each separately, is it then possible to patch them back together
>>> at the end without having to run v.clean afterwards? We want to be able to
>>> run v.net over the entire network spanning the subregions.
>>>
>>> Alternatively, has anyone found a way to speed up v.clean for large network
>>> datasets?
>>
>> Yes, implemented in GRASS 7 ;-)
>>
>> Also, when breaking lines it is recommended to first split the lines
>> into smaller segments with v.split using the vertices option. Then run
>> v.clean tool=break. After that, use v.build.polylines to merge the
>> lines again. Or, in GRASS 7, use the -c flag with v.clean tool=break
>> type=line. The rmdupl tool is then automatically added, and the
>> splitting and merging is done internally.
>>
>> Markus M

_______________________________________________
grass-user mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/grass-user
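[Editor's note: the advice in the thread above, collected into one sketch of a GRASS session. The map names (osm_roads, etc.) and the vertices=10 value are placeholders, not from the thread; the commands require a running GRASS session and location, so this is illustrative only.]

```shell
# Switch the attribute backend from dbf to sqlite (Markus M's advice).
# Do NOT substitute the variables: GRASS expands them per mapset at runtime.
db.connect driver=sqlite database=$GISDBASE/$LOCATION_NAME/$MAPSET/sqlite/sqlite.db

# GRASS 7: with the -c flag, v.clean splits the lines internally, breaks
# them at intersections, removes duplicates (rmdupl) and merges them again.
v.clean input=osm_roads output=osm_roads_clean tool=break type=line -c

# Manual equivalent (the GRASS 6-era workflow described in the thread):
# split long lines into short segments, break + deduplicate, then rebuild
# polylines. vertices=10 is an assumed example value.
v.split input=osm_roads output=osm_roads_split vertices=10
v.clean input=osm_roads_split output=osm_roads_broken tool=break,rmdupl type=line
v.build.polylines input=osm_roads_broken output=osm_roads_clean
```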
