On Sat, Apr 20, 2013 at 3:02 AM, Mark Wynter <[email protected]> wrote:
> Thanks Markus.
> Upgraded to GRASS 7, and re-ran v.clean on same OSM Australia dataset.
> Substantially faster.  The bulk of the time was spent removing duplicates, 
> and it got exponentially slower as the process approached 100%.  Overall it 
> took 12 hours but I'm wondering how it would perform if we were to repeat 
> v.clean for even larger road networks e.g. USA or Europe?

Something is wrong there. Your dataset has 971074 roads; I tested with
an OSM dataset with 2645287 roads, about 2.7 times as many as in your
dataset. Cleaning these 2645287 lines took me less than 15 minutes. I
suspect a slow database backend (dbf). Try using sqlite as the database
backend:

db.connect driver=sqlite database='$GISDBASE/$LOCATION_NAME/$MAPSET/sqlite/sqlite.db'

Do not substitute the variables.
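
A minimal sketch of that switch, in case it helps. The DB_PATH variable and the "command -v" guard are my additions for illustration, not part of GRASS. The single quotes keep the shell from expanding the variables, so GRASS receives them literally and resolves them itself:

```shell
# Single quotes stop the shell from expanding $GISDBASE etc.;
# GRASS resolves these variables itself at run time.
DB_PATH='$GISDBASE/$LOCATION_NAME/$MAPSET/sqlite/sqlite.db'

# Only meaningful inside a running GRASS session.
if command -v db.connect >/dev/null 2>&1; then
    db.connect driver=sqlite database="$DB_PATH"
    db.connect -p    # print the connection settings now in effect
fi
```

Maps imported before the switch presumably keep their existing dbf tables, so the dataset may need to be re-imported (or its table moved) before re-running v.clean.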

HTH,

Markus M

>
> I'm tempted to try dividing the input dataset into say 4 smaller subregions 
> (i.e. vector tiles), and then try patching them back.
> I read that we will still need to run v.clean over the patched datasets to 
> remove duplicates.
> Since the only duplicates should be nodes along the common tile edges, is 
> there a way to constrain the v.clean process to slivers containing 
> the common edges?
> I've had a quick go at g.region but to no avail.
>
> Thanks
>
> GRASS 7.0.svn (PERMANENT):/data/grassdata > v.clean input=osm_roads_split 
> output=osm_roads_split_cleaned tool=break type=line -c
> --------------------------------------------------
> Tool: Threshold
> Break: 0
> --------------------------------------------------
> Copying vector features...
> Copying features...
>  100%
> Rebuilding parts of topology...
> Building topology for vector map <osm_roads_split_cleaned@PERMANENT>...
> Registering primitives...
> 971074 primitives registered
> 13142529 vertices registered
> Number of nodes: 1458192
> Number of primitives: 971074
> Number of points: 0
> Number of lines: 971074
> Number of boundaries: 0
> Number of centroids: 0
> Number of areas: -
> Number of isles: -
> --------------------------------------------------
> Tool: Break lines at intersections
>  100%
> Tool: Remove duplicates
>  100%
> --------------------------------------------------
> Rebuilding topology for output vector map...
> Building topology for vector map <osm_roads_split_cleaned@PERMANENT>...
> Registering primitives...
> 2462829 primitives registered
> 13322052 vertices registered
> Building areas...
>  100%
> 0 areas built
> 0 isles built
> Attaching islands...
> Attaching centroids...
>  100%
> Number of nodes: 1819237
> Number of primitives: 2462829
> Number of points: 0
> Number of lines: 2462829
> Number of boundaries: 0
> Number of centroids: 0
> Number of areas: 0
> Number of isles: 0
>
> On 19/04/2013, at 6:07 PM, Markus Metz wrote:
>
>> On Fri, Apr 19, 2013 at 9:06 AM, Mark Wynter <[email protected]> 
>> wrote:
>>> Hi All, we're looking for ways to speed up the cleaning of a large OSM road 
>>> network (relating to Australia).  We're running on a large Amazon AWS EC2 
>>> instance.
>>>
>>> What we've observed is exponential growth in time taken as number of 
>>> linestrings increases.
>>>
>>> This means it's taking about 3 days to clean entire network.
>>>
>>> We were wondering if we were to split the dataset into say 4 subregions, 
>>> and clean each separately, is it then possible to patch them back together 
>>> at the end without having to run v.clean afterwards?  We want to be able to 
>>> run v.net over the entire network spanning the subregions.
>>>
>>> Alternatively, has anyone found a way to speed up v.clean for large network 
>>> datasets?
>>
>> Yes, implemented in GRASS 7 ;-)
>>
>> Also, when breaking lines it is recommended to first split the lines
>> into smaller segments with v.split using the vertices option. Then run
>> v.clean tool=break. After that, use v.build.polylines to merge the lines
>> again. Or, in GRASS 7, use the -c flag with v.clean tool=break
>> type=line. The rmdupl tool is then added automatically, and the
>> splitting and merging are done internally.
>>
>> Markus M
>
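
For reference, the manual split, break and merge sequence quoted above could be sketched roughly as follows. The map names, the vertices=50 value and the "command -v" guard are assumptions for illustration only; in GRASS 7 the -c flag does the splitting and merging internally, so this is mainly of interest for older versions:

```shell
# Hypothetical map names; adjust to your own data.
IN=osm_roads
SPLIT="${IN}_seg"
BROKEN="${IN}_seg_broken"
OUT="${IN}_cleaned"

# Only meaningful inside a running GRASS session.
if command -v v.split >/dev/null 2>&1; then
    # 1. Cut long lines into segments of at most 50 vertices (assumed value).
    v.split input="$IN" output="$SPLIT" vertices=50
    # 2. Break lines at intersections.
    v.clean input="$SPLIT" output="$BROKEN" tool=break type=line
    # 3. Merge adjoining segments back into polylines.
    v.build.polylines input="$BROKEN" output="$OUT"
fi
```
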
_______________________________________________
grass-user mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/grass-user
