On Mon, Feb 9, 2015 at 4:52 PM, Fábio Dias <[email protected]> wrote:
> I switched to postgis for data storage and the v.generalize time went
> down to 130ish minutes, all processes working in parallel.
>
> I'm happy now :) thank you guys very much.
>

Thanks for reporting this back. What about a blog post, or something like
that, on this topic? I believe there are a lot of people interested in some
benchmarks.

Vaclav

> -=--=-=-
> Fábio Augusto Salve Dias
> ICMC - USP
> http://sites.google.com/site/fabiodias/
>
>
> On Tue, Jan 27, 2015 at 8:50 PM, Fábio Dias <[email protected]> wrote:
> > Hi,
> >
> > I've kept an iotop, cumulative, running while the generalization ran.
> > No disk IO involved, just a couple of postgres stats. I believe the OS
> > is keeping everything in RAM cache. I don't believe the disk is a
> > bottleneck either, it is a 10-disk RAID of 15k rpm disks, it's really
> > fast.
> >
> > I interrupted the processing, moved everything into postgres and
> > started over. I'm still loading the shapefiles (that I'm doing one at
> > a time), I'll start the 15 processes as soon as it is loaded. As soon
> > as that stabilizes, I'll report back.
> >
> >
> > On a related note, wouldn't it be interesting to always try to
> > simplify a line? As I understood the code, if the simplification fails
> > for any reason, the original line is used. Might be interesting to
> > have an option that makes the algorithm retry that line, with half the
> > threshold, for instance. It's kind of weird for me to see one side of
> > something really simplified while the other side is really complicated :)
> >
> > F
> > -=--=-=-
> > Fábio Augusto Salve Dias
> > ICMC - USP
> > http://sites.google.com/site/fabiodias/
> >
> >
> > On Tue, Jan 27, 2015 at 7:56 PM, Markus Metz
> > <[email protected]> wrote:
> >> On Mon, Jan 26, 2015 at 3:54 PM, Fábio Dias <[email protected]> wrote:
> >>> Hi,
> >>>
> >>> The machine has 128 GB of RAM. Doesn't matter what I do, I can't make a
> >>> dent in it.
> >>> Even with everything cached in RAM (shp files, database,
> >>> the whole lot), there is still free memory.
> >>
> >> OK, it's not RAM.
> >>
> >>>
> >>> I'm asking about the database because the behavior I'm seeing on 'top'
> >>> looks like the one you get when mutexes are involved. The processes
> >>> don't all go to 100% processing at the same time (and the machine has 64
> >>> processors, so no dent there either), except for the v.in.ogr.
> >>
> >> The v.generalize processes should be at 100% while generalizing,
> >> unless the disk cannot keep up with multiple simultaneous IO
> >> requests. The tables are copied only after the generalization finished
> >> (100% reached).
> >>
> >>> What it
> >>> looks like is that something is locking each process and they are
> >>> taking turns. Considering how 'lite' the sqlite appears to be, and the
> >>> weird locking behavior that was mentioned in other threads (I'm not
> >>> getting the locked message here... I did, when I was running 2
> >>> parallel v.in.ogr), isn't it likely to be the culprit? Should I change
> >>> it to a more 'non-lite' system, like postgres for instance?
> >>
> >> That could make sense when running multiple processes in parallel. An
> >> alternative would be to create a separate mapset for each process and
> >> at the end copy the results back to the main mapset.
> >>
> >> Technically, it is not possible that the new v.generalize version in
> >> trunk (G71) is slower than the old version as in relbr70 because the
> >> new version updates only those parts of the topology that actually get
> >> changed. The old version also updates components that do not get
> >> changed, which is quite time-consuming.
> >>
> >> I understand you like to go for the big nail immediately, but maybe it
> >> is worth testing first on a smaller sample?
> >>
> >> Markus M
> >>
> >>>
> >>> F
> >>> -=--=-=-
> >>> Fábio Augusto Salve Dias
> >>> ICMC - USP
> >>> http://sites.google.com/site/fabiodias/
> >>>
> >>>
> >>> On Mon, Jan 26, 2015 at 7:22 AM, Markus Metz
> >>> <[email protected]> wrote:
> >>>> On Mon, Jan 26, 2015 at 9:30 AM, Markus Metz
> >>>> <[email protected]> wrote:
> >>>>> On Sun, Jan 25, 2015 at 6:11 PM, Fábio Dias <[email protected]> wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> Running r64249, with a couple of things in parallel using &. It seems
> >>>>>> to be considerably slower.
> >>>>>
> >>>>> Very strange. Are you using trunk or GRASS 7.0?
> >>>>
> >>>> Here, v.generalize on a TerraClass tile is down from 25 minutes to 13 seconds.
> >>>>
> >>>>>
> >>>>>> More than 100h, no 1% printed. To be fair,
> >>>>>> I'm not entirely sure I'll see it when it prints, 10 v.generalize
> >>>>>> running (5 for each year) + 1 v.in.ogr for 2012. That v.in.ogr is
> >>>>>> running for almost 100h too. I'm loading the shps directly, as advised
> >>>>>> way, way back in this thread.
> >>>>>
> >>>>> What exactly do you mean by "loading shps directly"? For
> >>>>> v.generalize, you should import them with v.in.ogr.
> >>>>>
> >>>>> What about memory consumption on your system? With 10 v.generalize + 1
> >>>>> v.in.ogr on such a big dataset, quite a lot of memory would be used.
> >>>>>
> >>>>> Markus M
> >>>>>
> >>>>>>
> >>>>>> AFAIK, no disk is being used, the whole thing is cached (after more
> >>>>>> than 24h processing, cumulative iotop shows only a few MB
> >>>>>> written/read). I'm no longer using a ramdisk for the grassdata dir.
> >>>>>>
> >>>>>> However, it appears to be considerably slower, probably because of the
> >>>>>> parallel running jobs.
> >>>>>>
> >>>>>> My question then would be, considering the thread I saw about sqlite,
> >>>>>> should I be using something else as backend? When does it start to make
> >>>>>> sense to change it?
> >>>>>>
> >>>>>> F
> >>>>>>
> >>>>>> -=--=-=-
> >>>>>> Fábio Augusto Salve Dias
> >>>>>> ICMC - USP
> >>>>>> http://sites.google.com/site/fabiodias/
> >>>>>>
> >>>>>>
> >>>>>> On Wed, Jan 14, 2015 at 1:06 PM, Markus Neteler <[email protected]> wrote:
> >>>>>>> On Wed, Jan 14, 2015 at 3:54 PM, Fábio Dias <[email protected]> wrote:
> >>>>>>> ...
> >>>>>>>> What would be the best way to do that in parallel? One mapset for each
> >>>>>>>> year? Can I run multiple v.generalizes on the same input with
> >>>>>>>> different outputs?
> >>>>>>>
> >>>>>>> Yes sure.
> >>>>>>>
> >>>>>>>> My first thought was to run completely separate GRASS processes for
> >>>>>>>> each simplification, but I didn't find a way to make it search
> >>>>>>>> something different than .grass / .grass70 for the configuration
> >>>>>>>> stuff....
> >>>>>>>
> >>>>>>> Maybe take a look at this approach
> >>>>>>> http://grasswiki.osgeo.org/wiki/Parallel_GRASS_jobs#Grid_Engine
> >>>>>>>
> >>>>>>> but even sending different v.generalize jobs to background (&) should
> >>>>>>> work if you have enough RAM.
> >>>>>>>
> >>>>>>> markusN
> _______________________________________________
> grass-user mailing list
> [email protected]
> http://lists.osgeo.org/mailman/listinfo/grass-user
>
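For readers who want to try the parallel setup discussed in this thread, here is a minimal Python sketch of running several v.generalize jobs side by side. It is an alternative to sending each job to the background with & from the shell, and it is not taken from anyone's actual scripts in this thread. The output naming scheme and the helper names build_commands and run_all are hypothetical; only the v.generalize options input, output, method and threshold are real. It assumes it is started from inside a GRASS session, so that v.generalize is on the PATH.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_commands(input_map, thresholds, method="douglas"):
    """Return one v.generalize argument list per threshold."""
    commands = []
    for threshold in thresholds:
        # Hypothetical output naming scheme: <input>_gen_<threshold>,
        # with "." replaced so the result stays a plain map name.
        suffix = str(threshold).replace(".", "_")
        output_map = "{}_gen_{}".format(input_map, suffix)
        commands.append([
            "v.generalize",
            "input={}".format(input_map),
            "output={}".format(output_map),
            "method={}".format(method),
            "threshold={}".format(threshold),
        ])
    return commands


def run_all(commands, max_workers=4, runner=subprocess.call):
    """Run the commands concurrently; return exit codes in input order.

    Each worker thread only waits on an external process, so threads
    are enough here. The runner argument exists so the function can be
    exercised without GRASS installed.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(runner, commands))
```

Inside a GRASS session, something like run_all(build_commands("terraclass_2012", [10, 20, 50])) would start three jobs at once (the map name and thresholds are made up). As reported above, jobs sharing one SQLite attribute database appeared to take turns, so a PostgreSQL backend or one mapset per job may be needed before this gains anything.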
