* Markus Neteler <[email protected]> [2018-02-24 23:00:32 +0100]:

On Sat, Feb 24, 2018 at 10:31 PM, Markus Metz <[email protected]> wrote:
On Sat, Feb 24, 2018 at 10:06 PM, Nikos Alexandris <[email protected]> wrote:
* Markus Metz <[email protected]> [2018-02-24 21:39:40 +0100]:
On Sat, Feb 24, 2018 at 9:25 PM, Nikos Alexandris <[email protected]> wrote:
...
Looking at the job logs, I read lots of ".gislock" lines.
It might be some permission-related issue. I partially operated directly
(with my user-id) on many Locations.

The operator of the scheduler naturally has another user-id. I wonder
if I should apply GRASS_SKIP_MAPSET_OWNER_CHECK=1 everywhere.

No, you need to run each process in a unique temporary mapset.

Yes, only that works. Be sure to have a sufficiently long random
string to be used as temporary mapset name.

You can, for example, use the output of
mktemp --dry-run

and add the machine name to it (and maybe the current time stamp
cleaned for special chars) to avoid race conditions if you use a
shared network storage.
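
A minimal sketch of how such a name could be assembled, assuming GNU mktemp
and a POSIX shell. The function name unique_mapset_name and the "tmp_" prefix
are made up for illustration and are not part of GRASS; an explicit mktemp
template is used so that a dot in TMPDIR cannot confuse the cut step.

```shell
# Hypothetical helper: build a mapset name that is unlikely to collide
# between processes and machines sharing one GRASS database.
unique_mapset_name() {
    # random part of the name mktemp would create (nothing is created)
    rand=$(mktemp --dry-run tmp.XXXXXXXXXXXX | cut -d"." -f2)
    # machine name, with anything but letters, digits and "_" replaced
    host=$(uname -n | tr -c 'A-Za-z0-9_\n' '_')
    # current time stamp, free of special characters
    stamp=$(date +%Y%m%d_%H%M%S)
    echo "tmp_${host}_${stamp}_${rand}"
}
```

Inside a running session the result could then be passed on, for example as
g.mapset -c mapset="$(unique_mapset_name)".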

Thanks M1/2.


Once you have
the final result, change the current mapset with g.mapset to the common
mapset where final results should be stored and copy the final result from
the temporary mapset to the current mapset (the mapset to hold the final
results).

(we have processed terabytes of LST data like this :-)

Just a pseudo-example: it would suffice then to,

save current region
for loop over something
   ...

   CURRENT_MAPSET=$(g.mapset -p)

   # a temporary Mapset
   RANDOM_STRING=$(mktemp --dry-run | cut -d"." -f2)
   # create the Mapset and switch to it within the running session
   g.mapset -c mapset=$RANDOM_STRING

   # do something
   r.mask vector=VectorMap where="Attribute='Here'" &&
   g.region zoom=MASK &&
   r.stats.zonal cover=covermap base=basemap method=average output=outputmap

   # remove the mask while still in the temporary Mapset
   r.mask -r

   # back to "valid" Mapset
   g.mapset mapset=$CURRENT_MAPSET

   g.copy raster=outputmap@${RANDOM_STRING},outputmap
   r.stats -acp in=outputmap out=report

   ...
   sleep 1
done
restore region

?

Alternatively/additionally, don't use the script grassXY to start a GRASS
session, instead define the GRASS environment with custom scripts (one for
the GRASS version to use, one for the database/location/mapset to use). This
avoids race conditions on a HPC system. A unique temporary mapset for each
process helps to avoid all sorts of concurrent access problems.
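
A rough sketch of what such custom scripts might look like, written here as
two shell functions. The function names and all paths passed to them are
assumptions for illustration; the variables themselves (GISBASE, GISRC,
GIS_LOCK, and the GISDBASE/LOCATION_NAME/MAPSET keys of the rc file) are the
ones GRASS modules read. The mapset is assumed to exist already.

```shell
# Hypothetical helper 1: select the GRASS version (installation directory).
grass_version_env() {
    GISBASE=$1
    PATH="$GISBASE/bin:$GISBASE/scripts:$PATH"
    LD_LIBRARY_PATH="$GISBASE/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
    export GISBASE PATH LD_LIBRARY_PATH
}

# Hypothetical helper 2: select database, location and mapset.
grass_session_env() {
    # one rc file per process, so concurrent jobs never share session state
    GISRC=$(mktemp "${TMPDIR:-/tmp}/grassrc.XXXXXXXX") || return 1
    {
        echo "GISDBASE: $1"
        echo "LOCATION_NAME: $2"
        echo "MAPSET: $3"
    } > "$GISRC"
    # process id, used by GRASS to name per-session temporary files
    GIS_LOCK=$$
    export GISRC GIS_LOCK
}
```

A job script would call both helpers once at its start and delete the file
named by $GISRC when it is done.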

It mostly works for me with --exec. Mostly. That is, there are missing
or empty WIND files, here and there, and .gislock related issues.
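
To see which mapsets are affected, something along these lines might help.
It is only a diagnostic sketch: check_mapsets is a made-up name, the argument
is the path to one Location, and a .gislock file can also belong to a session
that is still running, so a hit is not necessarily a leftover.

```shell
# Hypothetical helper: report mapsets of one Location that have a missing
# or empty WIND file, or that contain a .gislock file.
check_mapsets() {
    location=$1
    for mapset in "$location"/*/; do
        [ -d "$mapset" ] || continue
        name=$(basename "$mapset")
        # -s is true only for an existing, non-empty file
        [ -s "$mapset/WIND" ] || echo "$name: WIND missing or empty"
        [ -e "$mapset/.gislock" ] && echo "$name: .gislock present"
    done
    return 0
}
```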

Markus M

Let's expand this Wiki section a bit with our findings (I'll try to
find my notes):
https://grasswiki.osgeo.org/wiki/Parallel_GRASS_jobs#Cluster_and_Grid_computing

markusN

_______________________________________________
grass-user mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/grass-user
