On Wed, Oct 14, 2015 at 9:45 PM, Dylan Beaudette
<dylan.beaude...@gmail.com> wrote:
> On Wed, Oct 14, 2015 at 12:55 PM, Dylan Beaudette
> <dylan.beaude...@gmail.com> wrote:
>> On Wed, Oct 14, 2015 at 10:50 AM, Dylan Beaudette
>> <dylan.beaude...@gmail.com> wrote:
>>> Some additional clues:
>>>
>>> The original stack was 365 maps with 3105 x 7025 cells.
>>>
>>> 1. zooming into a smaller region (30 x 40 cells) and running
>>> t.rast.series 100x resulted in 100 "correct" maps, no errors.
>>>
>>> 2. returning to the full extent and running t.rast.series 30x on the
>>> first 31 maps resulted in 30 "correct" maps, no errors.
>>>
>>> 3. returning to the full extent and running t.rast.series 30x on the
>>> last 31 maps resulted in 30 "correct" maps, no errors
>>>
>>>
>>> So, it seems that t.rast.series (r.series) is throwing an error, or
>>> generating wrong output, when:
>>>
>>> a large set of maps is supplied as input, and the region has a
>>> moderate number of total cells.
>>>
>>> Yeah, I know, that isn't very specific. I will try re-compiling with
>>> debugging and no optimization next.
>>>
>>> Dylan
>>>
>>>
>>
>> More data,
>>
>> 1. re-compiled with CFLAGS="-g -Wall":
>>  * Multiple runs of t.rast.series with the full stack (365 maps with
>> 3105 x 7025 cells), no errors.
>>  * each run required about 8.5 minutes to complete
>>
>> 2. re-compiled with  CFLAGS="-O2 -mtune=native -march=native" LDFLAGS="-s":
>>  * 10x tests with full stack, no errors
>>  * each run required about 3.5 minutes
>>
>> 3. re-run original script (see listing below)
>>  * random errors from t.rast.series
>>
>> This doesn't make much sense to me. The only difference between my
>> latest "tests" and the original code is that the input to
>> t.rast.series was static over the course of my "tests", vs. dynamic
>> within the original code (see below). I purposely selected a stack
>> that caused t.rast.series to throw an error for my tests.
>>
>
> OK, this does make sense--t.rast.series (r.series) was not the source
> of the problems. I was able to verify this by running t.univar on the
> output from the previous step:
>
>>   # NOTE: 4 CPUs so that external disk isn't thrashed
>>   gdd_max_C=30
>>   gdd_min_C=10
>>   gdd_base_C=10
>>   t.rast.mapcalc --q --o nprocs=4 input=tmin_subset,tmax_subset \
>>     output=gdd basename=gdd \
>>     expr="max(((min(tmax_subset, $gdd_max_C) + max(tmin_subset, $gdd_min_C)) / 2.0) - $gdd_base_C, 0)"
>
> ... which means that t.rast.mapcalc was generating one (or more)
> outputs with some kind of problem, which was then causing t.univar and
> t.rast.series to fail.
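
For reference, here is the arithmetic of the GDD expression quoted above for a single cell, as a minimal sketch in plain shell and awk. The function name gdd is only for illustration, and the thresholds default to the values in the script. It has nothing to do with how r.mapcalc evaluates the expression internally; it just shows that the result is generally not an integer.

```shell
# Per-cell version of:
#   max(((min(tmax, gdd_max_C) + max(tmin, gdd_min_C)) / 2.0) - gdd_base_C, 0)
# "gdd" is a hypothetical helper name, not a GRASS module.
gdd() {
    awk -v tmin="$1" -v tmax="$2" \
        -v gmax="${gdd_max_C:-30}" -v gmin="${gdd_min_C:-10}" \
        -v base="${gdd_base_C:-10}" '
    BEGIN {
        hi = (tmax < gmax) ? tmax : gmax   # cap tmax at the upper threshold
        lo = (tmin > gmin) ? tmin : gmin   # raise tmin to the lower threshold
        g = (hi + lo) / 2.0 - base         # mean temperature above the base
        if (g < 0) g = 0                   # no negative degree days
        printf "%g\n", g
    }'
}

gdd 12 25   # prints 8.5
gdd 5 35    # prints 10
gdd 2 8     # prints 0
```

The division by 2.0 is what makes the t.rast.mapcalc output a floating point map.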

I can now verify that t.rast.mapcalc is creating some raster maps with
corrupt (?) data. Corrupt in the sense that subsequent reading of the
maps results in the "Error reading raster data for row ..." error.
Just in case anyone is interested, I have opened a ticket for more
informative errors raised by lib/raster/get_row.c
(https://trac.osgeo.org/grass/ticket/2762).
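
A rough sketch of how the affected maps can be picked out, assuming that a full read of a map with r.univar exits with a non-zero status when a row cannot be read. The function name, the CHECK_CMD variable and the map names in the usage line are hypothetical; this has to run inside a GRASS session, otherwise every map is reported.

```shell
# Print the name of every map that cannot be read from start to finish.
# CHECK_CMD is a hypothetical override for dry runs of this sketch;
# by default each map is read with r.univar.
find_unreadable_maps() {
    for map in "$@"; do
        if ! "${CHECK_CMD:-r.univar}" map="$map" >/dev/null 2>&1; then
            printf '%s\n' "$map"
        fi
    done
}
```

Usage would be something like: find_unreadable_maps gdd_1 gdd_2 gdd_3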

As previously reported, errors seem to occur about 50-60% of the time
and _do not_ appear to be related to the number of concurrent
t.rast.mapcalc instances.

After some more testing, I have found that t.rast.mapcalc does not
(randomly) generate corrupt maps when the output from the mapcalc
expression results in a CELL type map. Expressions that result in
either FCELL or DCELL maps seem to trigger the corruption.

Fortunately my current project isn't too discriminating and is fine
with CELL output from t.rast.mapcalc.
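
One way to get CELL output from the GDD expression quoted above, assuming whole degree days are acceptable, is to wrap it in round(), which returns an integer in r.mapcalc when called with a single argument. This is only a sketch of the workaround, not a fix for the underlying problem; the variable names gdd_expr and gdd_expr_cell are for illustration.

```shell
gdd_max_C=30
gdd_min_C=10
gdd_base_C=10

# Floating point expression from the original script.
gdd_expr="max(((min(tmax_subset, $gdd_max_C) + max(tmin_subset, $gdd_min_C)) / 2.0) - $gdd_base_C, 0)"

# Wrapping in round() makes the result an integer (CELL) map.
gdd_expr_cell="round($gdd_expr)"
echo "$gdd_expr_cell"

# Only attempt the run where the module is on PATH (i.e. in a GRASS session).
if command -v t.rast.mapcalc >/dev/null 2>&1; then
    t.rast.mapcalc --q --o nprocs=4 input=tmin_subset,tmax_subset \
        output=gdd basename=gdd expr="$gdd_expr_cell"
fi
```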

I now suspect that this is an overflow issue in t.rast.mapcalc (well,
in the library functions that it calls) that may or may not be
influenced by the use of files linked via r.external.

> The inputs to t.rast.mapcalc are files that have been registered with
> r.external. I suspect that the multiple concurrent r.mapcalc instances
> may be to blame. I don't have an explanation other than some evidence
> from the last time I encountered this type of issue. The workflow then
> was:
>
> 1. spawn 8 concurrent processes via backgrounding: r.sun -> r.mapcalc
>
> 2. when finished with daily solar models, sum maps with r.series
>
> I would occasionally encounter the "Error reading raster data for row
> xxx" error from r.series in this case and assume that r.series had
> somehow broken the map in question.
>
> It would seem that concurrent use of r.mapcalc may be worth
> investigating... however, it is strange that it only occurs sometimes.

I stand corrected. My previous encounters with the "Error reading
raster data for row ..." error were likely associated with this
related problem, which is now fixed:

http://lists.osgeo.org/pipermail/grass-dev/2015-July/075627.html



> Oddly enough, I didn't have problems with maps generated with the
> following (similar) code:
>
> # spring frost
> # if tmin never drops below 0 before the start of summer, then the
> # last "spring frost" is on day 0
> # NOTE: 2 CPUs so that disk isn't thrashed
> t.rast.mapcalc --o -n nprocs=2 input=tmin output=spring_frost \
> basename=spring_frost \
> expr="if(start_doy() < 182, if(tmin < 0, start_doy(), 0), null())"
>
> # fall frost
> # NOTE: 2 CPUs so that disk isn't thrashed
> t.rast.mapcalc --o -n nprocs=2 input=tmin output=fall_frost \
> basename=fall_frost \
> expr="if(start_doy() > 213, if(tmin < 0, start_doy(), 365), null())"


... Not so odd anymore, as these t.rast.mapcalc expressions always
resulted in CELL maps.



Dylan
_______________________________________________
grass-dev mailing list
grass-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-dev
