On Thu, Sep 13, 2012 at 9:30 AM, Victor Olaya <[email protected]> wrote:
> Hi all,
>
> I am trying to write down a process and I am following the existing
> algorithms as examples, so as to write mine in a similar way. I have
> noticed the usage of the DecoratingSimpleFeatureCollection to create
> processes in which the result is calculated on the fly as an input
> layer is iterated. Is that recommended for processes that are intense
> in terms of computing? I might not be understanding how it works, but
> it seems that if the result of the process is then iterated several
> times (for instance, if it is used as input in several different
> processes), the calculations will be done each time that happens. Is
> that right?
>
Correct
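To make the issue concrete, here is a minimal, self-contained sketch (plain Java, not the actual GeoTools DecoratingSimpleFeatureCollection API) of a decorating collection that applies its transform lazily: every fresh iterator re-runs the computation, so feeding it into several downstream consumers multiplies the work.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical decorator: transforms items on the fly while iterating.
// Nothing is cached, so each pass over the collection recomputes.
class LazyTransformCollection<T> implements Iterable<T> {
    private final List<T> delegate;
    private final Function<T, T> transform;
    // Counts transform invocations, to make the recomputation visible.
    final AtomicInteger computations = new AtomicInteger();

    LazyTransformCollection(List<T> delegate, Function<T, T> transform) {
        this.delegate = delegate;
        this.transform = transform;
    }

    @Override
    public Iterator<T> iterator() {
        Iterator<T> it = delegate.iterator();
        return new Iterator<T>() {
            public boolean hasNext() { return it.hasNext(); }
            public T next() {
                computations.incrementAndGet(); // runs again on every pass
                return transform.apply(it.next());
            }
        };
    }
}
```

Iterating such a collection twice doubles the number of transform calls, which is exactly the cost Victor is asking about.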
>
> If so, what would be the way to do it? Would it be better to just
> create a new FeatureCollection with the resulting features already
> calculated?
>
And keep it in memory? That may work in a desktop application or in
batch software, where you have all the heap to yourself, but on a
server handling hundreds of concurrent requests it's a recipe for OOM.
In the case of chaining, where the processes following this one
read the data more than once, it would be good to have some sort
of intelligent storage that can save the results to disk during
the first read (or in memory if they are small enough), somewhat
like what the unix "tee" command does, and then be able to re-read
them from storage.
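A rough sketch of that "tee" idea, under illustrative assumptions (a plain in-memory list stands in for the disk-backed store a real implementation would use): the first pass streams from the source and records each item; later passes replay the recorded copy instead of recomputing.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical tee-style wrapper: stream once, then replay from storage.
class TeeIterable<T> implements Iterable<T> {
    private final Iterable<T> source;
    private List<T> buffer; // null until the first full pass completes
    int sourceReads = 0;    // counts items pulled from the real source

    TeeIterable(Iterable<T> source) { this.source = source; }

    @Override
    public Iterator<T> iterator() {
        if (buffer != null) return buffer.iterator(); // replay, no recompute
        List<T> sink = new ArrayList<>();
        Iterator<T> it = source.iterator();
        return new Iterator<T>() {
            public boolean hasNext() {
                boolean more = it.hasNext();
                if (!more) buffer = sink; // first pass done, keep the copy
                return more;
            }
            public T next() {
                sourceReads++;
                T item = it.next();
                sink.add(item); // tee: pass through and record
                return item;
            }
        };
    }
}
```

In a real WPS setting the sink would spill to disk past some size threshold rather than grow on the heap.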
Here is the ideal solution as I see it:
- processes mark the inputs they will scan more than once so that
we know streaming is not the ideal solution (this applies not only
to process chaining, but also to reading from a remote WFS via
WFS cascading, for example)
- wrap those data sets with a smart cache that can trade data
between memory and disk so that we can impose both
a global memory limit on the whole server and a per WPS call
limit (somewhat similar to the JAI tile cache)
Mind you, most of the above comes naturally for raster data as long
as processes use JAI, since we do tile-oriented computation and the
JAI tile cache acts as a buffer, avoiding too much recomputation
while also allowing us to control the total amount of memory used
by tiles: what we need is the "equivalent" for feature collections.
That said, there are processes where keeping the whole result
in memory is unavoidable due to limitations in the libraries used to
perform the computation; see the collect geometries process for example.
But those processes have to be the exception, not the rule, otherwise
a WPS service based on them will be impossible to keep stable and
running in the face of the varied load that an internet server ends up seeing.
Cheers
Andrea
--
==
Our support, Your Success! Visit http://opensdi.geo-solutions.it for more
information.
==
Ing. Andrea Aime
@geowolf
Technical Lead
GeoSolutions S.A.S.
Via Poggio alle Viti 1187
55054 Massarosa (LU)
Italy
phone: +39 0584 962313
fax: +39 0584 962313
mob: +39 339 8844549
http://www.geo-solutions.it
http://twitter.com/geosolutions_it
-------------------------------------------------------
_______________________________________________
GeoTools-Devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/geotools-devel