Hi Jim,

unfortunately I have zero experience with distributed databases so the 
fiddling with temporal sharding is probably what I'll end up doing.

However, GeoMesa looks interesting, so I'll probably do my homework and 
learn something about it.


Thank you


On 21. 6. 2016 17:17, Jim Hughes wrote:
> Hi Peter,
>
> Are you always querying for a short time period?  If so, you might get
> the most mileage out of a SQL database (Oracle/Postgres) by creating an
> index on time and providing any vendor specific query hints to leverage
> that index.  From a GeoTools/GeoServer perspective, WMS/WFS queries are
> turned into SQL queries; how the database handles those is a database
> admin challenge.  With a time-based index and restricted queries, there
> might be a window where performance can be reasonable.
>
> It may be worth considering a distributed database.  If your
> architecture prevents that, then the options are tuning Oracle or
> PostGIS to the max and/or implementing application-based temporal
> sharding.  The former can require expensive licenses (as you mentioned)
> or some fiddling.  An example of temporal-sharding would be having a
> different table for each month or week.  As you query across time, your
> application would know that it has to ask for different layers in
> GeoServer or tables in the database.
>
> If you can use a distributed database, I'd note that there are a number
> of projects to provide geo extensions to popular options such as
> Accumulo, Cassandra, HBase, ElasticSearch, etc.  A number of those
> projects include a GeoTools datastore implementation or a GeoServer
> plugin which makes them compatible with GeoServer.  I am a GeoMesa
> committer and we've had great success using GeoServer to serve up
> feature data and aggregations like heatmaps over datasets scaling to
> billions of records.  Admittedly, setting up a distributed database is
> non-trivial, but it may allow for more options when working with large
> datasets.
>
> Cheers,
>
> Jim
>
> On 06/21/2016 10:22 AM, Peter Kovac wrote:
>> Dear GeoServer users,
>>
>> I have trouble figuring out fast and scalable way to serve lightning
>> data via GeoServer.
>>
>> My dataset consists of several million points spread over central Europe
>> spanning several years. I have approximately 3M lightning strikes right
>> now and it's just a fraction of what I'll have to handle ultimately. I'm
>> using Oracle Locator database with both spatial and normal indexes and
>> while it has a few quirks it works reasonably well when the amount of
>> lightnings is small (i.e. in thousands or tens of thousands).
>>
>> While my WMS client will never allow to show more than 2 hours worth of
>> data (that's small amount of lightning strikes) there are particular WMS
>> requests which take a very long time: when I want to see a "bigger
>> picture" of all lightning strikes in central Europe during a specified
>> short period of time.
>>
>> The core of the problem is my data have both spatial and temporal
>> dimension and there is no spatio-temporal index in Oracle Locator. So
>> even if regular index on the time dimension can limit the number of
>> features to a few thousands in the blink of an eye, the spatial index
>> over the point geometry column won't help much (since the BBOX in the
>> request covers the whole area anyway) and is in fact doing harm. The
>> query found in GeoServer logs runs really fast If I omit the spatial
>> index clause in such case (just a few hundred ms compared to 6-7 seconds
>> for the full query with SDO_FILTER function call). Another bad thing is
>> my colleagues predict that the performance will get worse with more data
>> in the table once it won't fit into RAM and the database engine will
>> have to use hard drives for processing.
>>
>> The performance improves rapidly as I zoom to larger scales (smaller
>> areas), where the spatial index selects just a small subset of data.
>> However, I'd like to be able to serve the whole central Europe quickly, too.
>>
>> One possible solution is to add the time dimension to my spatial index
>> (so it's 3D instead of 2D), but I'm afraid GeoServer won't be able to
>> retrieve data from such index (it won't be EPSG:3857 geometry anymore).
>>
>> Another solution from an Oracle forum suggests using partitioning over
>> time and have separate spatial index for each partition, but that
>> requires expensive Oracle Enterprise license (which was not budgeted in
>> the project of course) and it's just dividing the problem by a constant
>> factor anyway.
>>
>> So, since I'm out of my own ideas, how would you handle this situation?
>> What other tools or formats are useful? Is the Postgres/PostGIS combo
>> better at serving large-scale spatio-temporal datasets (with regard to
>> GeoServer)?
>>
>>
>> Many thanks for any help!
>>
>
> ------------------------------------------------------------------------------
> Attend Shape: An AT&T Tech Expo July 15-16. Meet us at AT&T Park in San
> Francisco, CA to explore cutting-edge tech and listen to tech luminaries
> present their vision of the future. This family event has something for
> everyone, including kids. Get more information and register today.
> http://sdm.link/attshape
> _______________________________________________
> Geoserver-users mailing list
> Geoserver-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/geoserver-users

-- 
Peter Kovac
IMS Programmer
MicroStep-MIS
peter.ko...@microstep-mis.com


------------------------------------------------------------------------------
Attend Shape: An AT&T Tech Expo July 15-16. Meet us at AT&T Park in San
Francisco, CA to explore cutting-edge tech and listen to tech luminaries
present their vision of the future. This family event has something for
everyone, including kids. Get more information and register today.
http://sdm.link/attshape
_______________________________________________
Geoserver-users mailing list
Geoserver-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geoserver-users

Reply via email to