Re: [h2] Re: H2 seems to create a huge number of orphan lobs since 1.4.183.

bocher Mon, 04 May 2015 12:49:53 -0700

Hi Vitali,

Some comments.


H2GIS is an improvement and refactoring of the H2spatial extension, which 
has been used and developed at CNRS since 2005. The first release was 
presented during the gvSIG days in 2006 
(https://halshs.archives-ouvertes.fr/halshs-01145771). In fact, the H2 
spatial extension was developed to support hydrological spatial analysis 
methods during my thesis (2002-2005). The first architecture of H2 spatial 
was very chaotic :-(. In July 2006, we discovered the approach of Chris 
Holmes (Open Plan project), available for Derby and HSQL 
(http://old.geoserver.org/SpatialDBBox.html). The second architecture 
(2006-2010) followed this approach: a custom BLOB data type was used to 
store geometries in the H2 database.

In 2011, we decided to contact Thomas to discuss spatial indexing in H2. 
Thomas was (as usual) very receptive to our needs and added R-tree index 
storage to H2. Since 2011, we have been collaborating with the H2 community 
on the geometry type, and a new extension called H2GIS was born.

H2GIS is used in my team to process huge datasets and to build advanced 
spatial analyses and simulations such as noisemap 
(http://noisemap.orbisgis.org/). For example, we are able to process 
billions of noise sources located on a road network. So yes, the H2 
database is very robust and efficient.


You said: « Altogether, very likely I will do refactoring of geodb, hatbox 
and GeoTools to work with GEOMETRY type... »

I'm working on an extension to connect H2GIS with GeoServer using the 
GeoTools datastore model (https://github.com/ebocher/geoserver-h2gis, 
thanks to the GeoTools community). It may be a good option for you, since 
you could take advantage of all the H2GIS functionality 
(ST_ConstrainedDelaunay, network analysis, shapefile table access...).


My colleague Nicolas Fortin will certainly answer the technical points 
(memory usage, data type…).


 Best regards


 Erwan


On Monday, May 4, 2015 at 4:07:52 PM UTC+2, Vitali wrote:
>
>
>
> On Monday, May 4, 2015 at 3:32:39 PM UTC+3, Thomas Mueller wrote:
>>
>> Hi,
>>
>> > But already for many years  the spatial support was provided by a 
>> combination of geodb + hatbox libraries and integration in GeoTools  world
>>
>> Yes. However, those don't use the built-in R tree. Do they use an 
>> external R tree?
>>
>
> Hatbox provides an R-tree. It is built on the H2 infrastructure (an 
> auxiliary table is created in which the nodes are stored). What would a 
> "built-in" R-tree mean?
>
>>
>> > All this was done on the BLOB type, where a geometry's WKB is stored. 
>>
>> A small BLOB is stored inline, so it might not be that bad.
>>
>> >  any access of BLOB value makes a copy of it
>>
> Yes, that is what I meant. In my experience, it causes significant 
> performance issues in certain scenarios anyway: millions of temporary LOB 
> entries when you have just tens of thousands of spatial records and a 
> spatial query that is not very well optimized.
>
>>
>> Access is making a copy of the reference of a large BLOB.
>>
>> > Isn't 2 GB the limit for binary types?
>>
>> In reality the problem is the memory usage (heap memory).
>>
>
> That should not be a problem. Typically, in a GIS application the biggest 
> result sets extracted from the database are not held or cached for long; 
> they are used to render spatial features, and any references are 
> released in the JVM immediately afterwards. Whether it's BLOB or BINARY, 
> the data is loaded into memory anyway to parse the Geometry from WKB. 
> Maybe with VARBINARY a bit more data is kept in memory for a short period 
> than would be with BLOBs. Maybe I would consider an approach in 
> ValueGeometry where the bytes are kept only until the geometry is 
> requested; then the Geometry is lazily parsed and the bytes are released. 
> So at any point in time either the bytes or the Geometry object is held: 
> from bytes to Geometry, and from Geometry to bytes when necessary.
>
> In SELECT scenarios, the bytes are needed until the Geometry object is 
> created, which is then used outside of the result set or locally during 
> command execution. I am not sure how relevant this sounds in the scope of 
> the whole database infrastructure. 
> Am I right that the local result set data structure is not returned to 
> the caller until it is fully composed? Then, if the result set is huge, 
> all bytes are kept in memory anyway until the result set is delivered and 
> the client starts to request Geometry objects, at which point the bytes 
> would be cleaned up...
>
>
> Vitali.
>
>
>> Regards,
>> Thomas
>>
>>
>>
>> On Sun, May 3, 2015 at 9:51 PM, Vitali <[email protected]> wrote:
>>
>>> Hello.
>>>
>>> I would like to share some observations. Recently H2 got a Geometry 
>>> type, the logic around it seems to be growing, and some extra tiers like 
>>> H2GIS are under development. Altogether this looks like the future of 
>>> spatial support in H2. But for many years already, spatial support has 
>>> been provided by a combination of the geodb + hatbox libraries and 
>>> integration in the GeoTools world (as an H2 data store interface for 
>>> storing/managing spatial features with geometries).
>>> All this was done on the BLOB type, where a geometry's WKB is stored. 
>>>
>>> BLOB has become completely useless as a type for handling the WKB of 
>>> geometries, because of the change that makes any access to a BLOB value 
>>> create a copy of it. The HATBOX and GEODB libraries, based on the JTS 
>>> library, provide functions to work with WKB. But any call to these 
>>> functions reads the BLOB value, which makes a copy in memory. Some 
>>> non-optimized spatial conflation operations (having polynomial 
>>> complexity, e.g. applying spatial predicates between every combination 
>>> of input geometries from 2 tables) now have catastrophic performance 
>>> and memory consumption. Cases where the old H2 took just 10 seconds to 
>>> perform some kind of spatial operation between 2 layers (tables) now 
>>> run for 2 hours, produce a 3 GB database file (instead of the normal 
>>> 400 MB) and finally end with an out-of-memory error. There are also 
>>> long cleanups of the temporary LOB storage on app start, app close, and 
>>> transaction commit after such operations.
>>>
>>> I understand the real reasons for this BLOB copying approach. But the 
>>> conclusion is that BLOB is not the right type for geometries. In a 
>>> typical GIS (like uDig), thousands of records are extracted every second 
>>> for multiple layers during rendering, and other types of requests need 
>>> geometries too. BLOB has now become inefficient.
>>>
>>> Altogether, very likely I will refactor geodb, hatbox and GeoTools to 
>>> work with the GEOMETRY type, which is basically a kind of VARBINARY, 
>>> meaning the WKB is simply read into memory. But that is what a GIS app 
>>> usually needs: to get a geometry almost every time data is read. Also, 
>>> because the JTS geometry is lazily cached in ValueGeometry, various 
>>> logic in H2 (like custom spatial functions called multiple times) 
>>> benefits. I think the H2GIS toolkit more or less uses this approach 
>>> already.
>>>
>>> The only concern is whether there are any limitations for cases like a 
>>> "lake boundary" that consists of hundreds of thousands of vertices. 
>>> Isn't 2 GB the limit for binary types? Then it's fine. But how do the 
>>> older PageStore and the modern MVStore handle this type? Any performance 
>>> issues?
>>>
>>> Vitali.
>>>
>>>
>>>  -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "H2 Database" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/h2-database.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
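
As a rough illustration of the ValueGeometry idea Vitali describes in the 
quoted message (keep the WKB bytes only until the geometry is requested, 
then parse lazily and release them), here is a minimal Java sketch. It is 
not H2's actual ValueGeometry: the class name LazyGeometryValue, its 
methods, and the parser/writer parameters are all hypothetical, and a 
generic type stands in for a JTS Geometry so that the sketch has no 
dependencies.

```java
import java.util.Objects;
import java.util.function.Function;

/**
 * Hypothetical holder (not H2 code): at any time it keeps either the
 * serialized WKB bytes or the parsed geometry object, never both.
 */
final class LazyGeometryValue<G> {
    private byte[] bytes;   // non-null only until the geometry is first requested
    private G geometry;     // non-null only after the first request
    private final Function<byte[], G> parser;  // assumed WKB -> geometry
    private final Function<G, byte[]> writer;  // assumed geometry -> WKB

    LazyGeometryValue(byte[] wkb,
                      Function<byte[], G> parser,
                      Function<G, byte[]> writer) {
        this.bytes = Objects.requireNonNull(wkb);
        this.parser = Objects.requireNonNull(parser);
        this.writer = Objects.requireNonNull(writer);
    }

    /** Parses on the first call, then drops the bytes so only the object remains. */
    synchronized G getGeometry() {
        if (geometry == null) {
            geometry = Objects.requireNonNull(parser.apply(bytes));
            bytes = null; // release the WKB once the object exists
        }
        return geometry;
    }

    /** Returns the held bytes, or re-serializes if they were already released. */
    synchronized byte[] getBytes() {
        return bytes != null ? bytes : writer.apply(geometry);
    }

    /** True while the value is still in its "bytes" state. */
    synchronized boolean holdsBytes() {
        return bytes != null;
    }
}
```

With JTS, the two functions could presumably wrap WKBReader.read(byte[]) 
(handling its checked ParseException) and WKBWriter.write(Geometry). The 
trade-off is that asking for the bytes after parsing costs a 
re-serialization each time, in exchange for never holding both 
representations on the heap.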
