Thanks Shengnan,

It is possible that our GEOS implementation (being a fairly naive port from Java) is still a cause of the gross inefficiencies (in particular the many small malloc/free cycles).

I think our priority order should be to first remove the bottlenecks in our coding style (fixing our memory management inefficiencies) and then look at using some of the CPU/GPU features available in new cores to eke out some more performance going forward.

Even when we do that, the bulk of the processing time will still sit on the GIS side, but at least it will be a smaller bulk :)

Paul

Cong, Shengnan wrote:
Hi, Paul,

I have done some experiments with PostGIS using synthetic data from
Andrew Rogers. The data sets (100k~10M) were raw data roughly restricted
to the continental US, and the test queries were subsets restricted by a
given bounding box, using a number of different bounding box sizes. The
GIST geospatial index was used.
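
For readers who want the query shape spelled out, here is a minimal C++ sketch of what a bounding-box restriction computes when done as a plain linear scan. It assumes point data, and the names Point, Box, contains and countInBox are made up for illustration; they are not PostGIS or GEOS API. The index exists precisely to avoid visiting every row like this.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only (not PostGIS/GEOS types).
struct Point { double x, y; };
struct Box { double minX, minY, maxX, maxY; };

// True when p lies inside b, edges included.
inline bool contains(const Box& b, const Point& p) {
    return p.x >= b.minX && p.x <= b.maxX &&
           p.y >= b.minY && p.y <= b.maxY;
}

// Linear scan: count the points that fall inside the query box.
std::size_t countInBox(const std::vector<Point>& pts, const Box& b) {
    std::size_t n = 0;
    for (const Point& p : pts) {
        if (contains(b, p)) ++n;
    }
    return n;
}
```

A larger box selects more rows, so varying the box size varies how much per-row work the server does after the index lookup.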

I used VTune (Intel's performance tool) to get profiling information and
studied the source code a bit.
Here are some observations:

1. Query processing is computation bound. Few cache misses were
observed: the L2 cache miss ratio is below 1%, and no bus saturation
was observed.
2. The computation time breaks down mainly as follows:
-- 47.5% in GEOS lib (spatial operations)
-- 34.1% in system calls (of which around 60% is in malloc/free)
-- 5.4% in PgSQL server
-- 4.3% in PostGIS lib
3. The time spent in GEOS lib is not concentrated in any specific
function; it is evenly distributed among various tiny functions.

4. The parallelization problem is probably more DBMS related than GIS
related, since it may involve PgSQL internals more than PostGIS
internals.

This shows that the GEOS lib and memory management, rather than PgSQL
or I/O, are the performance bottlenecks of query processing.

The GEOS lib consists of many tiny functions and is dynamically linked.
If those functions were inlined, performance could probably be improved.
Some spatial operations may also be covered by Intel IPP (Integrated
Performance Primitives), which may help the GEOS lib perform better.
As for memory management, using self-managed memory may reduce the
overhead of malloc and free calls.
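
To make "self-managed memory" concrete, here is a rough C++ sketch of one common approach, a bump (arena) allocator: grab memory in large blocks, hand out small pieces by advancing an offset, and release everything at once when the arena is destroyed. This is only an illustration under stated assumptions, not how GEOS manages memory; Arena, Coord and blocksNeededFor are hypothetical names, and the 64 KiB block size is an arbitrary choice.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical bump allocator; names are illustrative, not GEOS API.
class Arena {
public:
    explicit Arena(std::size_t blockSize = 64 * 1024) : blockSize_(blockSize) {}
    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;

    // Return `size` bytes aligned to `align` (a power of two no larger
    // than the default alignment of operator new).
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        assert(align <= alignof(std::max_align_t));
        std::size_t start = (offset_ + align - 1) & ~(align - 1);
        if (blocks_.empty() || start + size > currentSize_) {
            // Current block is full: one heap call buys a whole new block.
            currentSize_ = size > blockSize_ ? size : blockSize_;
            blocks_.emplace_back(new unsigned char[currentSize_]);
            start = 0;
        }
        offset_ = start + size;
        return blocks_.back().get() + start;
    }

    // Construct a T in arena memory. Destructors are never run, so only
    // trivially destructible types are accepted.
    template <typename T, typename... Args>
    T* create(Args&&... args) {
        static_assert(std::is_trivially_destructible<T>::value,
                      "Arena never calls destructors");
        return new (allocate(sizeof(T), alignof(T)))
            T{std::forward<Args>(args)...};
    }

    std::size_t blockCount() const { return blocks_.size(); }

private:
    std::size_t blockSize_;
    std::size_t currentSize_ = 0;  // size of the newest block
    std::size_t offset_ = 0;       // bytes used in the newest block
    std::vector<std::unique_ptr<unsigned char[]>> blocks_;
};

// Hypothetical small object, standing in for a coordinate.
struct Coord { double x, y; };

// Allocate n coordinates and report how many heap blocks that took.
std::size_t blocksNeededFor(std::size_t n) {
    Arena arena;
    for (std::size_t i = 0; i < n; ++i) {
        arena.create<Coord>(static_cast<double>(i), 0.0);
    }
    return arena.blockCount();
}  // all blocks are freed here, in one pass
```

The trade-off is that individual objects cannot be freed early, so this fits temporaries that all die together, such as the intermediate objects of a single spatial operation.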

Thanks.

Shengnan

_______________________________________________
geos-devel mailing list
geos-devel@geos.refractions.net
http://geos.refractions.net/mailman/listinfo/geos-devel
