>
> Date: Mon, 19 Jul 2010 16:34:40 +0200
> From: Martin Dobias <[email protected]>
> Subject: Re: [gdal-dev] Optimizing access to shapefiles
> To: Frank Warmerdam <[email protected]>
> Cc: [email protected]
> Message-ID:
>        <[email protected]>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi Frank
>
> On Mon, Jul 19, 2010 at 3:46 PM, Frank Warmerdam <[email protected]>
> wrote:
> >> 1. allow users of OGR library set which fields they really need. Most
> >> of time is wasted by fetching all the attributes, but typically none
> >> or just one attribute is necessary when rendering. For that, I've
> >> added the following call:
> >> OGRLayer::SetDesiredFields(int numFields, int* fields);
> >> The user passes an array of ints, each item tells whether the field
> >> should be fetched (1) or not (0). The numFields tells the size of the
> >> array. If numFields < 0 then the layer will return all fields (default
> >> behavior). The driver implementation then just before fetching a field
> >> checks whether to fetch the field or not. This optimization could be
> >> easily used in any driver, I've implemented it only for shapefiles.
> >> The speedup will vary depending on the size of the attribute table and
> >> number of desired fields. On my test shapefile containing 16 fields,
> >> the data has been fetched up to 3x faster when no fields were set as
> >> desired.
>
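For illustration, the field-mask behavior quoted above can be sketched roughly as follows. This is a toy model, not the patch itself: ToyLayer is a hypothetical stand-in for a driver's layer class, field values are plain strings, and treating fields beyond the end of the mask as "not desired" is an assumption the quoted text does not settle.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a driver's layer; NOT the real OGRLayer class.
class ToyLayer {
public:
    explicit ToyLayer(std::vector<std::vector<std::string>> rows)
        : rows_(std::move(rows)) {}

    // Mirrors the proposed SetDesiredFields(int numFields, int* fields):
    // fields[i] != 0 means "fetch field i"; numFields < 0 restores the
    // default behavior of fetching every field.
    void SetDesiredFields(int numFields, const int* fields) {
        assert(numFields <= 0 || fields != nullptr);
        if (numFields < 0) {
            fetchAll_ = true;
            mask_.clear();
            return;
        }
        fetchAll_ = false;
        mask_.assign(fields, fields + numFields);
    }

    // Returns a full-width record. Fields that were not requested are
    // simply never read and stay unset (the "null state").
    std::vector<std::optional<std::string>> GetFeature(std::size_t fid) const {
        const std::vector<std::string>& raw = rows_.at(fid);
        std::vector<std::optional<std::string>> out(raw.size());
        for (std::size_t i = 0; i < raw.size(); ++i) {
            if (IsDesired(i)) {
                out[i] = raw[i];  // the only place a value is "fetched"
            }
        }
        return out;
    }

private:
    bool IsDesired(std::size_t i) const {
        if (fetchAll_) return true;
        // Assumption: fields past the end of the mask are skipped.
        return i < mask_.size() && mask_[i] != 0;
    }

    std::vector<std::vector<std::string>> rows_;
    std::vector<int> mask_;
    bool fetchAll_ = true;
};
```

With a mask of {0, 1, 0}, GetFeature() in this sketch still returns three slots, but only the second one holds a value.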
Would it make sense instead of implementing a SetDesiredFields(..) to
implement a SetSubFields(string fieldnames) where the function takes a
comma delimited list of subfields and then those are parsed by the
shapefile driver to find out which field values to fetch? That way, for
other drivers that have a SQL based underlying datastore, the way they
would implement that fetching behavior would be by putting that content
between the SELECT and the FROM portion.

> >
> > Martin,
> >
> > Would GetFeature() still return a feature with a full vector of
> > fields, but those not desired just being left in the null state?
>
> Yes, that's what the patch does - it only omits fetching the value of
> some fields.
>

Of course if this is a requirement (need to have the full vector of
fields) then there would need to be some extra work done (with the
approach I describe above) to satisfy it.

> > If so, I think such an approach would be reasonable. However, it will
> > require an RFC process to update the core OGR API. Are you willing
> > to prepare such an RFC?
>
> Will do.
>
> >
> >> 2. reuse allocated memory. When a new shape is going to be read within
> >> shapelib, new OGRShape object and its coordinate arrays are allocated.
> >> By reusing one such temporary OGRShape object within a layer together
> >> with the coordinate arrays (only allowing them to grow - to
> >> accommodate larger shapes), I have obtained further speedup of about
> >> 30%.
> >
> > As GetFeature() returns a feature instance that becomes owned by the
> > caller I do not see how this could be made to function without a
> > fundamental change in the OGR API. Perhaps you can explain?
>
> One note to avoid confusion: the suggestion I've made above relates
> only to shapefile driver in OGR and doesn't impose any changes to the
> API. The suggested patch reuses OGRShape instances which are passed
> between OGR shapefile driver and shapelib.
> These OGRShape instances
> never get to the user, so it's just a matter of internal working of
> the shapefile driver. Please take a look at the patch if still
> unclear.
>

IMHO having a way to avoid fetching data would benefit all drivers.

>
> Below I explain the further idea which I haven't implemented yet,
> which should save allocations/deallocations of OGRFeature instances
> and which could boost the speed of retrieval of data from any OGR
> driver:
>
> GetFeature() returns a new instance and DestroyFeature() deletes that
> instance. My idea is that DestroyFeature() call would save the
> instance in a pool (list) of "returned" feature instances. These
> returned features could be reused by the GetFeature() - it will take
> one from the list instead of creating a new instance. I think this
> doesn't make any influence on the public OGR API, because the
> semantics will be the same. Only the OGR internals will be modified so
> that it will not destroy OGRFeature instance immediately, because it
> will assume that more GetFeature() calls will be issued.
>
> If the pool would be specific for each OGRLayer, many
> allocations/deallocations of OGRFeature and OGRField instances could
> be saved, because the features contain the same fields, they would
> only have to be cleaned (but the array would stay as-is). A layer has
> usually the same type of geometry for all features, so even geometries
> could be kept and only the size of the coordinate array would be
> altered between the calls.
>

This is effectively what happens in ArcObjects cursors (recycling vs
non-recycling behavior). All drawing in ArcMap (except when in
EditSessions) uses recycling cursors mixed with a subfields clause since
it makes drawing *much* faster.

My two cents,

- Ragi
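
A minimal sketch of the per-layer feature pool discussed above, under stated assumptions: ToyFeature and FeaturePool are hypothetical names, not OGR classes, and field values are plain strings. Release() plays the role DestroyFeature() would have in the proposal: it clears the values but keeps the instance and its field array for the next Acquire().

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for OGRFeature: a fixed-width record of values.
struct ToyFeature {
    std::vector<std::string> fields;
};

// One pool per layer, since all features of a layer share the same fields.
class FeaturePool {
public:
    explicit FeaturePool(std::size_t fieldCount) : fieldCount_(fieldCount) {}

    // Hands out a recycled feature when one is parked in the pool,
    // and only allocates a new one when the pool is empty.
    std::unique_ptr<ToyFeature> Acquire() {
        if (!free_.empty()) {
            std::unique_ptr<ToyFeature> f = std::move(free_.back());
            free_.pop_back();
            return f;
        }
        ++allocations_;
        auto f = std::make_unique<ToyFeature>();
        f->fields.resize(fieldCount_);
        return f;
    }

    // Instead of destroying the feature, clean its values and park it.
    // The field array itself stays allocated at its current size.
    void Release(std::unique_ptr<ToyFeature> f) {
        assert(f != nullptr);
        for (std::string& value : f->fields) {
            value.clear();
        }
        free_.push_back(std::move(f));
    }

    // Number of features actually allocated so far (for inspection only).
    std::size_t allocations() const { return allocations_; }

private:
    std::size_t fieldCount_;
    std::size_t allocations_ = 0;
    std::vector<std::unique_ptr<ToyFeature>> free_;
};
```

In a read loop that releases each feature before acquiring the next, this sketch allocates once and reuses the same instance afterwards, which is the recycling behavior described above. The caller must not touch a feature after releasing it.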
