Martin, I take the liberty to CC the list, as it is an interesting issue and it's better to go public if we want to make progress on that.
In a few words, this is about how we could speed up GetFeature() on the layers returned when issuing a SQL request on OGR drivers that rely on databases : PG driver, MySQL driver, SQLite driver (... ?). For people wanting to take part, you might be interested in the below email exchange first. After examining Martin's patches and reading its explanations, I cann see 2 possibilities if we want to implement such optimization : 1) we change the GetNextFeature() of the ResultSetLayer of the PG, MySQL and SQLite drivers so that they always return a "FID" which is an integer : 0, 1, 2, 3 instead of the current behaviour where sometimes it is the FID from the table when the result is just a sub-set of the table, or this ordinal suite in other cases. I don't think that we have a strict need to match the FID of a SQL result set with one of the database table since it is not possible in all cases (see the example of the PG driver which sometimes return a "true" FID, sometimes just an ordinal). Always returning 0, 1, 2, etc. as an FID on such layers might be acceptable I think. This way the current GetFeature() would match both its advertized semantics (GetFeature(nFID)->GetFID() == nFID) and the use you want to make from it. Perhaps we should have a capability (OLCFIDIsOrdinal) on that layer that means that feature ids are guaranteed to be just ordinals, so that the user can be confident in the fact that GetFeature(nFID) where 0 <= nFID < nFeatureCount will always succeed. (Of course, the layer should also advertizes the OLCRandomRead so that the user knows that GetFeature() is specialized and fast) Pro : no need of new API, existing apps can benefit from enhanced performance for free Cons : change in behaviour for existing applications that would rely on the fact that the feature id of a sql layer matches - sometimes - the feature id of the associated table layer. AFAIK, this assumption is not made in the doc of the OGR API. 2) add a new method in OGRLayer, something like GetFeatureByIndex(nIndex) ? that would always return the nIndex th element (starting with 0) returned by iterating GetNextFeature() nIndex+1 times. A default implementation of that would be trivial in OGRLayer, and it could be specialized in the interested drivers. Pro : doesn't break anything existing Cons : * new API * might be confusing as close to GetFeature() * only useful in very few cases (drivers which don't have a concept of FID just returns ordinals) 3) do nothing... The user application can iterate on the result set and stores in its own array a pointer to the features returned by GetNextFeature() Pro : no change in GDAL Cons : * no change in GDAL ;-) * if the result set is significant, might require a lot of memory and time to build this array So, personnaly, I think solution 1) might be the best solution. What do you think ? Best regards, Even > Even, > > Ok, after reviewing the code what I did is essentially overload the virtual > function GetFeature(long nFeatureId) in the resultset layer to provide > myself fast random access functionality in a resultset returned by > ExecuteSQL() for a resultset row "index" versus an FID. GetFeature() on the > table class uses an FID which is not the same as the row index for a > resultset. I really should have named it GetFeature(long index) instead of > GetFeature(long nFeatureId). > > Now I understand better what I did. If you call ExecuteSQL() with a spatial > of attribute filter I believe FID's come back like the following: > > 3 > 20 > 25 > 26 > 100 > 48 > ... > > And I needed an ordinal offset regardless of resultset filter like > > 0 > 1 > 2 > 3 > ... > > The reason was because I had other data structures in my application (like > arrays and an in-memory rtree structure) that only worked with ordinals. > Thus, in order to link the multiple structures without sacrificing > performance I needed to work with array offsets versus FID's. > > Moral of the story, GDAL may be able to use FID's in an efficient manor when > being used alone, but when included in higher level applications that use > additional data structures the FID concept is limiting when using other > linked data structures that are limited to ordinal array offsets. > > Martin > > > > -----Original Message----- > From: Even Rouault [mailto:[email protected]] > Sent: Tuesday, August 25, 2009 4:22 PM > To: [email protected] > Subject: About your proposed optimizations in OGR db drivers > > Martin, > > don't ask me how, but I've just happened to read this page : > http://postgis.refractions.net/pipermail/postgis-users/2008-April/019246.htm > l > > Could you explain a bit more the background of those optimizations that at > first > sight looks interesting ? I see that the 3 ones are for the layer returned > by > ExecuteSQL on PG, MySQL and sqlite. > > 0) You mentionned that you have proposed them before, but I couldn't find > where. > (I haven't dig very deep in trac or the archives to be honest ;-)) > > 1) I'm wondering why you choose implement random reading by GetFeature() on > SQL > results, and not use directly the one that exists for the layers that map to > the > DB tables (AFAIK, GetFeature() is properly implemented in OGRPGTableLayer, > OGRSQLiteTableLayer and OGRMySQLTableLayer) ? > > 2) I think your optimizations would need some rework to be always correct : > > a) For SQLite, it would only work if the user has created the layer with > a > "SELECT * FROM a_table WHERE ....". If the user only request a few columns, > the > poFeature->SetField() would probably try to feed fields that don't exist in > the > layer. If the SQL result is a result of something more complex like a join, > there is really no table to query a FID in... So the analysis of the initial > request would require to be restricted to a few selected cases. But then I > go > back to 1) > > b) For Postgres, your optimization violates the contract of GetFeature() > which > says "If this function returns a non-NULL feature, it is guaranteed that its > feature id (OGR_F_GetFID()) will be the same as nFID.". But "FETCH ABSOLUTE > nFeatureId" will return the nFeatureId th feature of the result of the > request. > I've observed that when you do a "SELECT * FROM a_table WHERE....", > GetNextFeature() manages to preserve the FID from the table, so the > optimization > wouldn't clearly respect the contract (it would return the nFeatureId th > element, instead of the feature whose FID = nFeatureId). For a request like > "SELECT a_column FROM a_table WHERE...", GetNextFeature() returns the > features > with fids of 0,1,2,..., so the optimization would almost work, except that > it > would have a off-by-one error > > c) For MySQL, I haven't yet analyzed due to lack of testing env. > > Best regards, > > Even > > _______________________________________________ gdal-dev mailing list [email protected] http://lists.osgeo.org/mailman/listinfo/gdal-dev
