Jesse and I were ranting about giving the GeoTools DataStore API the
ability to be queried for a single page of data/features, rather
than the whole dataset.
That would immensely improve (and actually make possible) some common
scenarios, like presenting a result set in tabular form.
On the client side (uDig in this case, but it could be JSF, Swing,
whatever), it is easy to, for example, implement a lazy List, as long
as the underlying data API allows for pagination.
Doing so in GeoTools would actually be easy!
No API change would be needed beyond adding two fields to Query:
fromIndex and pageSize. OrderBy is already present in the Filter spec.
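To make the proposal concrete, here is a minimal sketch of a Query-like parameter object carrying the two proposed fields. `PagedQuery`, `sortBy`, and `nextPage()` are hypothetical names for illustration only, not the GeoTools Query API; `sortBy` is a plain string standing in for the OrderBy from the Filter spec.

```java
import java.util.Objects;

/**
 * Hypothetical sketch: a Query-like parameter object with the two
 * proposed paging fields. Not the actual GeoTools Query API.
 */
final class PagedQuery {
    private final String typeName;
    private final String sortBy;   // hypothetical stand-in for OrderBy; may be null
    private final int fromIndex;   // zero-based index of the first feature in the page
    private final int pageSize;    // maximum number of features in the page

    PagedQuery(String typeName, String sortBy, int fromIndex, int pageSize) {
        if (fromIndex < 0) throw new IllegalArgumentException("fromIndex must be >= 0");
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be > 0");
        this.typeName = Objects.requireNonNull(typeName, "typeName");
        this.sortBy = sortBy;
        this.fromIndex = fromIndex;
        this.pageSize = pageSize;
    }

    String typeName() { return typeName; }
    String sortBy() { return sortBy; }
    int fromIndex() { return fromIndex; }
    int pageSize() { return pageSize; }

    /** Returns a copy of this query positioned at the following page. */
    PagedQuery nextPage() {
        return new PagedQuery(typeName, sortBy, fromIndex + pageSize, pageSize);
    }
}
```

Because the paging information lives in the parameter object, a FeatureSource method signature would not need to change; only implementations that honour the new fields would.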
Here is the conversation; I hope to get some comments.
Gabriel
-----------------------
[18:40:00] Gabriel Roldán was thinking about what would be needed on the
GeoTools front in order to make TableView lazily loaded. On the uDig
front it is easy; the problem is the insufficient GeoTools
Data API
[18:40:50] … I thought the DataStore API could be augmented the
same way the catalog queries work, i.e. you can request data
by "pages"
[18:40:56] Jesse Eichar I think Jody envisioned using the FeatureList
API
[18:41:01] … you can get all the fids.
[18:41:16] … then using the FeatureList API load up the features by fid
[18:42:27] Gabriel Roldán I mean, on the uDig side you can use
FeatureList for sure. The lazy loading strategy would be transparent
for uDig, but the DataStore API should still be friendlier
[18:42:35] … getting all the fids could still be a pain
[18:43:02] Jesse Eichar I see.
[18:43:12] Gabriel Roldán an approach that's proven to work is
getting the feature/element count and then requesting pages of a
user-defined size
[18:43:42] … (proven to work == I'm doing it on the catalog
implementation)
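The count-then-pages approach can be sketched as follows: with the total count known up front, the number of pages is ceil(count / pageSize), and each page is requested by its start index. `PageWalker` and the `fetchPage` callback (taking `fromIndex` and a page size) are hypothetical stand-ins for whatever datastore call would serve a page.

```java
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Consumer;

/** Hypothetical helper: walk a result set of known size page by page. */
final class PageWalker {
    private PageWalker() {}

    /** Number of pages needed to cover {@code count} items: ceil(count / pageSize). */
    static int pageCount(int count, int pageSize) {
        if (count < 0) throw new IllegalArgumentException("count must be >= 0");
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be > 0");
        return (count + pageSize - 1) / pageSize;
    }

    /**
     * Requests each page in turn and hands it to {@code onPage}, so only one
     * page needs to be held at a time. {@code fetchPage} receives (fromIndex, size).
     */
    static <T> void forEachPage(int count, int pageSize,
                                BiFunction<Integer, Integer, List<T>> fetchPage,
                                Consumer<List<T>> onPage) {
        int pages = pageCount(count, pageSize);
        for (int p = 0; p < pages; p++) {
            int from = p * pageSize;
            // The last page may be shorter than pageSize.
            int size = Math.min(pageSize, count - from);
            onPage.accept(fetchPage.apply(from, size));
        }
    }
}
```

For example, 25 features with a page size of 10 yield three requests, of 10, 10, and 5 features.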
[18:43:54] Jesse Eichar I can't think of any problems with that right
off.
[18:44:25] … Where would that fit in?
[18:44:34] … We have a feature collection
[18:44:38] … feature list..
[18:44:54] … would you add a method to feature store?
[18:45:06] … New type of query?
[18:45:10] Gabriel Roldán you should treat FeatureList as a normal
list, use the get(int) or iterator() as normal
[18:45:38] … the FeatureList impl uses a paging strategy to retrieve
content behind the scenes
[18:46:19] … provided that you pass it the Query and the list size in
the constructor, for example
[18:46:58] Jesse Eichar Makes sense. Would you add a new method in
FeatureSource? getFeatures(Query, pageSize)
[18:46:57] … ?
[18:47:18] … or make a new type of Query that has that information.
[18:47:29] Gabriel Roldán getFeatures(Query, startIndex, pageSize)
[18:48:26] Jesse Eichar I think I prefer that too. But it could be
hard to get it to fly, because it will eventually require
negotiation with GeoAPI.
[18:49:11] Gabriel Roldán Well, we could encapsulate it in Query as
well; after all, Query _is_ a parameter object
[18:50:14] … and I guess something like that could certainly be in
future versions of the WFS spec at least, since they already
recognized the problem and designed for it in Catalog 2.0
[18:50:57] Jesse Eichar I didn't know that.
[18:53:52] Gabriel Roldán now, implementing getFeatures(Query, from,
size) or whatever would have its implications. Some RDBMS-backed
datastores could manage it easily, I guess
[18:54:32] Jesse Eichar It does.
[18:54:34] … WFS for example
[18:55:16] … The problem is that things aren't inherently ordered
[18:55:46] … so index 3 could (at least theoretically) be a different
item between calls.
[18:56:31] … WFS 1.1 has some sort-by functionality, I think,
but 1.0 doesn't
[18:57:49] … For that one it makes sense to get all the fids in the
query. Shapefile and other file-based ones, I think, will be OK.
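For a source that cannot page by itself (WFS 1.0 in the discussion), paging on the client from a full fid list might look like this sketch. `ClientSideFidPager` is hypothetical; resolving each page's fids into features (for example with a fid filter) is left out. Sorting the fid snapshot once is an assumption made here so that a given page index always maps to the same fids.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch: client-side paging over a snapshot of all fids.
 * Each page is a slice of the snapshot; turning those fids into features
 * would be a separate request.
 */
final class ClientSideFidPager {
    private final List<String> fids;  // all fids, in a fixed (lexicographic) order
    private final int pageSize;

    ClientSideFidPager(List<String> allFids, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be > 0");
        // Copy and sort once so that page contents stay stable between calls.
        List<String> copy = new ArrayList<>(allFids);
        Collections.sort(copy);
        this.fids = Collections.unmodifiableList(copy);
        this.pageSize = pageSize;
    }

    /** Total number of features (the "hits"), not the page size. */
    int size() { return fids.size(); }

    /** Fids on page {@code pageIndex} (zero-based); empty past the end. */
    List<String> fidsForPage(int pageIndex) {
        if (pageIndex < 0) throw new IllegalArgumentException("pageIndex must be >= 0");
        long from = (long) pageIndex * pageSize;
        if (from >= fids.size()) return Collections.emptyList();
        int to = (int) Math.min(from + pageSize, fids.size());
        return fids.subList((int) from, to);
    }
}
```

The cost Gabriel raises later in the chat is visible here: the constructor needs every fid in memory before the first page can be served.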
[18:58:13] Gabriel Roldán just a minute
[18:58:39] Jesse Eichar sure
[19:00:34] Gabriel Roldán sorry, had a phone call
[19:00:42] Jesse Eichar np
[19:00:44] Gabriel Roldán you're completely right
[19:01:10] … so a requirement would be an order being explicitly set
in the Query
[19:01:37] Jesse Eichar For WFS we could obtain all fids and manage
the paging on the client, at least until 1.1.
[19:01:39] Gabriel Roldán what I'm doing in catalog is ordering by ID
if the Query has no orderBy
[19:02:05] Jesse Eichar we can order by fid if not specified.
[19:02:14] … seems reasonable.
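The "order by fid if nothing else is specified" rule can be sketched in memory: apply the requested ordering (falling back to fid, and using fid as a tie-breaker), then skip `fromIndex` rows and take `pageSize`. `FeatureRow` and `InMemoryPaging` are hypothetical types for illustration; a real datastore would push this down to its backend (e.g. ORDER BY plus an offset/limit in SQL).

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical minimal feature: an id and one attribute. */
final class FeatureRow {
    final String fid;
    final String name;

    FeatureRow(String fid, String name) {
        this.fid = fid;
        this.name = name;
    }
}

final class InMemoryPaging {
    private InMemoryPaging() {}

    /**
     * Returns one page of rows. Without an explicit ordering, rows are sorted
     * by fid so a given index maps to the same row on every call; with one,
     * fid still breaks ties so equal keys cannot swap between calls.
     */
    static List<FeatureRow> page(List<FeatureRow> rows, Comparator<FeatureRow> orderBy,
                                 int fromIndex, int pageSize) {
        if (fromIndex < 0) throw new IllegalArgumentException("fromIndex must be >= 0");
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be > 0");
        Comparator<FeatureRow> byFid = Comparator.comparing((FeatureRow r) -> r.fid);
        Comparator<FeatureRow> order = (orderBy == null) ? byFid : orderBy.thenComparing(byFid);
        return rows.stream()
                .sorted(order)
                .skip(fromIndex)
                .limit(pageSize)
                .collect(Collectors.toList());
    }
}
```

Using fid as the tie-breaker as well as the default is a choice made here: without it, two features with equal sort keys could still trade places between requests.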
[19:02:19] Gabriel Roldán the problem with fids is that getting two
million of them could still be quite a killer
[19:03:07] Jesse Eichar I know it. I'm open to suggestions...
[19:03:09] Gabriel Roldán which raises another concern I was
thinking about
[19:03:41] … I know we've defined feature ids to be Strings so as to
be friendly with the WFS spec
[19:03:58] … still, it doesn't make much sense on the pure Java side
of things
[19:04:10] … I would like to see FID as an interface
[19:04:33] … so implementors could optimize as needed, instead of
creating millions of strings by prepending the feature type name, etc
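To illustrate the "FID as an interface" idea, here is a sketch where an implementation keeps a shared type-name reference plus a numeric key and only builds the WFS-style string on demand. `FeatureId`, `asString()`, and `LongFeatureId` are hypothetical names, not an existing GeoTools type.

```java
/** Hypothetical sketch of a feature id as an interface. */
interface FeatureId {
    /** The WFS-style string form, e.g. "roads.42". */
    String asString();
}

/**
 * A compact implementation: it stores the (shared) type name and a numeric
 * key, and only builds the prefixed string when someone asks for it.
 */
final class LongFeatureId implements FeatureId {
    private final String typeName;  // one String instance shared by all ids of a type
    private final long key;

    LongFeatureId(String typeName, long key) {
        this.typeName = typeName;
        this.key = key;
    }

    long key() { return key; }

    @Override public String asString() { return typeName + "." + key; }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof LongFeatureId)) return false;
        LongFeatureId other = (LongFeatureId) o;
        return key == other.key && typeName.equals(other.typeName);
    }

    @Override public int hashCode() { return 31 * typeName.hashCode() + Long.hashCode(key); }

    @Override public String toString() { return asString(); }
}
```

Equality and hashing work on the numeric key directly, so comparing or indexing millions of ids never has to create the prefixed strings.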
[19:05:07] … but that's another concern, I tend to rant :P
[19:05:16] Jesse Eichar :D You'd best jump on the FM discussion with
that. I don't think that's going to happen too soon.
[19:05:34] Gabriel Roldán yeah, I guess so
[19:06:11] Jesse Eichar But back to the point. I'm completely in
agreement with you on the Paging requirement.
[19:06:30] … I'd be happy to do some of the implementations for you.
[19:06:49] Gabriel Roldán cool, that kind of strategy works just
great for presenting huge amounts of tabular data in other domains
[19:06:56] … so it should work for us too
[19:07:19] Jesse Eichar I think it has to be done. It's impossible to
deal with this amount of data otherwise.
[19:07:47] Gabriel Roldán I like the idea that the FeatureSource
interface doesn't need to be touched
[19:07:53] … just Query
[19:08:19] Jesse Eichar It'll be much quicker and easier to get it
integrated with GeoTools that way. Pretty clean too.
[19:08:19] Gabriel Roldán we already have order by in Filter, so just
from and page size are needed
[19:09:12] … note that FeatureCollection.size() should still return
the whole query size (i.e. hits), and not the page size
[19:09:31] Jesse Eichar Yes.
[19:09:47] Gabriel Roldán in that case I'm wondering what's the
easiest way of knowing when you're done fetching content
[19:09:51] Jesse Eichar and get() can get any feature, not just those
in the current page (for FeatureList)
[19:09:57] Gabriel Roldán other than requiring client code to use a
counter
[19:10:02] … sure
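One way to know when fetching is done, without the caller maintaining its own counter, is to have each page carry its start index and the total hits, as in this sketch. `Page`, `isLast()`, and `nextFromIndex()` are hypothetical names; `size()` deliberately reports the hits, matching the point that size() should be the whole query size.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical page result: the features of one page, where the page starts,
 * and the total number of matches ("hits") for the whole query.
 */
final class Page<T> {
    private final List<T> features;
    private final int fromIndex;
    private final int hits;

    Page(List<T> features, int fromIndex, int hits) {
        this.features = Collections.unmodifiableList(new ArrayList<>(features));
        this.fromIndex = fromIndex;
        this.hits = hits;
    }

    /** Total size of the query result, not the size of this page. */
    int size() { return hits; }

    List<T> features() { return features; }

    /** True when this page reaches (or an empty page signals) the end of the results. */
    boolean isLast() { return features.isEmpty() || fromIndex + features.size() >= hits; }

    /** Index at which the next page would start. */
    int nextFromIndex() { return fromIndex + features.size(); }
}
```

Treating an empty page as the end is an extra safeguard assumed here, in case the hits count changes between requests.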
[19:10:32] … I did that for presenting catalog results using
JavaServer Faces and it works great
[19:10:57] … a custom list impl that queries the required page of
data if the index isn't on the current page
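A custom list of that kind can be sketched on top of `java.util.AbstractList`: `size()` returns the total hits, and `get(int)` fetches the page containing the index only when that index is not on the page currently held. `LazyPagedList` and its `fetchPage` callback (taking `fromIndex` and a page size) are hypothetical; this version holds a single page and is not thread-safe.

```java
import java.util.AbstractList;
import java.util.List;
import java.util.function.BiFunction;

/**
 * Hypothetical lazy list: size() is the total hit count, and get(i) fetches
 * the page containing i only when i is not on the current page.
 */
final class LazyPagedList<T> extends AbstractList<T> {
    private final int size;       // total hits for the query
    private final int pageSize;
    private final BiFunction<Integer, Integer, List<T>> fetchPage;  // (fromIndex, size) -> page
    private int pageStart = -1;   // fromIndex of the held page; -1 when none
    private List<T> page;
    private int fetches;          // number of backend requests, for illustration

    LazyPagedList(int size, int pageSize, BiFunction<Integer, Integer, List<T>> fetchPage) {
        if (size < 0) throw new IllegalArgumentException("size must be >= 0");
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be > 0");
        this.size = size;
        this.pageSize = pageSize;
        this.fetchPage = fetchPage;
    }

    @Override public int size() { return size; }

    @Override public T get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index: " + index);
        int start = (index / pageSize) * pageSize;
        if (start != pageStart) {
            // Index is not on the current page: replace it with the page that holds it.
            page = fetchPage.apply(start, Math.min(pageSize, size - start));
            pageStart = start;
            fetches++;
        }
        return page.get(index - start);
    }

    int fetchCount() { return fetches; }
}
```

Since `AbstractList` implements `iterator()` in terms of `get(int)`, plain iteration also walks the data one page at a time, which is what lets a table widget treat it as an ordinary `List`.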
[19:11:38] Jesse Eichar Do you cache the fetched features so you
don't have to get them more than once? or get a fresh copy each time?
[19:12:36] Gabriel Roldán in the catalog case I fetch the whole page
into memory. In our case I guess we could be even smarter and
keep things streamed even within a page
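Regarding Jesse's caching question, one option (not something proposed in the chat) is to keep a few recently used pages so that scrolling back and forth does not refetch them. This sketch swaps in a standard LRU policy using `LinkedHashMap` in access order with `removeEldestEntry`; `PageCache`, `pageFor`, and the `fetchPage` callback are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

/**
 * Hypothetical page cache: keeps at most maxPages pages in memory, evicting
 * the least recently used one (LinkedHashMap in access order).
 */
final class PageCache<T> {
    private final int pageSize;
    private final BiFunction<Integer, Integer, List<T>> fetchPage;  // (fromIndex, pageSize) -> page
    private final Map<Integer, List<T>> pages;  // keyed by page start index
    private int fetches;

    PageCache(int pageSize, int maxPages, BiFunction<Integer, Integer, List<T>> fetchPage) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be > 0");
        if (maxPages <= 0) throw new IllegalArgumentException("maxPages must be > 0");
        this.pageSize = pageSize;
        this.fetchPage = fetchPage;
        // accessOrder = true: iteration runs from least to most recently accessed.
        this.pages = new LinkedHashMap<Integer, List<T>>(16, 0.75f, true) {
            @Override protected boolean removeEldestEntry(Map.Entry<Integer, List<T>> eldest) {
                return size() > maxPages;
            }
        };
    }

    /** Returns the page holding {@code index}, fetching it only on a cache miss. */
    List<T> pageFor(int index) {
        if (index < 0) throw new IllegalArgumentException("index must be >= 0");
        int start = (index / pageSize) * pageSize;
        List<T> page = pages.get(start);
        if (page == null) {
            page = fetchPage.apply(start, pageSize);
            pages.put(start, page);
            fetches++;
        }
        return page;
    }

    int fetchCount() { return fetches; }
}
```

Bounding the cache by page count keeps memory roughly proportional to maxPages × pageSize, whatever the total number of hits; the fetcher is expected to return a shorter list for the last page.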
[19:13:12] … not sure I'm explaining myself well enough
[19:13:25] Jesse Eichar You're doing fine
[19:16:02] Gabriel Roldán cool, do you mind if I post this to the list?
[19:16:15] Jesse Eichar Not at all
[19:16:26] Gabriel Roldán or rather, do you think it is something
worth posting?
[19:16:37] Jesse Eichar haha
[19:16:44] … Yeah I think it should be.
[19:16:56] … People will have comments for sure.
[19:17:34] Gabriel Roldán nice; I forgot to jump on the GeoServer IRC
meeting, ugh
[19:17:49] Jesse Eichar shoot, me too