Hello Travis
Thanks for your feedback!
On 26/02/14 13:43, Travis L Pinney wrote:
Could you go into more detail on how the filter
works? Would the filter be able to use indexes opportunistically if
they existed?
Yes, the plan is to allow the use of an index when one exists. In addition
to the Shapefile index, there are also PostGIS indexes to leverage.
Leveraging an index is possible on the assumption that the DataStore
implementations know about the index details. For example, a
ShapefileStore would know that the index is provided in a ".shx" file,
while a PostgisStore would know how to write SQL statements that use the
indexes. Since the getFeatures(Query) method would be defined in
FeatureStore, the store implementation would hopefully have enough
information for leveraging the indexes.
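To make the idea more concrete, below is a rough sketch of what such a
query-aware store contract could look like. All names are invented for
illustration only (this is not the actual SIS API); the point is just that
the whole Query reaches the store, so each implementation can decide
whether its native index can answer it:

import java.util.Collection;

// Hypothetical sketch: names invented for illustration, not the SIS API.
interface Envelope {
    boolean intersect(Envelope other);    // true if the two boxes overlap
}

interface Feature {
    Envelope boundedBy();                 // bounding box of the feature
}

// The query carries the criteria (here only a spatial one), so that a
// ShapefileStore can consult its ".shx" file and a PostgisStore can
// translate it into an SQL statement using the database indexes.
final class Query {
    final Envelope areaOfInterest;
    Query(final Envelope areaOfInterest) {
        this.areaOfInterest = areaOfInterest;
    }
}

interface FeatureStore {
    Collection<Feature> getFeatures(Query query);
}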
For the API, I would propose to leverage the new JDK8 Stream API. On the
trunk, which is still on JDK6, we cannot use the JDK8 classes, but we could
provide some custom methods as close to the JDK8 API as we can.
The Stream interface [1] has a "filter" method, which could be used as
below (using lambda expressions):
Stream<Feature> features = datastore.getFeatures(null).stream();
features = features.filter(f -> f.boundedBy().intersect(mySearchArea));
The above would work out-of-the-box using the default Stream
implementation provided by JDK8. However, a ShapefileStore or a
PostgisStore would provide its own Stream implementation and override
the filter method in order to leverage indexing. Since the stream is
created by Collection.stream() (a new method in JDK8) and the Collection
is itself created by the FeatureStore, the store has indirect control
over the Stream implementation, and consequently over the 'filter'
method implementation.
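For reference, here is a small self-contained demo of that out-of-the-box
path. It uses java.awt.geom.Rectangle2D as a stand-in envelope and an
invented Feature class, so none of it is the real SIS API; a store-specific
Collection would only need to return a different object from stream() in
order to take control of how the filter is executed:

import java.awt.geom.Rectangle2D;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

// Self-contained demo with invented classes (not the SIS API): the default
// Stream returned by Collection.stream() evaluates the predicate by
// iterating over every feature, which is the out-of-the-box behaviour.
public class DefaultStreamDemo {
    static final class Feature {
        final String id;
        final Rectangle2D bounds;
        Feature(final String id, final Rectangle2D bounds) {
            this.id = id;
            this.bounds = bounds;
        }
        Rectangle2D boundedBy() {
            return bounds;
        }
    }

    public static void main(String[] args) {
        Collection<Feature> features = Arrays.asList(
                new Feature("A", new Rectangle2D.Double( 0,  0, 10, 10)),
                new Feature("B", new Rectangle2D.Double(50, 50, 10, 10)));

        Rectangle2D mySearchArea = new Rectangle2D.Double(5, 5, 10, 10);

        // Same pattern as above: a store-aware Collection could return its
        // own Stream here and consult an index instead of scanning.
        List<Feature> result = features.stream()
                .filter(f -> f.boundedBy().intersects(mySearchArea))
                .collect(Collectors.toList());

        result.forEach(f -> System.out.println(f.id));   // prints "A"
    }
}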
The advantage of using the JDK8 Stream API is that it is designed for
parallelization. We are entering a new world here...
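As a rough illustration (reusing the invented classes of the previous demo,
and assuming the predicate is free of side effects), running the same
filter in parallel is just a different entry point:

// The JDK splits the work over the common ForkJoinPool; this pays off
// only for large collections and stateless predicates.
long count = features.parallelStream()
        .filter(f -> f.boundedBy().intersects(mySearchArea))
        .count();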
In the case of a shapefile, it will have a ".shx" file which would
allow you to jump to the first record in a slice. For example, if you
wanted to read only the records 10000 through 10010 out of the
shapefile, you could read the ".shx" file to find the byte offset of
record 10000. Only the minimum amount of data is read from the file
when done in this manner.
Some file formats will not have record offsets like a shapefile does. For
those formats it would be possible to start at the beginning of the
file and skip over records until reaching the start of the slice.
Right. The first case would be a data store providing its own
implementation of Stream, while the second case would be a data store
relying on the default Stream implementation provided by JDK8.
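For the first case, the ".shx" lookup could look like the rough sketch
below (invented helper class, not the SIS API; the 100-byte header and the
8-byte records holding big-endian offsets expressed in 16-bit words come
from the shapefile specification):

import java.io.IOException;
import java.io.RandomAccessFile;

final class ShxIndex {
    // Returns the byte offset, in the ".shp" file, of the given record
    // (records are numbered from 1 in the shapefile format).
    static long byteOffsetOfRecord(final RandomAccessFile shx,
                                   final int recordNumber) throws IOException {
        shx.seek(100 + 8L * (recordNumber - 1));   // header + previous entries
        final long offsetInWords = shx.readInt() & 0xFFFFFFFFL;   // unsigned 32 bits
        return offsetInWords * 2;                  // 16-bit words to bytes
    }
}

The second case can simply lean on the skip and limit methods that the
default Stream implementation already provides, for example:

// No native index: skip records sequentially until the start of the slice
// (records 10000 to 10010 of your example, counting from 1).
Stream<Feature> slice = features.stream().skip(9999).limit(11);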
There is missing functionality on sis-shapefile for some of the Shape
types. Should I work on those features in the Shapefile branch?
You could also commit on the trunk, JDK6 or JDK7 branch, and we could close
the Shapefile branch, as you wish. If working on the JDK7 branch is easy
for you, it may make it easier to synchronize our work. But we would just
need to make sure that we do not edit the shapefile class at the same time...
Martin
[1] http://download.java.net/jdk8/docs/api/java/util/stream/Stream.html