Hello Travis

Thanks for your feedback!

On 26/02/14 13:43, Travis L Pinney wrote:
> Could you go into more detail on how the filter
> works? Would the filter be able to use indexes opportunistically if
> they existed?
Yes, the plan is to allow the use of an index when one exists. In addition to Shapefile indexes, there are also PostGIS indexes to leverage.

Leveraging indexes is possible on the assumption that the DataStore implementations know about the index details. For example a ShapefileStore would know that the index is provided in a ".shx" file, while a PostgisStore would know how to write SQL statements that use the database indexes. Since the getFeatures(Query) method would be defined in FeatureStore, the store implementation would hopefully have enough information for leveraging the indexes.
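As an illustration, the dispatch could look like the sketch below. All class and method names here are assumptions for illustration, not the actual SIS API; the stubbed bodies only show where each store would apply its own index knowledge.

```java
import java.util.Collection;
import java.util.Collections;

// Hypothetical sketch, not the SIS API: each store implementation
// resolves the query using the kind of index it knows about.
public class StoreSketch {
    interface Feature {}
    interface Query {}

    interface FeatureStore {
        Collection<Feature> getFeatures(Query query);
    }

    static class ShapefileStore implements FeatureStore {
        @Override
        public Collection<Feature> getFeatures(Query query) {
            // Would read the ".shx" index to seek directly into the ".shp" file.
            return Collections.emptyList(); // stub
        }
    }

    static class PostgisStore implements FeatureStore {
        @Override
        public Collection<Feature> getFeatures(Query query) {
            // Would translate the query into SQL that uses the PostGIS spatial index.
            return Collections.emptyList(); // stub
        }
    }

    public static void main(String[] args) {
        FeatureStore store = new ShapefileStore();
        System.out.println(store.getFeatures(null).size()); // prints 0 (stub)
    }
}
```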

For the API, I would propose to leverage the new JDK8 Stream API. On the trunk, which is still on JDK6, we cannot use the JDK8 classes, but we could provide custom methods as close to the JDK8 API as possible.

The Stream interface [1] has a "filter" function, which could be used like below (using lambda expressions):

Stream<Feature> features = datastore.getFeatures(null).stream();
features = features.filter(f -> f.boundedBy().intersect(mySearchArea));

The above would work out-of-the-box using the default Stream implementation provided by JDK8. However a ShapefileStore and PostgisStore would provide their own Stream implementation and override the filter method in order to leverage indexing. Since the stream is created by Collection.stream() (a new method in JDK8) and the Collection is itself created by the FeatureStore, the store has indirect control over the Stream implementation, and consequently over the 'filter' method implementation.
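To make the hook concrete, the sketch below shows a collection overriding the stream() factory method inherited from Collection. The names and the String element type are placeholders, not the SIS API; an index-aware Stream would require implementing the full Stream interface, which is only indicated by a comment here.

```java
import java.util.AbstractCollection;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical sketch, not the SIS API: the store-specific collection
// controls which Stream implementation its callers receive.
public class IndexedCollectionSketch {
    static class FeatureCollection extends AbstractCollection<String> {
        private final List<String> delegate = Arrays.asList("a", "b", "c");

        @Override public Iterator<String> iterator() { return delegate.iterator(); }
        @Override public int size()                  { return delegate.size(); }

        // The store controls this factory method, so it could return a
        // Stream subclass whose filter(...) consults an index instead of
        // scanning every element. Here we just delegate (default scan).
        @Override public Stream<String> stream() {
            return delegate.stream();
        }
    }

    public static void main(String[] args) {
        long n = new FeatureCollection().stream()
                .filter(s -> !s.equals("b"))
                .count();
        System.out.println(n); // prints 2
    }
}
```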

The advantage of using the JDK8 Stream API is that it is designed for parallelization. We are entering a new world here...
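For instance, with the default implementation the same pipeline runs in parallel just by requesting a parallel stream (a toy computation, standing in for a feature filter):

```java
import java.util.stream.LongStream;

// Minimal sketch: parallel execution of an unchanged Stream pipeline.
public class ParallelSketch {
    /** Sum of the even numbers in 1..1000, computed on a parallel stream. */
    static long evenSum() {
        return LongStream.rangeClosed(1, 1000)
                .parallel()                  // same pipeline, parallel execution
                .filter(i -> i % 2 == 0)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(evenSum()); // prints 250500
    }
}
```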

> In the case of a shapefile, it will have a ".shx" file which would
> allow you to jump to the first record in a slice. For example, if you
> wanted to read only the records 10000 through 10010 out of the
> shapefile, you could read the ".shx" file to find the byte offset of
> record 10000. Only the minimum amount of data is read from the file
> when done in this manner.
>
> Some file formats will not have record offsets like a shapefile. For
> those formats it would be possible to start at the beginning of the
> file and skip over records until it reaches the start of the slice.
Right. The first case would be a data store providing its own implementation of Stream, while the second case would be a data store relying on the default Stream implementation provided by JDK8.
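The two cases could be sketched as follows. The class and method names are hypothetical; the only format fact used is that, per the ESRI shapefile specification, the ".shx" file is a 100-byte header followed by one fixed-size 8-byte entry per record, so the entry for any record can be located by arithmetic alone.

```java
import java.util.stream.IntStream;

// Hypothetical sketch of both slicing strategies, not the SIS API.
public class SliceSketch {
    static final int SHX_HEADER_SIZE = 100; // bytes, from the shapefile specification
    static final int SHX_ENTRY_SIZE  = 8;   // bytes per ".shx" index entry

    /** Case 1: byte position in the ".shx" file of the entry for a record (1-based). */
    static long indexEntryPosition(int recordNumber) {
        return SHX_HEADER_SIZE + (long) SHX_ENTRY_SIZE * (recordNumber - 1);
    }

    /** Case 2: slice records 10000..10010 with the default Stream implementation. */
    static long defaultStreamSlice() {
        return IntStream.rangeClosed(1, 20000)   // stand-in for reading every record
                .skip(9999)                      // skip records 1..9999 one by one
                .limit(11)                       // keep records 10000..10010
                .count();
    }

    public static void main(String[] args) {
        System.out.println(indexEntryPosition(10000)); // prints 80092
        System.out.println(defaultStreamSlice());      // prints 11
    }
}
```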


> There is missing functionality on sis-shapefile for some of the Shape
> types. Should I work on those features in the Shapefile branch?
You could also commit on trunk, or on the JDK6 or JDK7 branch, and we could close the Shapefile branch, as you wish. If working on the JDK7 branch is easy for you, that may make it easier to synchronize our work. But we would just need to make sure that we do not edit the shapefile classes at the same time...

    Martin


[1] http://download.java.net/jdk8/docs/api/java/util/stream/Stream.html
