Hello Travis
Thanks for your feedback!
On 26/02/14 13:43, Travis L Pinney wrote:
Could you go into more detail on how the filter
works? Would the filter be able to use indexes opportunistically if
they existed?
Yes, the plan is to allow the use of an index when one exists. In addition
to the Shapefile index, there are also PostGIS indexes to leverage.
Leveraging an index is possible on the assumption that the DataStore
implementations know about the index details. For example, a
ShapefileStore would know that the index is provided in a ".shx" file,
while a PostgisStore would know how to write SQL statements that use the
indexes. Since the getFeatures(Query) method would be defined in
FeatureStore, the store implementation would hopefully have enough
information for leveraging the indexes.
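To make the idea more concrete, below is a rough sketch of what such a
query-aware store contract could look like. All names are invented for
illustration only (this is not the actual SIS API); the point is just that
the whole Query reaches the store, so each implementation can decide
whether its native index can answer it:

import java.util.Collection;

// Hypothetical sketch: names invented for illustration, not the SIS API.
interface Envelope {
    boolean intersect(Envelope other);    // true if the two boxes overlap
}

interface Feature {
    Envelope boundedBy();                 // bounding box of the feature
}

// The query carries the criteria (here only a spatial one), so that a
// ShapefileStore can consult its ".shx" file and a PostgisStore can
// translate it into an SQL statement using the database indexes.
final class Query {
    final Envelope areaOfInterest;
    Query(final Envelope areaOfInterest) {
        this.areaOfInterest = areaOfInterest;
    }
}

interface FeatureStore {
    Collection<Feature> getFeatures(Query query);
}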
For the API, I would propose to leverage the new JDK8 Stream API. On the
trunk, which is still on JDK6, we cannot use the JDK8 classes, but we could
provide some custom methods as close to the JDK8 API as we can.
The Stream interface [1] has a "filter" method, which could be used as
below (using lambda expressions):
Stream<Feature> features = datastore.getFeatures(null).stream();
features = features.filter(f -> f.boundedBy().intersect(mySearchArea));
The above would work out-of-the-box using the default Stream
implementation provided by JDK8. However, a ShapefileStore or a
PostgisStore would provide its own Stream implementation and override
the filter method in order to leverage indexing. Since the stream is
created by Collection.stream() (a new method in JDK8) and the Collection
is itself created by the FeatureStore, the store has indirect control
over the Stream implementation, and consequently over the 'filter'
method implementation.
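For reference, here is a small self-contained demo of that out-of-the-box
path. It uses java.awt.geom.Rectangle2D as a stand-in envelope and an
invented Feature class, so none of it is the real SIS API; a store-specific
Collection would only need to return a different object from stream() in
order to take control of how the filter is executed:

import java.awt.geom.Rectangle2D;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

// Self-contained demo with invented classes (not the SIS API): the default
// Stream returned by Collection.stream() evaluates the predicate by
// iterating over every feature, which is the out-of-the-box behaviour.
public class DefaultStreamDemo {
    static final class Feature {
        final String id;
        final Rectangle2D bounds;
        Feature(final String id, final Rectangle2D bounds) {
            this.id = id;
            this.bounds = bounds;
        }
        Rectangle2D boundedBy() {
            return bounds;
        }
    }

    public static void main(String[] args) {
        Collection<Feature> features = Arrays.asList(
                new Feature("A", new Rectangle2D.Double( 0,  0, 10, 10)),
                new Feature("B", new Rectangle2D.Double(50, 50, 10, 10)));

        Rectangle2D mySearchArea = new Rectangle2D.Double(5, 5, 10, 10);

        // Same pattern as above: a store-aware Collection could return its
        // own Stream here and consult an index instead of scanning.
        List<Feature> result = features.stream()
                .filter(f -> f.boundedBy().intersects(mySearchArea))
                .collect(Collectors.toList());

        result.forEach(f -> System.out.println(f.id));   // prints "A"
    }
}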
The advantage of using the JDK8 Stream API is that it is designed for
parallelization. We are entering a new world here...
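As a rough illustration (reusing the invented classes of the previous demo,
and assuming the predicate is free of side effects), running the same
filter in parallel is just a different entry point:

// The JDK splits the work over the common ForkJoinPool; this pays off
// only for large collections and stateless predicates.
long count = features.parallelStream()
        .filter(f -> f.boundedBy().intersects(mySearchArea))
        .count();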
In the case of a shapefile, it will have a ".shx" file which would
allow you to jump to the first record in a slice. For example, if you
wanted to read only the records 10000 through 10010 out of the
shapefile, you could read the ".shx" file to find the byte offset of
record 10000. Only the minimum amount of data is read from the file
when done in this manner.
Some file formats will not have record offsets like a shapefile does. For
those formats it would be possible to start at the beginning of the
file and skip over records until reaching the start of the slice.
Right. The first case would be a data store providing its own
implementation of Stream, while the second case would be a data store
relying on the default Stream implementation provided by JDK8.
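For the first case, the ".shx" lookup could look like the rough sketch
below (invented helper class, not the SIS API; the 100-byte header and the
8-byte records holding big-endian offsets expressed in 16-bit words come
from the shapefile specification):

import java.io.IOException;
import java.io.RandomAccessFile;

final class ShxIndex {
    // Returns the byte offset, in the ".shp" file, of the given record
    // (records are numbered from 1 in the shapefile format).
    static long byteOffsetOfRecord(final RandomAccessFile shx,
                                   final int recordNumber) throws IOException {
        shx.seek(100 + 8L * (recordNumber - 1));   // header + previous entries
        final long offsetInWords = shx.readInt() & 0xFFFFFFFFL;   // unsigned 32 bits
        return offsetInWords * 2;                  // 16-bit words to bytes
    }
}

The second case can simply lean on the skip and limit methods that the
default Stream implementation already provides, for example:

// No native index: skip records sequentially until the start of the slice
// (records 10000 to 10010 of your example, counting from 1).
Stream<Feature> slice = features.stream().skip(9999).limit(11);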
There is missing functionality on sis-shapefile for some of the Shape
types. Should I work on those features in the Shapefile branch?
You could also commit on the trunk, JDK6 or JDK7 branch, and we could close
the Shapefile branch, as you wish. If working on the JDK7 branch is easy
for you, it may make it easier to synchronize our work. But we would just
need to make sure that we do not edit the shapefile class at the same time...
Martin
[1] http://download.java.net/jdk8/docs/api/java/util/stream/Stream.html