+1 on the changes. Could you go into more detail on how the filter works? Would the filter be able to use indexes opportunistically if they existed?
It would be useful to filter by slicing or pagination. A shapefile comes with a ".shx" index file, which would allow you to jump straight to the first record in a slice. For example, if you wanted to read only records 10000 through 10010 out of the shapefile, you could read the ".shx" file to find the byte offset of record 10000. Done this way, only the minimum amount of data is read from the file.

Some file formats do not have record offsets like a shapefile. For those formats the reader could start at the beginning of the file and skip over records until it reaches the start of the slice.

Some functionality is missing in sis-shapefile for some of the Shape types. Should I work on those features in the Shapefile branch?

Thanks,
Travis

On Tue, Feb 25, 2014 at 2:01 PM, Martin Desruisseaux <[email protected]> wrote:

> Hello all
>
> I would like to propose the following modifications to the Shapefile reader:
>
>   * Rename as ShapefileStore for consistency with NetcdfStore.
>   * Declare ShapefileStore as a subclass of DataStore:
>       o Implement the getMetadata() method using the information
>         currently stored in fields like version, xmin, xmax, etc.
>   * Turn public fields into private fields instead.
>   * Add a getEnvelope(Query) method which returns the (xmin, ymin, etc.)
>     values.
>       o The Query argument would be an empty class for now, but still
>         defined as a placeholder for future developments.
>   * Add a getFeatures(Query) method which returns a FeatureCollection
>     (extends Collection<Feature>).
>
>
> To explain more about the last point: the intend is to be able to read large
> set of Features without loading all of them in memory. So instead than
> storing all Features in a HashMap, we would allow implementations to return
> a collection backed by an iterator instantiating the Features on the fly.
> Such Iterator is the same idea than java.sql.ResultSet.
>
> Using a method instead than direct to a public field has two purposes:
>
>   * Allows to specify a filter or other query aspects.
>   * Allows to returns an Autocloseable collection for implementations
>     that perform their I/O operations on the fly.
>
>
> So instead of iterating on features like below:
>
>     for (Feature f : shapefile.FeatureMap.values()) {
>         // ... do some stuff ...
>     }
>
>
> We would do:
>
>     try (FeatureCollection features = shapefile.getFeatures(myQuery)) {
>         for (Feature f : features) {
>             // ... do some stuff ...
>         }
>     }
>
>
> What do peoples think?
>
>     Martin
>
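
To make the ".shx" lookup mentioned in my reply concrete, here is a minimal sketch in Java. It assumes the standard ESRI index layout: a 100-byte header followed by 8-byte entries, each holding a big-endian offset and content length counted in 16-bit words. The ShxIndex class and its method names are hypothetical, only for illustration, and are not part of sis-shapefile.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper (not an existing SIS class) for reading ".shx" entries. */
final class ShxIndex {
    private static final int HEADER_LENGTH = 100; // fixed-size file header
    private static final int ENTRY_LENGTH  = 8;   // int offset + int content length

    private ShxIndex() {
    }

    /** Number of index entries, assuming the buffer holds the whole ".shx" file. */
    static int recordCount(ByteBuffer shx) {
        return (shx.limit() - HEADER_LENGTH) / ENTRY_LENGTH;
    }

    /**
     * Byte offset in the ".shp" file of the record at the given 0-based index.
     * The index stores offsets in 16-bit words, hence the multiplication by 2.
     */
    static long recordByteOffset(ByteBuffer shx, int index) {
        return 2L * readInt(shx, index, 0);
    }

    /** Length in bytes of the record content, excluding the 8-byte record header. */
    static long recordContentBytes(ByteBuffer shx, int index) {
        return 2L * readInt(shx, index, 4);
    }

    private static int readInt(ByteBuffer shx, int index, int fieldOffset) {
        if (index < 0 || index >= recordCount(shx)) {
            throw new IndexOutOfBoundsException("No index entry for record " + index);
        }
        int position = HEADER_LENGTH + ENTRY_LENGTH * index + fieldOffset;
        // Absolute read on a duplicate, so the caller's position and byte order are untouched.
        return shx.duplicate().order(ByteOrder.BIG_ENDIAN).getInt(position);
    }

    public static void main(String[] args) {
        // In-memory stand-in for a ".shx" file with two entries (made-up values).
        ByteBuffer shx = ByteBuffer.allocate(HEADER_LENGTH + 2 * ENTRY_LENGTH);
        shx.putInt(100, 50);  // first record starts at word 50 of the ".shp" file
        shx.putInt(104, 10);  // its content is 10 words long
        shx.putInt(108, 64);  // second record starts at word 64
        shx.putInt(112, 20);  // its content is 20 words long
        for (int i = 0; i < recordCount(shx); i++) {
            System.out.println("record " + i + ": offset=" + recordByteOffset(shx, i)
                    + " bytes, content=" + recordContentBytes(shx, i) + " bytes");
        }
    }
}
```

With the offset of the first record of the slice in hand, a reader could position a channel on the ".shp" file there and decode only the records in the slice. Note that the sketch uses 0-based indices, whereas record numbers stored inside the ".shp" file begin at 1.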
