Hi,
I've been looking at the directory data store again and
the more I delve into it, the more I believe a rewrite
is the only way to go.

The current one tries to delegate to secondary data stores,
relies on those datastore factories to implement
a specific interface, does not support namespaces (crucial
for GeoServer to use the datastore), assumes one feature type
per child datastore, has serious caching issues, does not
deal properly with datastore disposal and... well... shall
I go on?? ;-)

So I've been thinking about DirectoryDataStore v2.

What it should do:
- given a directory, find all the feature types stored inside it, in
   whatever format
- the directory might contain files that store more than one feature
   type, as well as datastores that claim more than one file
   (shapefile, property ds); both cases should be handled gracefully
- possibly handle recursive scans (but only as an option, as they
   might be expensive)
- support proper namespace setting
- the user should not be concerned with the native datastore serving a
   certain file, yet it could be useful to have access to it on occasion

How:
- take a directory, a namespace (and possibly a recursion flag)
- scan all the files in the specified directory
- get rid of all the file data store assumptions, just look for
   datastores that can open a certain URL (the current file) with a
   certain namespace and load all the feature types that are inside of
   them
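
The lookup loop could be as simple as asking each registered factory
whether it can open the file at hand. A rough sketch — `FileStoreFactory`,
`canProcess` and friends are made-up names for illustration, not the
actual GeoTools API:

```java
import java.util.*;

// Hypothetical sketch of the scan loop; all names here are invented.
interface FileStoreFactory {
    boolean canProcess(String fileUrl);
    List<String> getTypeNames(String fileUrl, String namespace);
}

class DirectoryScan {
    /** For each file, ask every factory whether it can open that URL with
     *  the given namespace, and collect all feature types found inside. */
    static List<String> scan(List<String> files, String namespace,
                             List<FileStoreFactory> factories) {
        List<String> types = new ArrayList<>();
        for (String file : files) {
            for (FileStoreFactory factory : factories) {
                if (factory.canProcess(file)) {
                    types.addAll(factory.getTypeNames(file, namespace));
                    break; // first factory claiming the file wins
                }
            }
        }
        return types;
    }
}
```

Files nobody claims (readme.txt and the like) simply fall through.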

Issues:
- a certain datastore can open multiple files (shapefile, property ds);
   we want to avoid keeping duplicate datastores around
- a directory (or worse, a tree) can hold a massive amount of feature
   types, there are legitimate scalability/memory consumption concerns.

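For the duplicate datastore problem, one idea would be to key the cache
on a canonical identifier the factory derives from the file (say, the
.shp base name, or the parent directory for the property ds) rather
than on the scanned file itself. A hypothetical sketch, no real API:

```java
import java.util.*;
import java.util.function.Supplier;

// Invented sketch: dedup stores by a canonical key, so e.g. roads.shp
// and roads.dbf resolve to the same shapefile datastore instance.
class StoreCache {
    private final Map<String, Object> stores = new HashMap<>();

    /** Return the store for the given canonical key, creating it once. */
    Object getOrCreate(String canonicalKey, Supplier<Object> creator) {
        return stores.computeIfAbsent(canonicalKey, k -> creator.get());
    }

    int size() { return stores.size(); }
}
```
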
Using a lightweight (soft reference based) cache has issues with
datastore disposal: the datastore we're trying to dispose might be
the holder of a resource that is still in use by a reader or a
feature source, so closing it might kill the current user...
This one is hard, actually: the API does not give us any clue as to
whether a datastore-generated object is still being used or not...
To avoid it we'd have to keep strong references to all datastores
that have returned a feature source, a reader or a writer at
least once. Maybe we can add a custom API to this datastore to
force some resource release (a stop-gap measure for the lack of a
better way).
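
Something along these lines (again, all names invented, just to show
the strong-reference bookkeeping and the custom release method):

```java
import java.util.*;

// Sketch of the stop-gap idea: pin (keep strong references to) every
// child store that has handed out a feature source at least once, and
// expose a custom method to force release. Names are hypothetical.
class ChildStoreTracker {
    private final Map<String, Object> all = new HashMap<>();
    private final Set<Object> inUse = new HashSet<>(); // strong refs

    void register(String typeName, Object store) {
        all.put(typeName, store);
    }

    Object getFeatureSource(String typeName) {
        Object store = all.get(typeName);
        if (store != null) {
            inUse.add(store); // may now hold live resources, don't dispose
        }
        return store;
    }

    /** Custom API: the user asserts nothing is in use anymore, so the
     *  pinned stores become disposable/collectible again. */
    void forceResourceRelease() {
        inUse.clear();
    }

    int pinnedCount() { return inUse.size(); }
}
```
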

Suggestions, reactions?
Cheers
Andrea

-- 
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

_______________________________________________
Geotools-devel mailing list
Geotools-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geotools-devel
