Justin Deoliveira wrote:
> Sounds like a good idea to me. Additional thoughts and comments inline.
>
> Andrea Aime wrote:
>> Hi,
>> I've been looking at the directory data store again and
>> the more I delve into it, the more I believe a rewrite
>> is the only way to go.
>>
>> The current one tries to delegate to secondary data stores,
>> relies on those datastore factories to implement
>> a specific interface, does not support namespaces (crucial
>> to have GeoServer use the datastore), assumes one feature type
>> per child datastore, has serious caching issues, does not
>> deal properly with datastore disposal and... well... shall
>> I go on?? ;-)
>>
>> So I've been thinking about DirectoryDataStore v2.
>>
>> What it should do:
>> - given a directory, find all the feature types stored inside it,
>>   in whatever format
>> - the directory might contain files that store more than one feature
>>   type, and datastores that might claim more than one file (shapefile,
>>   property ds); that should be handled gracefully
>> - optionally handle recursive scanning (only as an option, as it
>>   might be expensive)
>> - support proper namespace setting
>> - the user should not be concerned with the native datastore serving a
>>   certain file, yet it could be useful to have access to it on occasion
>>
>> How:
>> - take a directory and a namespace (and optionally a recursion flag)
>> - scan all the files in the specified directory
>> - get rid of all the file data store assumptions; just look for
>>   datastores that can open a certain URL (the current file) with a
>>   certain namespace, and load all the feature types that are inside
>>   them
>>
> How will this work? How can one ask a generic DataStoreFactorySpi if it
> can handle a file? Which I thought was the entire reason for
> FileDataStoreFactorySpi.
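You don't ask the factory about a file directly, you ask it about a parameter map built from the file. Something along these lines (a rough, self-contained sketch: the ProbeFactory interface below is a stand-in for illustration, not the real DataStoreFactorySpi, and the "url"/"namespace" keys mirror the usual connection parameters):

```java
import java.util.*;

// Hypothetical sketch of the directory scan: probe each registered factory
// with a (url, namespace) parameter map and let the first one that claims
// the file serve its feature types. ProbeFactory is illustrative only.
public class DirectoryScan {

    interface ProbeFactory {
        // true if this factory can open the file the params point at
        boolean canProcess(Map<String, Object> params);

        // feature types found in the file the params point at
        List<String> getTypeNames(Map<String, Object> params);
    }

    /** Returns featureTypeName -> file, registering each type only once. */
    static Map<String, String> scan(List<String> files, String namespace,
                                    List<ProbeFactory> factories) {
        Map<String, String> types = new LinkedHashMap<>();
        for (String file : files) {
            Map<String, Object> params = new HashMap<>();
            params.put("url", file);
            params.put("namespace", namespace);
            for (ProbeFactory factory : factories) {
                if (!factory.canProcess(params)) {
                    continue;
                }
                for (String type : factory.getTypeNames(params)) {
                    // a multi-file datastore (shapefile, property ds) may
                    // report the same type for several files: the first
                    // registration wins, avoiding duplicate datastores
                    types.putIfAbsent(type, file);
                }
                break; // stop at the first factory that claims the file
            }
        }
        return types;
    }
}
```

In GeoTools terms the outer probing loop would iterate `DataStoreFinder.getAvailableDataStores()` and call `canProcess(params)` on each factory, if my reading of the API is right; the point is that no file-specific factory interface is needed.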
The FileDataStoreFactorySpi is badly designed for a number of reasons:
- it assumes you deal with certain extensions. This is overrated; the
  datastore could operate by inspecting magic numbers in the file header
  instead of relying on extensions
- it assumes you can create a datastore with a URL alone. Nope, we want
  a namespace too
- it assumes there is only one feature type per URL. Nope, a file can be
  complex and contain many layers

Since I wanted to work on this both on trunk and 2.5.x, I tried hard to
stay away from any mandatory new interface and to work with what's
already there. Any datastore dealing with files has at least to be able
to handle a URL and a namespace, so that's as much as I'd like to assume.

What would be useful, but I don't consider mandatory, is a FileDataStore
interface that provides the list of files claimed by a certain
datastore, so that I can avoid scanning them again. I can work around
the lack of it by registering each feature type just once, with the
first datastore that can handle it. When I open a file, I'll grab all
the feature types inside of it; in the case of the property data store,
for example, that will mean all the property files in the current
directory (that's how the property data store works, if my memory
serves me right).

>> Issues:
>> - a certain datastore can open multiple files (shapefile, property
>>   ds); we want to avoid keeping duplicate datastores around
>> - a directory (or worse, a tree) can hold a massive amount of feature
>>   types, so there are legitimate scalability/memory consumption
>>   concerns.
>>
>> Using a lightweight (soft reference based) cache has issues with
>> datastore disposal, as the datastore we're trying to dispose might be
>> the holder of a resource that is still in use by a reader or a
>> feature source; closing it might kill the current user...
>> This one is hard, actually: the API does not give us any clue on
>> whether a datastore-generated object is still being used or not...
>> To avoid that we'd have to keep strong references to all datastores
>> that have returned a feature source, a reader or a writer at least
>> once. Maybe we can add a custom API to this datastore to force some
>> resource release (a stop-gap measure for the lack of a better way).
>>
> Yeah, tricky. It seems to me what we lack is a dispose() on
> FeatureSource.

Hmmm... yeah, but I fear this will escalate badly, refactoring-wise. If
we add that, we'll also have to start keeping references, inside the
DataStore itself, to every resource holder handed out to clients. That
accounts not only for FeatureSource, but also for collections,
sub-collections, readers, iterators, writers... given the variety of
ways datastores have been written over the past years, the mind boggles
at the thought.

I'd say we first simplify the datastore interfaces a little bit (die,
feature collection, die!) and then we can think about hanging ourselves
with nice resource tracking?

Cheers
Andrea

--
Andrea Aime
OpenGeo - http://opengeo.org
Expert service straight from the developers.

_______________________________________________
Geotools-devel mailing list
Geotools-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/geotools-devel