I've been working at home over the past couple of weeks on a solution to the
problem of working with very large datasets in OpenJUMP. (For those of you
who don't know, OpenJUMP reads all of the features from a data source into
memory. This isn't a problem until you start working with very large
datasets. For example, OpenJUMP runs out of memory before it can open the
shapefile containing all of the parcels in my county. The largest data
source OpenJUMP can work with is limited by the RAM of the computer it is
running on.) I'd like to give a brief explanation of how this system will
work, and then ask for some suggestions on one aspect of the design.
This system uses a very lightweight in-memory representation of the Feature
class. (This is required because portions of OpenJUMP's code require the
ability to manipulate individual features, or all the features in a feature
collection, "in memory".) Objects of this lightweight Feature class are
really a façade: they forward all method calls to a FeatureCache object. A
FeatureCache is an implementation of the FeatureCollection interface that
manages the data behind the lightweight Feature objects.
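Here's a rough sketch of the façade idea. It assumes a simplified, hypothetical `SimpleFeature` interface (not OpenJUMP's actual Feature API) and a hypothetical `FeatureResolver` hook standing in for the FeatureCache. The lightweight `FeatureProxy` holds only an id and a reference to the cache, and forwards every call.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for OpenJUMP's Feature interface.
interface SimpleFeature {
    Object getAttribute(String name);
    void setAttribute(String name, Object value);
}

// A "regular" feature whose data lives entirely in memory.
class BasicFeature implements SimpleFeature {
    private final Map<String, Object> attributes = new HashMap<>();

    public Object getAttribute(String name) { return attributes.get(name); }

    public void setAttribute(String name, Object value) { attributes.put(name, value); }
}

// Hypothetical hook into the FeatureCache: returns the regular feature for an id,
// loading it from disk into the buffer if it is not already there.
interface FeatureResolver {
    SimpleFeature resolve(long featureId);
}

// Lightweight façade: holds only an id and a cache reference, and forwards
// every call to the regular feature the cache hands back.
final class FeatureProxy implements SimpleFeature {
    private final long featureId;
    private final FeatureResolver cache;

    FeatureProxy(long featureId, FeatureResolver cache) {
        this.featureId = featureId;
        this.cache = cache;
    }

    public Object getAttribute(String name) {
        return cache.resolve(featureId).getAttribute(name);
    }

    public void setAttribute(String name, Object value) {
        cache.resolve(featureId).setAttribute(name, value);
    }
}
```

The proxy deliberately never keeps a reference to the regular feature. If it did, an evicted feature could not be garbage-collected, and the memory savings would be lost.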
The FeatureCache maintains a "buffer" in which it stores in-memory
representations of regular OpenJUMP Feature objects. This buffer will only
grow to a maximum size, which the user can set based on the trade-off
between speed and memory usage. When a method is called on a lightweight
Feature object, the call is forwarded to the FeatureCache. If the
corresponding regular Feature object is in the buffer, the FeatureCache
passes the call to it. If it is not in the buffer, the Feature object is
created in memory from information in permanent storage ("on disk"), the
method call is processed, and the newly created Feature is placed in the
buffer. If the buffer is already at its limit, the oldest Feature in the
buffer is written back to permanent storage and removed from the buffer.
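The buffer behaviour described above maps fairly directly onto `java.util.LinkedHashMap` and its `removeEldestEntry` hook. What follows is a minimal sketch only. `FeatureStore` is a hypothetical interface standing in for permanent storage, and `FeatureBuffer` is an illustrative name, not existing OpenJUMP code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical interface standing in for permanent ("on-disk") storage.
interface FeatureStore<F> {
    F load(long id);
    void save(long id, F feature);
}

// Bounded buffer: at most maxSize regular features are held in memory. When an
// insertion pushes it over the limit, the oldest entry is written back and dropped.
class FeatureBuffer<F> {
    private final int maxSize;
    private final FeatureStore<F> store;
    private final LinkedHashMap<Long, F> buffer;

    FeatureBuffer(int maxSize, FeatureStore<F> store) {
        this.maxSize = maxSize;
        this.store = store;
        // accessOrder = false keeps insertion order, so the "eldest" entry is
        // the feature that was placed in the buffer first.
        this.buffer = new LinkedHashMap<Long, F>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, F> eldest) {
                // size() here is the map's own size, checked right after a put.
                if (size() > FeatureBuffer.this.maxSize) {
                    FeatureBuffer.this.store.save(eldest.getKey(), eldest.getValue());
                    return true; // LinkedHashMap then removes the eldest entry
                }
                return false;
            }
        };
    }

    // Returns the regular feature for an id, loading it from storage on a miss.
    F get(long id) {
        F feature = buffer.get(id);
        if (feature == null) {
            feature = store.load(id);
            buffer.put(id, feature);
        }
        return feature;
    }

    int bufferedCount() { return buffer.size(); }

    boolean isBuffered(long id) { return buffer.containsKey(id); }
}
```

Passing `true` as the third constructor argument would switch the map to access order. The eviction policy would then become least-recently-used rather than oldest-first, which may suit panning and zooming better. The sketch writes back every evicted feature. A "dirty" flag would let unchanged features be dropped without a disk write.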
There should be no major distinction between Features and FeatureCollections
implemented by a FeatureCache and normal Features and FeatureCollections
that are stored entirely in memory. The only significant difference will be
speed: operations and rendering will be slower with this system than with
Features and FeatureCollections stored entirely in memory. However, it will
make it possible to work with very large datasets.
Here is the part of the system that I would like some suggestions on.
I need to decide on a storage format for the features placed in permanent
storage (on disk). I think I have three choices:
[1] Java's standard object serialization format.
[2] A custom binary storage format.
[3] A text-based format.
I believe the first two formats will be much quicker than the third. I don't
really want to use the second format, because cooking up a custom binary
format would be a real pain in the neck. So I need to decide between the
first and the third formats.
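Option [3] can be fairly simple. Here's a sketch that assumes one feature per line, with tab-separated fields and the geometry written as OGC Well-Known Text (WKT). `ParcelRecord` is a hypothetical, stripped-down feature with a single point geometry and one attribute. It is not an OpenJUMP class.

```java
// Hypothetical record type: a feature reduced to an id, a point and one attribute.
final class ParcelRecord {
    final long id;
    final double x;
    final double y;
    final String owner;

    ParcelRecord(long id, double x, double y, String owner) {
        this.id = id;
        this.x = x;
        this.y = y;
        this.owner = owner;
    }

    // One feature per line, tab-separated: id, WKT point, owner.
    // Assumes attribute values contain no tabs or newlines (no escaping is done).
    String toLine() {
        return id + "\tPOINT (" + x + " " + y + ")\t" + owner;
    }

    // Parses a line produced by toLine().
    static ParcelRecord fromLine(String line) {
        String[] fields = line.split("\t", 3);
        long id = Long.parseLong(fields[0]);
        // Strip the leading "POINT (" and trailing ")" to get "x y".
        String coords = fields[1].substring("POINT (".length(), fields[1].length() - 1);
        String[] xy = coords.split(" ");
        return new ParcelRecord(id, Double.parseDouble(xy[0]), Double.parseDouble(xy[1]), fields[2]);
    }
}
```

Most of the extra cost of a text format comes from formatting and parsing numbers like this on every buffer miss. The payoff is that any tool that can read lines and WKT can read the file.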
If I use a text-based format, external tools will be able to work with the
FeatureCache easily, and I won't have to worry about versioning issues, but
it will be slower. If I use Java's standard object serialization format,
I'll have better performance, but I'll have to worry about versioning issues
that might come up if we change the definition of the Feature interface. It
will also make it difficult for external tools, especially those that aren't
written in Java, to work with the data in the FeatureCache.
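With option [1], the versioning concern can be reduced (though not eliminated) by declaring `serialVersionUID` explicitly. Without it, the JVM computes a value from the class's structure, so even small edits to the class can make older files unreadable. Here's a minimal round-trip sketch. `StoredFeature` is a hypothetical on-disk class, not an existing OpenJUMP type.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical on-disk form of a feature. The explicit serialVersionUID keeps
// old files readable after compatible changes such as adding a field.
final class StoredFeature implements Serializable {
    private static final long serialVersionUID = 1L;

    final long id;
    final String wkt;
    final String owner;

    StoredFeature(long id, String wkt, String owner) {
        this.id = id;
        this.wkt = wkt;
        this.owner = owner;
    }

    // Serializes a feature to bytes with Java's standard object serialization.
    static byte[] write(StoredFeature f) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(f);
        }
        return bytes.toByteArray();
    }

    // Reads a feature back; throws if the stored class version is incompatible.
    static StoredFeature read(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (StoredFeature) in.readObject();
        }
    }
}
```

Note that the byte stream records Java class names and field layouts. For that reason, non-Java tools would effectively need to reimplement the serialization protocol to read it.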
I'd like to know what storage format the other developers would recommend
and why.
Thanks,
The Sunburned Surveyor
_______________________________________________
Jump-pilot-devel mailing list
Jump-pilot-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/jump-pilot-devel