It is more that the shapefile format does not offer a database session, so we
are faking one to make the editing story easier for desktop clients.
Using AUTO_COMMIT is a terrible idea, as it will involve writing out your
file many times (i.e. each time you add a feature).
I tried to indicate a better way in my email, and in the docs, but it is
not coming through.
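
To make the cost concrete, here is a rough sketch in plain Java of the
write-a-new-file-then-swap cycle described in my earlier message (quoted
below). It is not the actual GeoTools code: the class and method names are
made up for illustration, and text lines stand in for features. It only
shows the shape of the work each commit has to do.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: plain text lines stand in for features; no GeoTools involved.
public class RewriteOnCommitSketch {

    // Hypothetical "commit": rewrite everything, then swap the files.
    // Returns the number of records written to disk by this commit.
    static int commit(Path file, List<String> pending) {
        try {
            List<String> all = new ArrayList<>();
            if (Files.exists(file)) {
                all.addAll(Files.readAllLines(file)); // existing content is copied again
            }
            all.addAll(pending);

            Path fresh = file.resolveSibling(file.getFileName() + ".new");
            Path old = file.resolveSibling(file.getFileName() + ".old");

            Files.write(fresh, all);          // 1. write out a new file
            if (Files.exists(file)) {
                // 2. rename the old file out of the way
                Files.move(file, old, StandardCopyOption.REPLACE_EXISTING);
            }
            Files.move(fresh, file);          // 3. rename the new file into place
            Files.deleteIfExists(old);        // 4. delete the old file
            return all.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Commits three batches of two records each; returns the total number
    // of records written to disk across all commits.
    static int demo() {
        try {
            Path dir = Files.createTempDirectory("commit-sketch");
            Path file = dir.resolve("features.txt");
            int written = 0;
            written += commit(file, Arrays.asList("f1", "f2"));
            written += commit(file, Arrays.asList("f3", "f4"));
            written += commit(file, Arrays.asList("f5", "f6"));
            return written;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 2 + 4 + 6 records written for 6 records stored
    }
}
```

In this sketch six records end up on disk but twelve get written, because
every commit copies what is already there. If the real commit behaves the
same way, committing every 1000 features gets slower as the file grows.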
On Fri, Sep 6, 2013 at 11:29 AM, William Voorsluys <[email protected]> wrote:
> Hi Jody,
>
> Did you mean to send this reply to the list?
>
> It seems clear that transactions are not meant to be used efficiently
> on shapefiles. I'm settling on using AUTO_COMMIT and writing one feature
> at a time using a writer. Do you mean there is no more efficient way of
> writing features in bulk to the file, say 1000 at a time?
> It seems the operation of getting an append writer is the greatest
> bottleneck in the operation.
>
> Will
>
> On Fri, Sep 6, 2013 at 1:28 AM, Jody Garnett <[email protected]>
> wrote:
> > I really need a better way to communicate this one, or a special case
> > when the shapefile is empty or something.
> >
> > The goal here is to use an append feature writer, directly, and write the
> > content out as you go in a streaming fashion.
> >
> > This is what ShapefileDataStore does internally when you call
> > transaction.commit(). It goes through the changes that it has collected
> > in memory and writes out a new file. It then renames the old file out of
> > the way, renames the new file into the correct place, and deletes the old
> > file.
> >
> >
> >
> > On Thu, Sep 5, 2013 at 8:15 PM, William Voorsluys <[email protected]>
> > wrote:
> >>
> >> Dear All,
> >>
> >> I've been trying a few solutions to efficiently convert GeoJSON into a
> >> shapefile without having to store all features in memory. I'm using
> >> GeoTools 9.2.
> >>
> >> The problem is not so much in how to stream the JSON but how to
> >> efficiently write the features into the shapefile. I use
> >> FeatureJSON#streamFeatureCollection to obtain an iterator. After some
> >> googling, I found 3 different ways of writing a shapefile, namely:
> >>
> >> 1. Repeatedly calling FeatureStore#addFeatures with a collection
> >> containing say 1000 features, within a transaction.
> >> -----
> >> ListFeatureCollection coll = new ListFeatureCollection(type, features);
> >> Transaction transaction = new DefaultTransaction("create");
> >> featureStore.setTransaction(transaction);
> >> try {
> >>     featureStore.addFeatures(coll);
> >>     transaction.commit();
> >> } catch (IOException e) {
> >>     transaction.rollback();
> >>     throw new IllegalStateException(
> >>             "Could not write some features to shapefile. Aborting process", e);
> >> } finally {
> >>     transaction.close();
> >> }
> >> -----
> >>
> >>
> >> This option is extremely slow. By profiling a few runs, I noticed that
> >> about 50% of CPU time is spent on the method
> >> ContentFeatureStore#getWriterAppend, presumably in order to reach the
> >> end of the file before each transaction commit.
> >>
> >> 2. Obtaining an append writer directly from ShapefileDataStore, and
> >> writing 1000 features at a time within a transaction.
> >>
> >> This option suffers from the same problem as option 1.
> >>
> >> 3. Obtaining a feature writer from ShapefileDataStore, and writing one
> >> feature at a time using Transaction.AUTO_COMMIT.
> >>
> >> -----
> >> FeatureWriter<SimpleFeatureType, SimpleFeature> writer = shpDataStore
> >>         .getFeatureWriter(shpDataStore.getTypeNames()[0],
> >>                 Transaction.AUTO_COMMIT);
> >>
> >> while (jsonIt.hasNext()) {
> >>     SimpleFeature feature = jsonIt.next();
> >>     SimpleFeature toWrite = writer.next();
> >>     for (int i = 0; i < toWrite.getType().getAttributeCount(); i++) {
> >>         String name = toWrite.getType().getDescriptor(i).getLocalName();
> >>         toWrite.setAttribute(name, feature.getAttribute(name));
> >>     }
> >>     writer.write();
> >> }
> >> writer.close();
> >> ----
> >>
> >>
> >> Option 3 is the fastest, but I feel there should be a way of efficiently
> >> adding a greater number of features at a time to the shapefile within
> >> a transaction. On the other hand, a previous comment on this list
> >> noted:
> >>
> >> > The above would work for mid-sized data transfers, for massive ones
> >> > against databases it's better to adopt some sort of batching to avoid
> >> > having a single transaction with one million inserts, e.g., insert 1000,
> >> > commit the transaction, insert another 1000, and so on.
> >> > This would work better against databases and against WFS servers,
> >> > but not against shapefiles, which instead work better with the massive
> >> > insert... to each his own.
> >>
> >> Does this mean that the most efficient way of writing to a shapefile
> >> is having all features in memory, rather than being able to append
> >> features?
> >> I would appreciate it if someone could suggest a better way of achieving
> >> this or point to any documentation that would help me.
> >>
> >> Best regards,
> >>
> >> Will
> >>
> >>
> >>
> >> _______________________________________________
> >> GeoTools-GT2-Users mailing list
> >> [email protected]
> >> https://lists.sourceforge.net/lists/listinfo/geotools-gt2-users
> >
> >
>
_______________________________________________
GeoTools-Devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/geotools-devel