Stavros Macrakis wrote:
Hi, I have an application whose output is about 500,000 pairs (string,
integer) -- this is the result of some fairly fancy text processing.
I'd like to put this data into a (new) Derby table. Using an individual
INSERT for each row takes over an hour, which seems much too long.
Using the bulk import feature involves writing out to a file and then
importing from the file, which seems rather roundabout.
So... What is the recommended way to insert a large number of rows
from an application? Is the answer the same for 10^3 or 10^8 rows? Do
the data types involved (e.g. large text field with newlines) make any
difference to the answer?
Thanks,
-s
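For concreteness, the per-row approach described above presumably looks
something like the following sketch; the table name, column names, and
database URL are made up for illustration:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;

public class SlowLoader
{
    // One INSERT per row. With autocommit on (the JDBC default), every
    // row is committed as its own transaction, which is typically what
    // makes a loop like this so slow.
    public static void load( Map<String, Integer> pairs ) throws SQLException
    {
        Connection conn = DriverManager.getConnection( "jdbc:derby:myDB" );
        PreparedStatement ps = conn.prepareStatement(
            "insert into MY_TABLE( word, tally ) values ( ?, ? )" );

        for ( Map.Entry<String, Integer> pair : pairs.entrySet() )
        {
            ps.setString( 1, pair.getKey() );
            ps.setInt( 2, pair.getValue() );
            ps.executeUpdate();
        }

        ps.close();
        conn.close();
    }
}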
Hi Stavros,
If you can wait a couple of months before you deploy this to production,
then you can use the Table Function feature, which is implemented in the
development trunk and will be exposed in the 10.4 release in the first
quarter of next year.
Table Functions let you make arbitrary external data sources look like
tables to Derby. Basically, any data source which you can wrap in a
ResultSet can then be treated by Derby as though it were a table. This
feature is described in the 10.4 Developer's Guide in the section titled
"Programming Derby-style Table Functions":

  http://db.apache.org/derby/docs/dev/devguide/

A slide deck explaining how to use this feature can be found at:

  http://people.apache.org/~rhillegas/vtiDemo/doc/saucerSeparation.html

A useful toolkit is checked into the development trunk in the
java/demo/vtis subdirectory. Please read the README file. I think you
can solve your problem easily by extending StringColumnVTI.
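As a rough sketch of what that subclass might look like: the code below
assumes StringColumnVTI has the shape described in the toolkit's javadoc
(a constructor taking the column names, an abstract getRawColumn( int )
returning each value as a string, and next()/close() supplied by the
subclass), and TextProcessor.results() stands in for whatever produces
the 500,000 pairs:

import java.sql.SQLException;
import java.util.Iterator;
import java.util.Map;

// import the toolkit's StringColumnVTI here; its package depends on
// the toolkit version you check out.
public class WordCountVTI extends StringColumnVTI
{
    private Iterator<Map.Entry<String, Integer>> rows;
    private Map.Entry<String, Integer> current;

    public WordCountVTI( Map<String, Integer> pairs )
    {
        super( new String[] { "WORD", "TALLY" } );
        rows = pairs.entrySet().iterator();
    }

    // Advance the cursor to the next (string, integer) pair.
    public boolean next() throws SQLException
    {
        if ( !rows.hasNext() ) { return false; }
        current = rows.next();
        return true;
    }

    public void close() throws SQLException {}

    // Return the current row's value for the given 1-based column;
    // StringColumnVTI converts it to the declared SQL type.
    protected String getRawColumn( int columnNumber ) throws SQLException
    {
        return ( columnNumber == 1 )
            ? current.getKey()
            : current.getValue().toString();
    }

    // Public static method which the table function declaration binds to.
    public static WordCountVTI read()
    {
        // TextProcessor.results() is a hypothetical source of the pairs.
        return new WordCountVTI( TextProcessor.results() );
    }
}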
Once you have written your table function and declared it to Derby, you
can then bulk import your external data stream as follows:
insert into MY_TABLE
select * from table( MY_TABLE_FUNCTION() ) s
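The declaration step, for reference, is a CREATE FUNCTION statement along
these lines; the function name and column types are illustrative, and the
external name points at the hypothetical WordCountVTI class sketched above:

create function MY_TABLE_FUNCTION()
returns table
(
    WORD  varchar( 100 ),
    TALLY int
)
language java
parameter style DERBY_JDBC_RESULT_SET
no sql
external name 'WordCountVTI.read'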
Hope this helps,
-Rick