I'm working with very large data sets. In my current problem, I have 
about 74,000 records that need to be converted, filtered and stored in 
an Oracle database. The initial run will have the largest query results. 
After this, the data will change only on a very limited basis.

The source database is Microsoft SQL Server 7 and the destination 
database is an Oracle database running on Solaris. I've been creating a 
modest program in Java to do the conversion.

The current design uses a Row Gateway pattern from 
http://martinfowler.com/isa/index.html. This is basically a static class 
that creates a collection of value objects. I've been surprised at how 
fast this is working. I have the following code in my main routine:

start = System.currentTimeMillis();
CDPubsList pubs = new CDPubsList();
List pubsList = pubs.getCDPubList();
for (Iterator iter = pubsList.iterator(); iter.hasNext();) {
    CDPub element = (CDPub) iter.next();
    System.out.println(element);
}
finish = System.currentTimeMillis();

It typically takes only about thirty seconds. I think some of the time 
is caused by the System.out.println and my log4j debug statements. My 
problem is that I ran out of memory in Eclipse. I fixed the problem by 
upping memory in Eclipse. The issue is that I know that this is one of 
my smaller data sets.

I was going to replace this with a CachedRowSet, but I just read in the 
JDBC API Tutorial and Reference, 2nd ed., that:

"CachedRowSet - ... [is] not suitable for very large data sets..."

I could switch my design to have the main loop read one record at a 
time. I have several reservations about this design. First, it is 
not very object-oriented. The core code knows all about my database. 
Since I have to create several of these conversion programs over the 
next year, I wanted the core code to be like a simple framework or 
harness that I could reuse over and over. I lose the reusable components 
that I'm creating. I would either have to have the connection open all 
the time or I would need to add a connection pool manager - such as DBCP 
from Apache.
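
One way to read a record at a time without the core code knowing about 
the database might be a callback arrangement, sketched below. Every name 
here (RecordHandler, RecordSource, RowMapper, JdbcRecordSource, 
ConversionHarness) is hypothetical and not part of my current code, and 
the sketch assumes a newer Java than I am using today (generics and 
try-with-resources). The fetch size of 500 is an arbitrary guess.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

/** Hypothetical callback: receives one converted record at a time. */
interface RecordHandler<T> {
    void handle(T record);
}

/** Hypothetical source of records; the harness depends only on this. */
interface RecordSource<T> {
    void forEachRecord(RecordHandler<T> handler);
}

/** Hypothetical mapper: the one place that knows the column names. */
interface RowMapper<T> {
    T map(ResultSet rs) throws SQLException;
}

/** JDBC-backed source that walks the ResultSet instead of building a List. */
final class JdbcRecordSource<T> implements RecordSource<T> {
    private final Connection connection;
    private final String sql;
    private final RowMapper<T> mapper;

    JdbcRecordSource(Connection connection, String sql, RowMapper<T> mapper) {
        this.connection = connection;
        this.sql = sql;
        this.mapper = mapper;
    }

    public void forEachRecord(RecordHandler<T> handler) {
        try (Statement st = connection.createStatement()) {
            st.setFetchSize(500); // only a hint; a driver may ignore it
            try (ResultSet rs = st.executeQuery(sql)) {
                while (rs.next()) {
                    // This code holds only the current record; what the
                    // driver buffers internally is up to the driver.
                    handler.handle(mapper.map(rs));
                }
            }
        } catch (SQLException e) {
            // Keep java.sql exceptions out of the reusable core code.
            throw new IllegalStateException("query failed: " + sql, e);
        }
    }
}

/** Reusable harness: no SQL and no java.sql types in here. */
final class ConversionHarness {
    private ConversionHarness() {}

    /** Pushes every record from source into sink and returns the count. */
    static <T> int run(RecordSource<T> source, final RecordHandler<T> sink) {
        final int[] count = {0};
        source.forEachRecord(new RecordHandler<T>() {
            public void handle(T record) {
                sink.handle(record);
                count[0]++;
            }
        });
        return count[0];
    }
}
```

With something like this, each conversion program would supply its own 
RowMapper and sink, and the harness could be reused. The connection 
still stays open for the whole loop, so that reservation remains.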

I could grab portions of the data set (I think this would be called 
paging), but I just can't see how to do this without requiring an 
alteration to the source database schema. I could do this via a view, but 
I would rather not have to do this.
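
For what it's worth, here is a rough sketch of how paging might work 
without touching the schema, by walking an existing key column in order 
("keyset" paging). It assumes things I have not verified: that the 
source table has a unique, increasing numeric key, and that the source 
accepts SELECT TOP n. The names KeysetPaging, PageFetcher, PageHandler, 
cd_pubs and pub_id are all made up for illustration, and pages are 
reduced to lists of key values to keep it short.

```java
import java.util.List;

final class KeysetPaging {
    private KeysetPaging() {}

    /** Hypothetical: returns up to pageSize keys greater than lastKey, ascending. */
    interface PageFetcher {
        List<Long> fetchAfter(long lastKey, int pageSize);
    }

    /** Hypothetical callback invoked once per page. */
    interface PageHandler {
        void handle(List<Long> page);
    }

    /**
     * Builds the per-page query text. The table and column names are
     * concatenated, so they must be trusted identifiers, never user input.
     * The page size is inlined because this sketch assumes TOP cannot be
     * bound as a parameter on the source database.
     */
    static String pageQuery(String table, String keyColumn, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return "SELECT TOP " + pageSize + " * FROM " + table
                + " WHERE " + keyColumn + " > ? ORDER BY " + keyColumn;
    }

    /** Walks the key range one page at a time; returns the total row count. */
    static int forEachPage(PageFetcher fetcher, int pageSize, PageHandler handler) {
        long lastKey = Long.MIN_VALUE; // start below any real key
        int total = 0;
        while (true) {
            List<Long> page = fetcher.fetchAfter(lastKey, pageSize);
            if (page.isEmpty()) {
                break; // nothing left
            }
            handler.handle(page);
            total += page.size();
            // Remember the highest key seen so the next page starts after it.
            lastKey = page.get(page.size() - 1).longValue();
            if (page.size() < pageSize) {
                break; // short page means this was the last one
            }
        }
        return total;
    }
}
```

A real PageFetcher would bind lastKey to the "?" in the query built by 
pageQuery and could open and close a connection per page, which would 
avoid holding one open for the whole run.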

I figure I must be just missing something obvious. If you don't mind, I 
was hoping that someone might have some pointers...


_______________________________________________
MVC-Programmers mailing list
[EMAIL PROTECTED]
http://www.netbean.net/mailman/listinfo/mvc-programmers
