Thanks to everybody who has replied.  It appears I need to do some
homework on MS SQL.  :)

-----Original Message-----
From: Mark Kruger [mailto:mkru...@cfwebtools.com] 
Sent: Tuesday, May 26, 2009 11:03 AM
To: cf-talk
Subject: RE: Parsing large CSVs and inserting in MSSQL


Jason,

This is not what you want to hear but....

My advice would be don't use CF to do this. DTS (or SSIS) is quite
capable
of doing this in about 1/30 the time that CF is able to do this. One of
our
DTS packages consumes a file of over 300k rows in less than 15 seconds
every
few minutes. You can't get that with CF no matter how hard you try :) 

But if you insist on doing it with CF, don't use CFFILE. Instead read
the
lines in one at a time using a line reader technique - that will allow
you
to handle one line at a time instead of reding the entire file into
memory.

-Mark

Mark A. Kruger, CFG, MCSE
(402) 408-3733 ext 105
www.cfwebtools.com
www.coldfusionmuse.com
www.necfug.com

-----Original Message-----
From: Jason Durham [mailto:jdur...@cti-stl.com] 
Sent: Tuesday, May 26, 2009 10:53 AM
To: cf-talk
Subject: Parsing large CSVs and inserting in MSSQL


I have been tasked with providing a mechanism to upload/parse/perform a
couple of checks/save CSV data.  The current test file is 28,000 records
(~16 columns).  I expect this file to grow in size each month, to an
indeterminate number (hundreds of thousands).

 

It seems that I'll need to set a limit on the number of rows being
processed at each time.   The two obvious options (in my eyes)... 

 

#1 break the CSV up into several smaller CSV files and write them back
to
the file system before processing each individually (all checks
performed in
memory)

#2 set a large timeout value and attempt to load all of the records from
the
CSV into a temp table, retrieving back smaller, more manageable record
sets

 

Currently, I'm doing #2 but not limiting the number of records (28,000).
The data is getting into the temp table, being retrieved, but the final
insert is causing ColdFusion to hang.  The final insert consists of one
cfquery with an internal loop over the "temp table" query object.  The
final
insert is within a single transaction so I can rollback if it fails.

 

This process will only take place on my development machine, connecting
to
the live database.  2.6 Core 2 Duo, 4GB, Vista 32 with a 100MB
connection to
production server running MS SQL 2005 Express.

 

Perhaps I need to take a step back and consider alternatives.  Btw, the
"checks" I'm performing are simply that 2 of the columns do not contain
a
unique value (FKs).  I'm currently doing this with QoQs but I could
probably
push this off with a fairly complicated query to MS SQL.  What
do the CF wizards suggest?   

 

TIA







~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|
Want to reach the ColdFusion community with something they want? Let them know 
on the House of Fusion mailing lists
Archive: 
http://www.houseoffusion.com/groups/cf-talk/message.cfm/messageid:322792
Subscription: http://www.houseoffusion.com/groups/cf-talk/subscribe.cfm
Unsubscribe: http://www.houseoffusion.com/cf_lists/unsubscribe.cfm?user=89.70.4

Reply via email to