If the process now takes 4 hrs instead of 4 hrs 25 secs, then it is not "a lot faster". Even if it went from 4 mins 25 secs to 4 mins, you have not gained much (less than 10%). What is the percentage improvement? (Or: how fast is it now versus how fast it was?)
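To put numbers on those two hypothetical totals:

    25 secs saved out of 4 hrs 0 mins 25 secs  =  25 / 14,425  ~  0.2%
    25 secs saved out of 4 mins 25 secs        =  25 / 265     ~  9.4%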
I would guess that you could drastically improve the performance, as you did not mention having done any of the SQL-related things -- nothing to do with your choice of datatypes for your collection items -- that could make things go (much) faster. In my view, if you did not use a profiler to figure out where the bulk of the time in your code is going, you may well have optimized something that takes only a small percentage of the total run time -- even if you reduced the time of some part that used to take (say) 15% of the total to 0 (not likely!), you would not improve the overall timing by more than 15%.

I would guess that any of these would make the "insert rows in database" part faster:

(1) You mentioned a SQL property of your collection class. If it is declared as a normal (instance) property, it should be coded to return a value stored in a class-level (static) variable. Otherwise, you're (conceptually) creating 22,000 copies of the same SQL statement! (Due to CLR internal smarts, you might "only" be creating 22,000 pointers to the same SQL statement. Either that, or each instance's SQL is different because you're embedding the data in each individual SQL statement -- then you have to worry about SQL injection, not to mention the time wasted stuffing the text in.) But you should be getting the SQL statement only once and using parameters (see 2), so a static property (or a static GetSQL method) makes more sense.

(2) You did not indicate that you used a prepared statement, run once for each record after setting its parameter values from the current in-memory collection item. If you are not doing that, that technique alone would likely make things quite a bit faster (and avoid SQL injection issues). (A sketch combining items 1 through 3 appears after item 5 below.)

(3) You did not mention what, if anything, you are doing about SQL transactions. If you insert one record at a time without explicit transactions, there is an implicit "start transaction / end transaction" around each insert. If you start one transaction, insert all 22,000 records, and then commit, you put a burden on the server, because it has to be able to roll the whole thing back. (But that's the only way to be 100% sure that you don't add some but not all of the records.) Optimal performance is typically achieved, I think, by putting a separate transaction around medium-sized groups of inserts (perhaps 200-500 or so).

(4) If you're using .NET 2.0, look at the SqlBulkCopy class; I haven't looked at it carefully, so I don't know whether it's useful for your situation.

(5) If you are working with SQL Server, you might get an enormous speedup by writing a text file and using the "bulk insert" statement to suck the data in all at once. (Even if you need to validate the contents of the data file, perhaps you could pass the raw input file to "bulk insert", write the results to a temporary table, then use SQL to do the validation. For example, if the file has an initial header record that gives the sum of all the values in one column, you could have "bulk insert" skip that record, use "select sum(totalcol) from #temptbl" to get the sum, and read (only) the header in your program to check against that result. Such a statement takes an incredibly small amount of time to run against a temp table.) Unfortunately, "bulk insert" is often hard to set up in a locked-down production environment: either the text file has to exist on the server (when you probably should not have a way to write files on the SQL Server machine), or both the server and every machine running your app have to reach the same file over the network (when the SQL Server process would normally not need network access).
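Here is a rough sketch of items 1 through 3 put together (the table, column, and item-type names are made up -- substitute your own): one static copy of the SQL, one parameterized command prepared up front, and a commit every 500 rows.

    using System;
    using System.Collections;
    using System.Data;
    using System.Data.SqlClient;

    // Stand-in for your collection item type (names are invented).
    class LineItem
    {
        public string Name;
        public int Quantity;
        public DateTime When;
    }

    class BatchInserter
    {
        // (1) One static copy of the SQL for the whole run.
        static readonly string InsertSql =
            "insert MyTable (Name, Quantity, WhenAdded) values (@name, @qty, @when)";

        // Inserts every item, committing a transaction per 500 rows.
        public static void Save(IList items, string connectionString)
        {
            using (SqlConnection conn = new SqlConnection(connectionString))
            {
                conn.Open();

                SqlCommand cmd = conn.CreateCommand();
                cmd.CommandText = InsertSql;
                cmd.Parameters.Add("@name", SqlDbType.VarChar, 100);
                cmd.Parameters.Add("@qty", SqlDbType.Int);
                cmd.Parameters.Add("@when", SqlDbType.DateTime);

                SqlTransaction tx = conn.BeginTransaction();  // (3) explicit transaction
                cmd.Transaction = tx;
                cmd.Prepare();                                // (2) prepare once, execute many

                int inBatch = 0;
                foreach (LineItem item in items)
                {
                    cmd.Parameters["@name"].Value = item.Name;
                    cmd.Parameters["@qty"].Value = item.Quantity;
                    cmd.Parameters["@when"].Value = item.When;
                    cmd.ExecuteNonQuery();

                    if (++inBatch == 500)       // medium-sized batches, per (3)
                    {
                        tx.Commit();
                        tx = conn.BeginTransaction();
                        cmd.Transaction = tx;
                        inBatch = 0;
                    }
                }
                tx.Commit();                    // commit the final partial batch
            }
        }
    }

The 500 is not magic; try a few batch sizes in the 200-500 range and measure.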
(6) If you're using SQL Server, you can create an XML document (with as little XML overhead as possible) of the form

    <root>
      <r f1="value" f2="value" f3="value" ... />
      <r f1="value" f2="value" f3="value" ... />
      (21,998 more like that)
    </root>

and use sp_xml_preparedocument and OPENXML on the server to "shred" the XML document into a table variable returned from a table-valued function that receives the result of sp_xml_preparedocument as a parameter. Note that the attribute names in the XML do not need to be the actual column names -- one-letter names (a, b, c, etc.) would be even better than the f1 f2 f3 in that sample, to shorten the XML. You can write a stored proc similar to this code, or use a multi-statement SQL command like

    declare @doc int
    exec sp_xml_preparedocument @doc output, @xml
    insert yourtable (f1, f2, f3, ...)
      select * from tablefn(@doc)
    exec sp_xml_removedocument @doc

Then you pass the @xml value using a parameter of type Text (with a large enough Size). You can break the 22,000 records into chunks if you want, and run that repeatedly with the XML for a few hundred records at a time. (You can't call the XML stored procs from within a function, so the function has to take the document id, rather than the XML text, as its parameter.) Warning -- if you use this technique, you have to use XML character entities (&lt; &gt; &quot;) to replace any < > " characters in text columns.

In both cases 5 and 6, don't neglect to set the Capacity of the StringBuilder you use to hold the text/XML document to a moderately large value (e.g. 65536), to avoid doing more re-allocations than necessary as text is added. (Don't even _dream_ of using + to build the text!) A rough client-side sketch of approach 6 follows.
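This sketch (made-up table and column names again, reusing the hypothetical LineItem type from the earlier sketch) builds the compact XML in a pre-sized StringBuilder and sends it as a Text parameter. To keep it short, it shreds with OPENXML directly inside the multi-statement batch instead of going through a table-valued function; treat it as a starting point, not a drop-in.

    using System;
    using System.Data;
    using System.Data.SqlClient;
    using System.Security;   // SecurityElement.Escape produces &lt; &gt; &quot; etc.
    using System.Text;

    class XmlBatchInserter
    {
        // Shreds attributes a, b, c of /root/r straight into the table.
        static readonly string ShredSql =
            "declare @doc int\n" +
            "exec sp_xml_preparedocument @doc output, @xml\n" +
            "insert MyTable (Name, Quantity, WhenAdded)\n" +
            "  select * from openxml(@doc, '/root/r', 1)\n" +
            "    with (a varchar(100), b int, c datetime)\n" +
            "exec sp_xml_removedocument @doc";

        // Sends one chunk of items as a single compact XML document.
        // LineItem is the hypothetical item type from the earlier sketch.
        public static void SendChunk(SqlConnection conn, LineItem[] items)
        {
            StringBuilder xml = new StringBuilder(65536);  // pre-size to avoid re-allocations
            xml.Append("<root>");
            foreach (LineItem item in items)
            {
                xml.Append("<r a=\"").Append(SecurityElement.Escape(item.Name));
                xml.Append("\" b=\"").Append(item.Quantity);
                xml.Append("\" c=\"").Append(item.When.ToString("yyyy-MM-ddTHH:mm:ss"));
                xml.Append("\"/>");
            }
            xml.Append("</root>");

            string doc = xml.ToString();
            SqlCommand cmd = new SqlCommand(ShredSql, conn);
            cmd.Parameters.Add("@xml", SqlDbType.Text, doc.Length).Value = doc;
            cmd.ExecuteNonQuery();
        }
    }

Chunking is then just a matter of calling SendChunk once per slice of a few hundred items.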
There is an oft-quoted, hard-to-attribute thought (it's been said by many, and I don't know who said it first):

    Premature optimization is the root of all evil.

Optimizing in-memory operations (like class vs. struct) when your database actions are what's really slow is "premature" (versus optimizing the database work).

Good luck!!!

At 12:45 AM 1/13/2006, Girish Jain wrote:
>Hi,
>
>Thanks to all of you for your valuable inputs. I did some performance
>testing and the results are surprising.
>
>The new type holds all the data to be inserted into the database (in the
>local class variables) and has a property, namely SQL, which returns the
>INSERT SQL statement. By iterating over the items of the collection using
>a for each loop, I am inserting the records into the database. For reading
>the lines from the file I am using regular expressions.
>
>When I made the new type a value type, with some 22,000 records it took
>approx. 25 seconds less time. With several tests, this time ranged between
>23 and 25 seconds. Therefore, with the value type the performance is
>really good.
>
>Now, if somebody can throw some light on this AND highlight the criteria
>(not generic ones) that I should always be considering for value types or
>reference types with regard to **memory** usage, because I don't want to
>have to decide the type by doing this performance testing every time.
>
>Regards,
>Girish Jain
>[snip]

J. Merrill / Analytical Software Corp

===================================
This list is hosted by DevelopMentor®  http://www.develop.com
View archives and manage your subscription(s) at http://discuss.develop.com