I am not certain where you read that the Stream classes do buffer for
you, but I have never seen that specified in any of the help files.
While I won't rule out that there is a Stream-derived class that does
some buffering for you, I will state with certainty that the ones I
use do not.

I use BinaryReader and BinaryWriter all the time to bang against flat-
files.  They don't read or write anything more than the precise bytes
I specify.  I had to implement bulk reading and writing of records to
get acceptable performance, otherwise each read of a record was a
separate hit on the hard drive.  Without doing my own buffering,
reading a thousand or more records would grind my whole process to a
crawl.
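For illustration, that block-reading approach can be sketched in Python (the 64-byte record size, the 1,000-record block size, and the in-memory stream are assumptions for the example, not any actual file layout):

```python
# Sketch of block-buffered record reads: pull many fixed-size records
# per read call instead of one disk hit per record.
import io
import struct

RECORD_SIZE = 64       # assumed bytes per record
BLOCK_RECORDS = 1000   # records fetched per read call

def read_records(stream):
    """Yield raw records, reading BLOCK_RECORDS of them at a time."""
    while True:
        block = stream.read(RECORD_SIZE * BLOCK_RECORDS)
        if not block:
            break
        for i in range(0, len(block), RECORD_SIZE):
            yield block[i:i + RECORD_SIZE]

# Demo against an in-memory stream standing in for the flat file:
# 2,500 records, each an 8-byte integer padded to 64 bytes.
data = b"".join(struct.pack("<Q56x", n) for n in range(2500))
records = list(read_records(io.BytesIO(data)))
```

With 1,000 records per call, 300,000 records cost 300 reads instead of 300,000.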

I agree that reading a large amount into RAM is not advisable, but how
much is a large amount?  300,000 records sounds large, but if each
record is, say, 64 bytes, then the total memory required is only
300,000 × 64 = 19,200,000 bytes ≈ 18,750 KB, a little over 18 MB
plus some overhead.  If
that is the case, and your process is running on a computer with lots
of RAM (say 2GB), I'd say read the whole file and be done with it.
If each record were 1 KB, then definitely don't read the whole file in
at one time.
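That arithmetic is easy to sanity-check; a quick sketch, using the example's own numbers:

```python
# Back-of-the-envelope memory check for the 300,000-record example.
records = 300_000
record_size = 64                       # bytes per record, per the example
total_bytes = records * record_size    # 19,200,000 bytes
total_mb = total_bytes / (1024 * 1024)
```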

On Feb 12, 9:23 am, "Del N. Quent" <[email protected]> wrote:
> A couple things ...
>
> 1.  there was a time I needed to get data from files into a
> database ... I think I've done it for both SQL Server and Oracle ...
> for SQL Server, they've got an .exe called bcp, the Bulk Copy
> Program ... anyhow, if you google that, you should find it ... it worked
> quite well ... of  course, part of the reason for that is it bypasses
> any triggers ... also, possibly foreign-key constraint checking ...
> and such ... but it's VERY fast ... wouldn't take much to write a new
> program that simply launches the Bulk Copy pointing at the appropriate
> file
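The tool being described here is SQL Server's `bcp` (Bulk Copy Program) command-line utility. A hedged sketch of what a small launcher program might build, in Python; the server, database, table, and file names are placeholders, and the flags shown (`-S` server, `-T` trusted authentication, `-c` character-mode data) are the common ones:

```python
# Sketch: assembling a bcp command line to bulk-load a flat file.
# Server/database/table/file names are hypothetical placeholders.
import subprocess

def bcp_command(table, datafile, server):
    """Build the argument list for a bcp bulk import."""
    return [
        "bcp", table, "in", datafile,
        "-S", server,   # target server
        "-T",           # trusted (Windows) authentication
        "-c",           # character-mode data file
    ]

cmd = bcp_command("MyDb.dbo.Imports", r"C:\data\records.txt", "SQLPROD01")
# subprocess.run(cmd, check=True)  # uncomment on a machine with bcp installed
```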
>
> 2.  I'd be extremely surprised if one of the Stream classes used for
> file I/O didn't already buffer blocks ... in fact, while double-
> checking, I've even found a comment saying as much ... I'd also avoid
> reading ALL records into RAM ... sometimes users run programs which
> use a lot of it ... and they get cranky if your program adversely
> affects the performance of another one (though I have no experience
> with such things :-0 ) ... anyhow, if you're using a Stream class for
> file I/O, chances are extremely good appropriate buffering is already
> being done for you ... so no need to *cache* up n records ... one at a
> time is fine
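As a point of comparison, Python behaves the way point 2 describes: `open()` returns a buffered reader by default (an `io.BufferedReader` with a buffer of `io.DEFAULT_BUFFER_SIZE`, 8 KB), so one-record-at-a-time reads are mostly served from memory rather than fresh disk seeks. A small sketch; the record size and file contents are made up:

```python
# Sketch: relying on the runtime's built-in buffering, one record per read.
import io
import os
import tempfile

RECORD_SIZE = 64
path = os.path.join(tempfile.mkdtemp(), "records.dat")
with open(path, "wb") as f:
    f.write(bytes(RECORD_SIZE) * 500)     # 500 zero-filled records

count = 0
with open(path, "rb") as f:               # a BufferedReader by default
    assert isinstance(f, io.BufferedReader)
    while rec := f.read(RECORD_SIZE):     # small reads hit the buffer
        count += 1
```

Whether the .NET Stream classes do the same is exactly what's disputed above, so it's worth measuring rather than assuming.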
>
> 3.  I abstract all my Db access ... part of that is any Execute is
> done from a modal form (for attended programs, of course) ... this
> keeps the user from attempting to mess with whatever form might
> otherwise be open ... anyhow, if you're going to spawn a new thread
> for such a thing, I'd highly recommend having that thread create and
> show a new Form ... on this form, you can have a status label (I use
> textboxes so the user can copy/paste anything they want to
> question) ... and a progress-bar ... of course, as it's a new thread,
> IIRC, even if the new Form is modal, your other Form will still be
> usable ... at which, you'll probably want your file-open dialog from
> the new thread ... along with anything else related to that particular
> import operation ... if you design it right, the user should be able
> to *kick-off* any number of these ... all running at once
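Stripped of the GUI, the worker-thread pattern in point 3 amounts to a background thread doing the import and posting status updates somewhere the UI thread can poll. A GUI-free Python sketch; the record list and the every-100-rows progress interval are invented for the example:

```python
# Sketch: background import thread reporting progress through a queue
# (a real form would drain the queue into its progress bar / status box).
import queue
import threading

def import_records(records, progress):
    for done, _rec in enumerate(records, start=1):
        # ... insert the record into the database here ...
        if done % 100 == 0 or done == len(records):
            progress.put(done)    # thread-safe hand-off to the UI thread
    progress.put(None)            # sentinel: import finished

progress = queue.Queue()
worker = threading.Thread(target=import_records,
                          args=([object()] * 250, progress))
worker.start()
worker.join()

updates = []
while (msg := progress.get()) is not None:
    updates.append(msg)
```

Because each import gets its own thread and its own queue (or form), several imports can indeed run at once.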
>
> 4.  oops ... almost forgot ... your database insert should use
> parameters ... do NOT embed actual values in the SQL ... this way, the
> database doesn't re-optimize the SQL every time ... it's been a while,
> so maybe the newer servers actually figure that out for you ... either
> way, it's still a good way to go ... I usually try to keep the same
> Command object alive and change its parameters for every execution ...
> I'm using SQL 2000 'er such ... and Oracle 7.3 ... and still getting
> excellent performance, that way
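Point 4's advice — placeholders plus one reused command — looks like this in Python, with sqlite3 standing in for SQL Server or Oracle; the table and rows are made up:

```python
# Sketch: parameterized INSERT reusing a single statement, so values are
# never embedded in the SQL text itself.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE imports (id INTEGER, name TEXT)")

sql = "INSERT INTO imports (id, name) VALUES (?, ?)"  # placeholders, not literals
cur = conn.cursor()
for row in [(1, "alpha"), (2, "beta"), (3, "gamma")]:
    cur.execute(sql, row)     # same statement, new parameters each execution
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM imports").fetchone()[0]
```

Besides letting the server reuse its query plan, parameters also sidestep SQL-injection and quoting problems.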
>
> 5.  oh ya ... and while I'm at it ... I usually use .BeginTrans
> and .Rollback/.Commit ... however, when you're talking in excess of a
> few hundred rows, you run the risk of blowing out your rollback
> segment ... so I'd use the aforementioned "n % 200" 'er such to do
> a .Commit and a new .BeginTrans ... if you don't use any at all, I
> believe the default behavior is an *automatic* .BeginTrans/.Commit for
> EVERY row ... I couldn't say if that might be more or less efficient
> than *caching* 'em up every few hundred ... after all, the database
> still needs to .Commit n rows ... whether they're one at a time, or
> not ... regardless, that's what I'd do ... again, just because YOUR
> program ain't blowing up, doesn't mean someone ELSE's isn't suffering
> because of poor use of such resources
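The commit-every-n-rows pattern from point 5, sketched with Python's sqlite3; the table name and the batch size of 200 are illustrative:

```python
# Sketch: commit every COMMIT_EVERY rows so no single transaction
# grows without bound (and the rollback segment stays small).
import sqlite3

COMMIT_EVERY = 200
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rows (n INTEGER)")

commits = 0
for n in range(1, 1001):
    conn.execute("INSERT INTO rows (n) VALUES (?)", (n,))
    if n % COMMIT_EVERY == 0:
        conn.commit()         # close out this batch, start a fresh one
        commits += 1
conn.commit()                 # flush any trailing partial batch

total = conn.execute("SELECT COUNT(*) FROM rows").fetchone()[0]
```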
>
> On Feb 9, 2:23 pm, rbdavidson <[email protected]> wrote:
>
> > You will get MUCH better performance if you read blocks of records
> > into memory, then process all the record in that block.  This results
> > in far fewer seeks and better data throughput.  For example, if you
> > read 1,000 records at a time, for 300,000 records you only hit the hard
> > drive 300 times.  300 × 0.01 s average seek time = 3 seconds instead
> > of 3,000 seconds.
>
> > Depending upon the size of the file and available RAM, you might well
> > want to read the entire file before you start processing records.
>
> > -- R.B. Davidson
>
> > On Feb 8, 1:02 am, "Joe C." <[email protected]> wrote:
