>  Aside from the slowness of any networked file system protocol,
>  AFS and NFS both choke at the 2 GB barrier.
>
>Besides which, you'll really thrash your cache if you try to access that
>file sequentially through afs.
>
>  BTW, what made you think RAID is necessarily slow?  It can be
>  if poorly implemented or if you can't select simple striping
>  without parity drives.
>
>I guess I don't think of it as RAID without the parity drives.  But even
>with full redundancy, reading data from a proper RAID should be fast, it's
>writing that will be slow.  Depending on what you use the files for, this
>might be ok in some applications (unchanging census data, for example).
>
>In fact, I think the answer to your question really depends on the
>application.  If you always access that 6 GB file read-only and
>sequentially, then a simple streaming tape drive or optical storage might be
>best.  If it's read-write random access, you'll have to go with some kind of
>disk striping, I think.
>

Having worked on multi-gigabyte datasets for a while now, I have some
comments:

1) Filesystems that optimize for volumes less than a few gig (such
as AFS) cause a lot of problems for people with large datasets.

2) Things we consider to be large datasets today (gig range) are
gonna be ho-hum tomorrow.  We have to be working towards software
solutions that don't unnecessarily limit dataset sizes.

3) As several people have pointed out, applications can, and maybe
should, be written not to rely on single filesystems.  My primary
reason for this statement is reality, which is that most 32 bit
machines are going to have either a 2 or 4 gig limit on a single file.
64 bit machines may eliminate this problem for a while.  (*:
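
To make point 3 concrete, here is a minimal sketch in Python.  It
assumes the 2 and 4 gig limits come from signed and unsigned 32 bit
file offsets, and that the dataset has been split into equal-size
chunk files.  The names locate and read_at, and the chunk layout, are
made up for illustration and are not taken from any particular system.

```python
# Largest byte offsets that fit in a 32 bit integer.
MAX_SIGNED_32 = 2**31 - 1      # 2147483647, just under 2 gig
MAX_UNSIGNED_32 = 2**32 - 1    # 4294967295, just under 4 gig


def locate(offset, chunk_size):
    """Map a logical byte offset to (chunk index, offset inside that chunk)."""
    return divmod(offset, chunk_size)


def read_at(paths, chunk_size, offset, length):
    """Read up to `length` bytes starting at a logical offset.

    `paths` lists the chunk files in order; every chunk except possibly
    the last is assumed to hold exactly `chunk_size` bytes.
    """
    out = bytearray()
    while length > 0:
        index, inner = locate(offset, chunk_size)
        if index >= len(paths):
            break                      # past the end of the dataset
        with open(paths[index], "rb") as f:
            f.seek(inner)
            data = f.read(min(length, chunk_size - inner))
        if not data:
            break                      # short final chunk, nothing left
        out += data
        offset += len(data)
        length -= len(data)
    return bytes(out)
```

As long as chunk_size stays below the per-file limit, the logical
dataset can grow past 2 or 4 gig while no single file does.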

In working with datasets that are read-only/sequential access, I
designed and implemented a high-speed, distributed system for
handling queries, extractions, and tabulations.  I can take an
extraction query for a dataset, compile the query, and execute it
on distributed machines, all with the goal of getting results in as
few machine cycles as possible.  Using 2 gigabyte Census datasets, we
are able to get tabulations within a few seconds, which is fantastic
for real-time interaction with data.
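
The general shape of that kind of tabulation can be sketched in
Python as below.  This is only an illustration under stated
assumptions, not the system described above: the dataset is assumed
to be already split into partitions, a thread pool stands in for the
separate machines, and tabulate_partition, tabulate, and the record
fields are all hypothetical names.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def tabulate_partition(records, predicate, key):
    """Count the records in one partition that pass `predicate`, grouped by `key`."""
    return Counter(key(r) for r in records if predicate(r))


def tabulate(partitions, predicate, key, workers=4):
    """Tabulate each partition independently, then merge the partial counts.

    Threads stand in here for the distributed machines; each worker
    only ever scans its own partition sequentially.
    """
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(tabulate_partition, part, predicate, key)
            for part in partitions
        ]
        for fut in futures:
            total.update(fut.result())   # Counter.update adds counts
    return total


# Hypothetical usage: count adults per state across two partitions.
partitions = [
    [{"state": "MN", "age": 30}, {"state": "WI", "age": 10}],
    [{"state": "MN", "age": 40}, {"state": "WI", "age": 65}],
]
adults_by_state = tabulate(
    partitions,
    predicate=lambda r: r["age"] >= 18,
    key=lambda r: r["state"],
)
```

Because each partial count is small compared with the raw data, only
the counts need to travel between workers, never the records.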

If anyone happens to have any interest in that type of thing, I
can provide more details via e-mail.

Paul Anderson
[EMAIL PROTECTED]
