> VAC eg. is good for archiving, but it's tree-based structure 
> is probably not optimal for streaming (on large files, a lot 
> of blocks IMHO have to be loaded before getting the first
> payload block can be reached). 

A typical venti tree has a branching factor of 409 (8192/20).  
For a 1GB file, that means you have to load two extra blocks to find
the first one, and 322 interior blocks to find all 131,072 data blocks.
Is improving that 0.2% really your justification for a less capable
data structure?

> Some kind of linked list, eg. like in CMD-1541 filesystem
> (each block as an pointer to the next one) or an linked list
> of index blocks would fit better.

For a 1GB file, you'd need 20*131,072 = 2,621,440 bytes of
storage to hold the pointers.  No matter where you put them,
your entire file is now (1GB+2,621,440)/8192 = 131,392 blocks.
The tree was using 131,394 blocks.  So at best, the linked list 
has reduced the number of block loads by 0.002%, and you've
given up random access, including streaming starting halfway
through a file.  Doesn't sound better to me.

Venti's performance is dominated much more by fragmentation
in where the blocks are laid out in the arena logs (that causes seeks)
than anything in higher level data structures.  There is a paper about
this in the upcoming Usenix.  See http://swtch.com/~rsc/papers/
for a link to PDF and HTML.  (Because the paper is targeted at a
non-Plan 9 audience, "Venti" in that paper refers to venti as described
in the original paper.  The current venti sources implement all the 
improvements described as "Foundation" in the paper.)

Russ


Reply via email to