On Mon, Jan 18, 2010 at 05:46:34AM +0100, Peter Stuge wrote:
> sascha wrote:
> > the test values were not correct: the seeker utility used
> > the current number of seconds as the seed for rand(), so all
> > processes used the same random offsets. the number is 10000
> > seeks/second across all 16 usb drives when reading from /dev/sd*,
> 
> Nice! Is that with 512 byte read between seeks? If not, it's a lot
> less than 1/10 of high-speed USB performance. If with reads it might
> improve still if there were more devices. (But each USB can only have
> 127 including hubs.) The USB interface chips in the memory sticks
> could be the limiting factor for you.
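The seeding bug can be illustrated with a small sketch (a hypothetical reconstruction in Python, not the actual seeker code): seeding every process with the current second makes them all draw the same "random" offset sequence, while mixing in something process-unique, such as the PID, separates the streams.

```python
import time
import random

def offsets_bad(seed_second, n=5, blocks=1 << 20):
    """Every process seeds with the same second -> identical offsets."""
    rng = random.Random(seed_second)
    return [rng.randrange(blocks) for _ in range(n)]

def offsets_good(seed_second, pid, n=5, blocks=1 << 20):
    """Mix a per-process value (here the PID) into the seed."""
    rng = random.Random(seed_second ^ (pid << 16))
    return [rng.randrange(blocks) for _ in range(n)]

now = int(time.time())
# Two "processes" started in the same second:
assert offsets_bad(now) == offsets_bad(now)              # the bug: same stream
assert offsets_good(now, 101) != offsets_good(now, 102)  # fixed: distinct streams
```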

The block size is 512 bytes. Smaller sizes bring no benefit with any of
the access methods tested (mmap, non-blocking I/O, normal I/O), and 512
is also the smallest possible transfer size for files opened with
O_DIRECT. A 790M-chain table uses 75 bytes per index region on average,
which means that either less RAM would be needed for the index
(coalescing 4 adjacent regions and loading only 25% of the index values
into RAM), or the start and end values could be combined in a single
file so that no extra access is needed when a match is found.
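As a back-of-the-envelope check (the 75-byte average and the 512-byte block come from the numbers above; that a merged region must still fit in one block read is my assumption about why 4 is the coalescing factor):

```python
avg_region = 75    # average bytes per index region (790M-chain table, from above)
block = 512        # smallest O_DIRECT transfer size
coalesce = 4       # merge 4 adjacent regions into one

merged = coalesce * avg_region   # 300 bytes on average
assert merged <= block           # one 512-byte read still covers a merged region
index_fraction = 1 / coalesce    # only 25% of index values kept in RAM
assert index_fraction == 0.25
```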

> 
> The one transfer per frame limit applies only to interrupt USB
> transfers, and USB storage devices use either a control/bulk/interrupt
> protocol, or a bulk-only protocol. I discovered that only floppy
> drives use the CBI interface. Many bulk transfers can go in one
> frame if the host controller implements it. One SCSI command transfer
> and one response transfer is needed per seek. Max 512 bytes data per
> packet but SCSI protocol overhead means that each data block will
> always be two data packets (=more overhead).

Since concurrent access does not slow down the transfers compared to a
single USB device, my guess is that either more than one USB transaction
is in flight on the bus at a time, or each storage-device transaction is
split into multiple USB transactions.

> 
> 512 bytes is fairly large data blocks so the overhead could perhaps
> be an easy tradeoff for the genericness and availability of memory
> sticks.
> 
> How many instructions are needed to process each block before the
> next seek? 10? 100? 1000? Can processing be done in parallel also?

The positions to seek to come out of the chain generator at a rate
between 10k and 20k per second.

> 
> 
> > 800 when reading from a 64GB LVM2 logical volume. And then 6000
> > when seeking in the files on 3 partially filled LVM2 volumes. The
> > reason for the last 2 timings is not yet clear to me.
> 
> Interactions between the different layers combined with scheduling
> would be my guess.
> 

The answer to this problem was that on a RAID of 4 devices you need 4
threads, and you have to make sure that your random offsets are ordered
so that the devices are accessed in a round-robin fashion. LVM is not a
bottleneck either.
The timing on the LV device file:
msecs   seeks/s usec/seek
14066   710     1406
14321   698     1432
14291   699     1429
14320   698     1432
14322   2792    358

A file on the filesystem:
msecs   seeks/s usec/seek
14627   683     1462
14646   682     1464
14967   668     1496
14590   685     1459
14978   2670    374

for the SSD:
msecs   seeks/s usec/seek
2886    3464    288
2886    3464    288

The last line is always the summary for the threads above.
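The round-robin ordering can be sketched like this (a Python sketch with an assumed striping layout where block i of stripe size s lives on device (i // s) mod n; the real LVM/RAID mapping may differ):

```python
def round_robin(offsets, ndev=4, stripe=64 * 1024):
    """Order seek offsets so the devices are hit in round-robin fashion."""
    # Bucket sorted offsets by the device they (presumably) live on.
    buckets = [[] for _ in range(ndev)]
    for off in sorted(offsets):
        buckets[(off // stripe) % ndev].append(off)
    # Interleave the buckets: device 0, 1, ..., ndev-1, 0, 1, ...
    out = []
    iters = [iter(b) for b in buckets]
    remaining = True
    while remaining:
        remaining = False
        for it in iters:
            nxt = next(it, None)
            if nxt is not None:
                out.append(nxt)
                remaining = True
    return out
```

Each thread can then service one device, and consecutive requests never pile up on the same spindle.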
The SSD is 32 GB, the RAID is 64 GB, so for the USB setup to be faster
you would have to use 4 GB sticks (which are also cheaper per gigabyte
than 16 GB ones) and 3 host controllers to access the 256 devices plus
hubs.
Cost factor: 2 EUR/GB for the 32 GB SSD,
1.20 EUR/GB for 4 GB USB flash.

Whether 256 devices are even feasible in practice is another question:
64 devices per bus would already saturate the 32 MB/s USB speed
(512 bytes * 1000 seeks per device per second).
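The saturation figure works out as straight arithmetic on the numbers above:

```python
devices_per_bus = 64
bytes_per_seek = 512      # one data block per seek
seeks_per_device = 1000   # seeks per device per second

total = devices_per_bus * bytes_per_seek * seeks_per_device
assert total == 32_768_000   # about 32 MB/s: the quoted USB bus speed
```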
_______________________________________________
A51 mailing list
[email protected]
http://lists.lists.reflextor.com/cgi-bin/mailman/listinfo/a51
