Yes, I think it's now clear we need more buffer space to avoid
bottlenecks at high IOPS. The initial design kept it simple with the
1MB vmalloc'd space, but anticipated that more would be needed. It
should not be necessary to change userspace or the TCMU ABI to handle
growing the buffer for fast devices:

1. increase the region mmap()ed by userspace, TCMU_RING_SIZE, from 1MB
to 1GB or larger
For the cmd area, set the size to a fixed 512M, and the data area's
size to a fixed 1G. Is that okay?

512M seems a little big for the cmd area... use of the cmd ring should be much smaller than use of the data area. Each cmd has a minimum size, but describing an additional page in the iovec[] should take just one struct iovec (16 bytes?) per 4K page.


The size of struct tcmu_cmd_entry {} is a fixed 44 bytes without iovec[], and the size of struct iovec[N] is about 16 bytes * N.

The cmd entry size will be in [44B, N * 16 + 44B], and the data size will be in [0, N * 4096], so the ratio sizeof(cmd entry) : sizeof(entry data) == (N * 16 + 44) bytes : (N * 4096) bytes == (N * 16)/(N * 4096) + 44/(N * 4096) == 1/256 + 11/(N * 1024) == 4/1024 + 11/(N * 1024). The bigger N is, the smaller the ratio. For N >= 1, the ratio is in (4/1024, 15/1024]: it is 15/1024 at N == 1 and approaches 4/1024 as N grows.
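
The arithmetic above can be checked with a small userspace sketch. The 44-byte and 16-byte figures are the ones quoted in this thread and are taken here as assumptions; the helper names are hypothetical and are not part of the TCMU code.

```c
#include <assert.h>
#include <stddef.h>

/* Figures quoted in the thread, used here as assumptions. */
#define CMD_ENTRY_BASE_BYTES 44u   /* struct tcmu_cmd_entry without iovec[] */
#define IOVEC_BYTES          16u   /* one struct iovec */
#define DATA_PAGE_BYTES      4096u /* one 4K data page */

/* Bytes of cmd ring used by one command that describes n data pages. */
static size_t cmd_entry_bytes(size_t n)
{
    return CMD_ENTRY_BASE_BYTES + n * IOVEC_BYTES;
}

/* Bytes of data area used by the same command. */
static size_t data_bytes(size_t n)
{
    return n * DATA_PAGE_BYTES;
}

/* cmd:data ratio expressed as x/1024, rounded up; n must be >= 1. */
static size_t ratio_per_1024(size_t n)
{
    assert(n >= 1);
    size_t d = data_bytes(n);
    return (cmd_entry_bytes(n) * 1024 + d - 1) / d;
}
```

With these numbers the worst case is a single-page command, which gives 15/1024; larger commands only bring the ratio down towards 4/1024.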

So, that's right, 512M is bigger than actually needed. For a safe ratio I think 16/1024 is enough, so if the data area is a fixed 1G, the cmd area could be 16M+.
Or cmd area (64M) + data area (960M) == 1G?
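
As a rough check of the two sizing options, here is a minimal sketch that derives the cmd area from the data area using the 16/1024 ratio. The macro and function names are hypothetical and only illustrate the calculation.

```c
#include <stdint.h>

#define MiB (1024ull * 1024ull)
#define GiB (1024ull * MiB)

/* Minimum cmd area for a data area, given a cmd:data ratio of r/1024. */
static uint64_t cmd_area_for(uint64_t data_area, uint64_t r)
{
    return data_area / 1024 * r;
}

/* Non-zero if a proposed cmd/data split meets the r/1024 ratio. */
static int split_is_safe(uint64_t cmd_area, uint64_t data_area, uint64_t r)
{
    return cmd_area >= cmd_area_for(data_area, r);
}
```

By this measure a 1G data area needs at least 16M of cmd area, and the 64M + 960M split leaves a comfortable margin, since 960M of data only needs 15M.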


3. Upgrade the current fixed-size bitmap-based tracking of data area
to handle the new scheme
A radix tree will be used to keep the mapping between each block's
index (0 ~ 1G/DATA_BLOCK_SIZE) and its physical pages. Each leaf is
one data block (of size DATA_BLOCK_SIZE).

For non-leaf nodes, use the radix tags[0][SLOTs] to indicate whether
the branch under slot[SLOTs] has free block leaves (either an old one
that can be reused or a NULL leaf) or not.
This could speed up the search for free blocks in the data area.
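
To illustrate the tagging idea, here is a simplified userspace model. It is not the kernel radix-tree API: it assumes a fixed two-level tree with a fan-out of 64 and uses one 64-bit word per node as the "has a free block" tag. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SLOTS   64                 /* fan-out per node (assumption) */
#define NBLOCKS (SLOTS * SLOTS)    /* two levels: 4096 data blocks */

struct leaf_node {
    uint64_t free_tag;             /* bit j set: block j is free */
};

struct root_node {
    uint64_t free_tag;             /* bit i set: leaf i has a free block */
    struct leaf_node leaf[SLOTS];
};

/* Mark every block free. */
static void tree_init(struct root_node *r)
{
    r->free_tag = ~0ull;
    for (int i = 0; i < SLOTS; i++)
        r->leaf[i].free_tag = ~0ull;
}

/* Index of the lowest set bit; v must be non-zero. */
static int lowest_bit(uint64_t v)
{
    int i = 0;
    assert(v != 0);
    while (!(v & 1)) {
        v >>= 1;
        i++;
    }
    return i;
}

/* Allocate the lowest free block index, or return -1 if none is free. */
static int block_alloc(struct root_node *r)
{
    if (!r->free_tag)
        return -1;
    int i = lowest_bit(r->free_tag);   /* tag skips exhausted branches */
    struct leaf_node *l = &r->leaf[i];
    int j = lowest_bit(l->free_tag);
    l->free_tag &= ~(1ull << j);
    if (!l->free_tag)
        r->free_tag &= ~(1ull << i);   /* branch exhausted: clear its tag */
    return i * SLOTS + j;
}

/* Return a block to the free pool and re-set the tags on its path. */
static void block_free(struct root_node *r, int idx)
{
    assert(idx >= 0 && idx < NBLOCKS);
    int i = idx / SLOTS, j = idx % SLOTS;
    r->leaf[i].free_tag |= 1ull << j;
    r->free_tag |= 1ull << i;
}
```

The point of the tag is that the search never descends into a branch with no free blocks, so an allocation costs one check per level instead of a scan over the whole data area.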

Yes, I think a radix tree is a good place to start.

Okay.

Thanks,

BRs
Xiubo


Regards -- Andy



