Re: [PATCH] target/user: Add daynmic growing data area feature support

2017-03-01 Thread Xiubo Li




For now we will increase the data area size to 1G, and the cmd area
size to 128M. The tcmu-runner should mmap() about (128M + 1G) when
running, and TCMU will dynamically grow the data area from 0 up to the
max 1G size.

Cool. This is a good approach for an initial patch but this raises
concerns about efficiently managing kernel memory usage -- the data area
grows but never shrinks, and total possible usage increases per
backstore. (What if there are 1000?) Any ideas how we could also improve
these aspects of the design? (Global TCMU data area usage limit?)


Sorry for the misunderstanding about this on my part before.

If we can't find a feasible way from mm to deal with shrinking the
memory, maybe a global TCMU data area usage limit is a good choice:

We can limit the global physical data area size to 2G by default, and export
one sysfs attribute to configure this as needed (such as a 10G size with a
possible 1000 targets).

Then use one global radix tree to manage all of the 2G of physical pages
(growing from 0 to 2G). Each ring buffer will search its own data area
bitmap; if the current block is being reused, we get the old page, which is
already mapped into the runner process, from the global radix tree,
otherwise we get a new page (from the global radix tree or the system MM).
After getting the page, tcmu in the kernel will access it via kmap(),
release the mapping with kunmap(), and insert the page into the
global radix tree.
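
Something like this rough, untested sketch of the lookup-or-allocate path
(all of the names below are invented, just to make the idea concrete):

#include <linux/radix-tree.h>
#include <linux/mm.h>
#include <linux/mutex.h>

/* default 2G global limit, could be overridden via a sysfs attribute */
static unsigned long tcmu_global_max_pages = (2UL << 30) >> PAGE_SHIFT;
static unsigned long tcmu_global_page_count;
static RADIX_TREE(tcmu_global_pages, GFP_KERNEL);
static DEFINE_MUTEX(tcmu_global_lock);

/* look a data page up by its global block index, allocating it if new */
static struct page *tcmu_get_global_page(unsigned long gbi)
{
	struct page *page;

	mutex_lock(&tcmu_global_lock);
	page = radix_tree_lookup(&tcmu_global_pages, gbi);
	if (page)
		goto out;	/* block reused: page already mapped in the runner */

	if (tcmu_global_page_count >= tcmu_global_max_pages)
		goto out;	/* global limit reached, page stays NULL */

	page = alloc_page(GFP_KERNEL);
	if (page && radix_tree_insert(&tcmu_global_pages, gbi, page)) {
		__free_page(page);
		page = NULL;
	} else if (page) {
		tcmu_global_page_count++;
	}
out:
	mutex_unlock(&tcmu_global_lock);
	return page;
}

The kernel side would then kmap()/kunmap() the returned page as it does
today.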


BRs
Xiubo








Re: [PATCH] target/user: Add daynmic growing data area feature support

2017-02-28 Thread Xiubo Li



Write throughput is pretty
low at around 150 MB/s.

What's the original write throughput without this patch? Is it also
around 80 MB/s ?

It is around 20-30 MB/s. Same fio args except using --rw=write.

Got it.

Thanks.

BRs
Xiubo




Re: [PATCH] target/user: Add daynmic growing data area feature support

2017-02-28 Thread Mike Christie
On 02/27/2017 07:22 PM, Xiubo Li wrote:
> Hi Mike
> 
> Thanks very much for your work and test cases.
> 
> 
> From: Xiubo Li 
>
> Currently for the TCMU, the ring buffer size is fixed to a 64K cmd
> area + 1M data area, and this will be a bottleneck for high iops.
>>> Hi Xiubo, thanks for your work.
>>>
>>> daynmic -> dynamic
>>>
>>> Have you benchmarked this patch and determined what kind of iops
>>> improvement it allows? Do you see the data area reaching its
>>> fully-allocated size?
>>>
>> I tested this patch with Venky's tcmu-runner rbd aio patches, with one
>> 10 gig iscsi session, and for pretty basic fio direct io (64 -256K
>> read/writes with a queue depth of 64 numjobs between 1 and 4) tests read
>> throughput goes from about 80 to 500 MB/s.
> Looks nice.
> 
>> Write throughput is pretty
>> low at around 150 MB/s.
> What's the original write throughput without this patch? Is it also
> around 80 MB/s ?

It is around 20-30 MB/s. Same fio args except using --rw=write.


Re: [PATCH] target/user: Add daynmic growing data area feature support

2017-02-28 Thread Xiubo Li

On 02/17/2017 01:24 AM, lixi...@cmss.chinamobile.com wrote:

From: Xiubo Li 

Currently for the TCMU, the ring buffer size is fixed to a 64K cmd
area + 1M data area, and this will be a bottleneck for high iops.

Hi Xiubo, thanks for your work.

daynmic -> dynamic

Have you benchmarked this patch and determined what kind of iops
improvement it allows? Do you see the data area reaching its
fully-allocated size?


I tested this patch with Venky's tcmu-runner rbd aio patches, with one
10 gig iscsi session, and for pretty basic fio direct io (64 -256K
read/writes with a queue depth of 64 numjobs between 1 and 4) tests read
throughput goes from about 80 to 500 MB/s. Write throughput is pretty
low at around 150 MB/s.

I did not hit the fully allocated size. I did not drive a lot of IO though.


How about dealing with shrinking the memory in a follow-up patch series?

For the initial patch, we could set the cmd area size to 8MB and the
data area size to 512MB. This should work fine for most cases
without using too much memory.
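
In other words, roughly the following sizing (the macro names here are
just made up to make the numbers concrete):

#define CMDR_SIZE	(8 * 1024 * 1024)	/* 8M cmd area, vmalloc()ed up front */
#define DATA_SIZE	(512 * 1024 * 1024)	/* 512M max data area, grown block by block */
#define TCMU_RING_SIZE	(CMDR_SIZE + DATA_SIZE)	/* what tcmu-runner would mmap() */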

In my similar test case using VMs (a low-iops case) with fio, -bs=[64K,
128K, 512K, 1M] -size=20G -iodepth=1 -numjobs=10, the read bandwidth
increases from about 5200KB/s to about 6100KB/s, and the write bandwidth
increases from about 3000KB/s to about 3300KB/s.

When bs < 64K (from the log, the maximum data length is 64K),
the smaller bs is, the closer the two bandwidths become.

But in all my test cases, the allocated size stays far from the full size
as well.

Thanks,

BRs
Xiubo






Re: [PATCH] target/user: Add daynmic growing data area feature support

2017-02-27 Thread Xiubo Li

Hi Mike

Thanks very much for your work and test cases.



From: Xiubo Li 

Currently for the TCMU, the ring buffer size is fixed to a 64K cmd
area + 1M data area, and this will be a bottleneck for high iops.

Hi Xiubo, thanks for your work.

daynmic -> dynamic

Have you benchmarked this patch and determined what kind of iops
improvement it allows? Do you see the data area reaching its
fully-allocated size?


I tested this patch with Venky's tcmu-runner rbd aio patches, with one
10 gig iscsi session, and for pretty basic fio direct io (64 -256K
read/writes with a queue depth of 64 numjobs between 1 and 4) tests read
throughput goes from about 80 to 500 MB/s.

Looks nice.


Write throughput is pretty
low at around 150 MB/s.

What's the original write throughput without this patch? Is it also
around 80 MB/s ?


I did not hit the fully allocated size. I did not drive a lot of IO though.

Yes, if the cmd area doesn't hit its fully allocated size, it is also
very hard for the data area to hit its fully allocated size.

And a 64MB cmd area is a little large for a 1GB data area.

Next, I will reduce it as needed.

Thanks,

BRs
Xiubo






Re: [PATCH] target/user: Add daynmic growing data area feature support

2017-02-26 Thread Xiubo Li



Cool. This is a good approach for an initial patch but this raises
concerns about efficiently managing kernel memory usage -- the data area
grows but never shrinks, and total possible usage increases per
backstore. (What if there are 1000?) Any ideas how we could also improve
these aspects of the design? (Global TCMU data area usage limit?)

Two ways in my mind:

The first:
How about setting a threshold cmd (a SHRINK cmd), something like
the PAD cmd, to tell the userspace runner to try to shrink the memory?

Why should userspace need to know if the kernel is shrinking memory
allocated to the data area? Userspace knows about memory described in
iovecs for in-flight cmds; we shouldn't need its cooperation to free
other allocated parts of the data area.

Yes.


But we likely don't want to release memory from the data area while it
is active, in any case. How about if we set a timer when the number of
active commands goes to zero, and then reduce the data area to some
minimum if no new cmds come in before the timer expires?


If I understand correctly: for example, we have a 1G (as the minimum)
data area and all blocks have been allocated and mapped into the runner's
vma, then we extend it to 1G + 256M as needed. When there are no
active cmds and the timer expires, will it reduce the data area
back to 1G? And should it then release the memory of the reclaimed 256M
of data area?

If so, after kfree()ing the blocks' memory, it should also try to remove
all the ptes which map these pages (like try_to_unmap() does),
but something like try_to_unmap() is not exported for modules.

Without unmapping the kfree()ed pages' ptes mentioned above, the
reclaimed 256M of vma space couldn't be reused again by the runner
process, because the runner has already mapped that vma space to some
old physical pages (so it won't trigger a new page fault again). Then
there will be a hole, and the hole will get bigger and bigger.

Without unmapping the kfree()ed pages' ptes mentioned above, the
pages' reference counts (page_ref_dec(), matching the _inc() done in the
page fault handler) couldn't be dropped back down either.


When the runner gets the SHRINK cmd, it will try to remap uio0's ring
buffer(?). Then the kernel will get a chance to shrink the memory.

The second:
Extend the data area by using /dev/uio1; we could remap the
uio1 device when needed, so it would be easy to get a chance to shrink
the memory in uio1.

Userspace should not need to remap the region in order for the kernel to
free and unmap pages from the region.

The only thing we need to watch out for is that if blocks are referenced
by in-flight cmds, we can't free them, or userspace will segfault.

Yes, agree.


So, if we
wait until there are no in-flight cmds, then it follows that the kernel
can free whatever it wants and userspace will not segfault.

So, the problem is how to unmap the kfree()ed pages' ptes.
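
One direction that might work (just a rough, unverified sketch; the struct
and field names below are made up): if the kernel remembers the uio inode's
address_space at mmap time, it can zap the runner's ptes for a reclaimed
block itself with unmap_mapping_range() before freeing the page, without
any cooperation from the runner:

#include <linux/fs.h>
#include <linux/mm.h>

#define DATA_BLOCK_SIZE PAGE_SIZE

struct tcmu_dev_stub {			/* stand-in for the real tcmu_dev */
	struct inode *inode;		/* uio inode saved at open/mmap time */
	loff_t data_off;		/* offset of the data area in the ring */
	struct page **data_pages;	/* pages backing the data blocks */
};

static void tcmu_release_block(struct tcmu_dev_stub *udev, unsigned long dbi)
{
	loff_t off = udev->data_off + (loff_t)dbi * DATA_BLOCK_SIZE;

	/* drop the ptes the runner holds for this block... */
	unmap_mapping_range(udev->inode->i_mapping, off, DATA_BLOCK_SIZE, 1);

	/* ...then it is safe to free the page; a later access will fault
	 * in a fresh one */
	__free_page(udev->data_pages[dbi]);
	udev->data_pages[dbi] = NULL;
}

unmap_mapping_range() is exported, so unlike try_to_unmap() it should be
usable from the module.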


BRs
Xiubo


-- Andy







Re: [PATCH] target/user: Add daynmic growing data area feature support

2017-02-24 Thread Andy Grover
On 02/23/2017 06:07 PM, Xiubo Li wrote:
>> Cool. This is a good approach for an initial patch but this raises
>> concerns about efficiently managing kernel memory usage -- the data area
>> grows but never shrinks, and total possible usage increases per
>> backstore. (What if there are 1000?) Any ideas how we could also improve
>> these aspects of the design? (Global TCMU data area usage limit?)
> Two ways in my mind:
> 
> The first:
> How about setting a threshold cmd (a SHRINK cmd), something like
> the PAD cmd, to tell the userspace runner to try to shrink the memory?

Why should userspace need to know if the kernel is shrinking memory
allocated to the data area? Userspace knows about memory described in
iovecs for in-flight cmds; we shouldn't need its cooperation to free
other allocated parts of the data area.

But we likely don't want to release memory from the data area while it
is active, in any case. How about if we set a timer when the number of
active commands goes to zero, and then reduce the data area to some
minimum if no new cmds come in before the timer expires?
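
Roughly something like this (totally untested; the struct and field names
are invented, and the actual shrinking of the data area is left as a
comment):

#include <linux/workqueue.h>
#include <linux/jiffies.h>

#define TCMU_IDLE_SHRINK_DELAY	(60 * HZ)	/* arbitrary idle timeout */

struct tcmu_dev_stub {				/* stand-in for the real tcmu_dev */
	struct delayed_work idle_shrink;	/* INIT_DELAYED_WORK()ed at setup */
	/* ... */
};

static void tcmu_idle_shrink_fn(struct work_struct *work)
{
	struct delayed_work *dwork = to_delayed_work(work);
	struct tcmu_dev_stub *udev =
		container_of(dwork, struct tcmu_dev_stub, idle_shrink);

	/* no cmds arrived while the timer was pending: shrink this udev's
	 * data area back to some minimum here */
	(void)udev;
}

/* called when the in-flight command count drops to zero */
static void tcmu_cmds_idle(struct tcmu_dev_stub *udev)
{
	schedule_delayed_work(&udev->idle_shrink, TCMU_IDLE_SHRINK_DELAY);
}

/* called when a new command is queued */
static void tcmu_cmd_queued(struct tcmu_dev_stub *udev)
{
	cancel_delayed_work(&udev->idle_shrink);
}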

> When the runner gets the SHRINK cmd, it will try to remap uio0's ring
> buffer(?). Then the kernel will get a chance to shrink the memory.
> 
> The second:
> Extend the data area by using /dev/uio1; we could remap the
> uio1 device when needed, so it would be easy to get a chance to shrink
> the memory in uio1.

Userspace should not need to remap the region in order for the kernel to
free and unmap pages from the region.

The only thing we need to watch out for is that if blocks are referenced by
in-flight cmds, we can't free them, or userspace will segfault. So, if we
wait until there are no in-flight cmds, then it follows that the kernel
can free whatever it wants and userspace will not segfault.

-- Andy



Re: [PATCH] target/user: Add daynmic growing data area feature support

2017-02-23 Thread Xiubo Li



When N is bigger, the ratio will be smaller. If N >= 1, the ratio
will be in [15/1024, 4/1024), so for this a ratio of 15 : 1024 will be
enough. But some iscsi cmds may have no data (N == 0), so the ratio
should be bigger.

For now we will increase the data area size to 1G, and the cmd area
size to 128M. The tcmu-runner should mmap() about (128M + 1G) when
running, and TCMU will dynamically grow the data area from 0 up to the
max 1G size.

Cool. This is a good approach for an initial patch but this raises
concerns about efficiently managing kernel memory usage -- the data area
grows but never shrinks, and total possible usage increases per
backstore. (What if there are 1000?) Any ideas how we could also improve
these aspects of the design? (Global TCMU data area usage limit?)

Two ways in my mind:

The first:
How about setting a threshold cmd (a SHRINK cmd), something like
the PAD cmd, to tell the userspace runner to try to shrink the memory?

When the runner gets the SHRINK cmd, it will try to remap uio0's ring
buffer(?). Then the kernel will get a chance to shrink the memory.

The second:
Extend the data area by using /dev/uio1; we could remap the
uio1 device when needed, so it would be easy to get a chance to shrink
the memory in uio1.

Maybe these are a little ugly; are there other more effective ways?

Thanks,

BRs
Xiubo



The cmd area memory will be allocated through vmalloc(), and the data
area's blocks will be allocated individually later when needed.
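
For reference, a very rough sketch of what the on-demand allocation in the
fault path could look like (this is not the patch itself; the struct and
field names are invented, locking against concurrent faults is omitted, and
the older two-argument fault handler signature is assumed):

#include <linux/mm.h>
#include <linux/vmalloc.h>

#define DATA_BLOCK_SIZE PAGE_SIZE

struct tcmu_dev_stub {			/* stand-in for the real tcmu_dev */
	void *cmdr;			/* vmalloc()ed cmd ring */
	unsigned long data_off;		/* where the data area starts */
	struct page **data_pages;	/* lazily allocated data blocks */
};

static int tcmu_vma_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{
	struct tcmu_dev_stub *udev = vma->vm_private_data;
	unsigned long offset = vmf->pgoff << PAGE_SHIFT;
	struct page *page;

	if (offset < udev->data_off) {
		/* cmd area: backed by the vmalloc()ed ring buffer */
		page = vmalloc_to_page(udev->cmdr + offset);
	} else {
		/* data area: allocate the block on first access */
		unsigned long dbi = (offset - udev->data_off) / DATA_BLOCK_SIZE;

		page = udev->data_pages[dbi];
		if (!page) {
			page = alloc_page(GFP_KERNEL | __GFP_ZERO);
			if (!page)
				return VM_FAULT_OOM;
			udev->data_pages[dbi] = page;
		}
	}

	get_page(page);
	vmf->page = page;
	return 0;
}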