Re: [PATCHv1 6/6] rdmacg: Added documentation for rdma controller.

2016-01-07 Thread Tejun Heo
Hello,

On Thu, Jan 07, 2016 at 05:22:40AM +0530, Parav Pandit wrote:
> I can remove "resource" key word. If just that if something other than
> resource comes up to limit to in future, it will be hard to define at
> that time.

Please remove.  The word doesn't mean anything in this context.

> > Also can't the .max file list
> > the available resources?  Why does it need a separtae list file?
> >
> max file does lists them only after limits are configured for that
> device. Thats when rpool (array of max and usage counts) is allocated.
>
> If user wants to know what all knobs are available, than list file
> exposes them on per device basis without actually mentioning actual
> limit or without allocating rpool arrays.
...
> list file looks like below for two device entries.
> mlx4_0 ah qp mr pd srq flow
> ocrdma0 ah qp mr pd
> 
> max file looks like below.
> mlx4_0 ah=100 qp=40 mr=10 pd=90 srq=10 flow=10

Just always show the settings for all devices in the max file like the
following?

  mlx4_0 ah=max qp=max mr=max pd=max srq=max flow=max
  ocrdma0 ah=max qp=max mr=max pd=max

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv1 6/6] rdmacg: Added documentation for rdma controller.

2016-01-06 Thread Parav Pandit
On Thu, Jan 7, 2016 at 4:27 AM, Tejun Heo  wrote:
> Hello,
>
> On Thu, Jan 07, 2016 at 04:14:26AM +0530, Parav Pandit wrote:
>> Yes. I read through. I can see two changes to be made in V2 version of
>> this patch.
>> 1. rdma.resource.verb.usage and rdma.resource.verb.limit to change
>> respectively to,
>> 2. rdma.resource.verb.stat and rdma.resource.verb.max.
>> 3. rdma.resource.verb.failcnt indicate failure events, which I think
>> should go to events.
>
> What's up with the ".resource" part?

I can remove "resource" key word. If just that if something other than
resource comes up to limit to in future, it will be hard to define at
that time.

> Also can't the .max file list
> the available resources?  Why does it need a separtae list file?
>
max file does lists them only after limits are configured for that
device. Thats when rpool (array of max and usage counts) is allocated.

If user wants to know what all knobs are available, than list file
exposes them on per device basis without actually mentioning actual
limit or without allocating rpool arrays.

If you are hinting that I should allocate rpool array when rdma cgroup
is created, that can be done for already discovered devices.
If new devices are discovered after cgroup is created, for them we
anyway have to allocate/free when they appear/disappear.

In different implementation, where list of all the rdma cgroups can be
maintained, and rpool arrays can be allocated for all of them when new
device appear/disappear. This can move complexity of dynamic
allocation from try_charge/uncharge to device addition and removal
APIs. ib_register_ib_device() level.
However this comes with memory cost, where even if those device doesnt
participate in cgroup, for them rpool memory will be allocated for
each such rdma cgroup.

list file looks like below for two device entries.
mlx4_0 ah qp mr pd srq flow
ocrdma0 ah qp mr pd

max file looks like below.
mlx4_0 ah=100 qp=40 mr=10 pd=90 srq=10 flow=10


>> I roll out new patch for events post this patch as additional feature
>> and remove this feature in V2.
>>
>> rdma.resource.verb.list file is unique to rdma cgroup, so I believe
>> this is fine.
>
> Please see above.
>
>> We will conclude whether to have rdma.resource.hw. or not in
>> other patches.
>> I am in opinion to keep "resource" and "verb" or "hw" tags around to
>> keep it verbose enough to know what are we trying to control.
>
> What does that achieve?  I feel that it's getting overengineered
> constantly.

Please see above for "resource". I guess we are not loosing anything
by having "rdma.resource" vs just having "rdma".
But if that sounds too much, we can remove "resource".

>
> Thanks.
>
> --
> tejun
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv1 6/6] rdmacg: Added documentation for rdma controller.

2016-01-06 Thread Parav Pandit
On Wed, Jan 6, 2016 at 3:23 AM, Tejun Heo  wrote:
> Hello,
>
> On Wed, Jan 06, 2016 at 12:28:06AM +0530, Parav Pandit wrote:
>> +5-4-1. RDMA Interface Files
>> +
>> +  rdma.resource.verb.list
>> +  rdma.resource.verb.limit
>> +  rdma.resource.verb.usage
>> +  rdma.resource.verb.failcnt
>> +  rdma.resource.hw.list
>> +  rdma.resource.hw.limit
>> +  rdma.resource.hw.usage
>> +  rdma.resource.hw.failcnt
>
> Can you please read the rest of cgroup.txt and put the interface in
> line with the common conventions followed by other controllers?
>

Yes. I read through. I can see two changes to be made in V2 version of
this patch.
1. rdma.resource.verb.usage and rdma.resource.verb.limit to change
respectively to,
2. rdma.resource.verb.stat and rdma.resource.verb.max.
3. rdma.resource.verb.failcnt indicate failure events, which I think
should go to events.
I roll out new patch for events post this patch as additional feature
and remove this feature in V2.

rdma.resource.verb.list file is unique to rdma cgroup, so I believe
this is fine.

We will conclude whether to have rdma.resource.hw. or not in
other patches.
I am in opinion to keep "resource" and "verb" or "hw" tags around to
keep it verbose enough to know what are we trying to control.

Is that ok?

> Thanks.
>
> --
> tejun
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv1 6/6] rdmacg: Added documentation for rdma controller.

2016-01-05 Thread Parav Pandit
Added documentation for rdma controller to use in legacy mode and
using new unified hirerchy.

Signed-off-by: Parav Pandit 
---
 Documentation/cgroup-legacy/rdma.txt | 129 +++
 Documentation/cgroup.txt |  79 +
 2 files changed, 208 insertions(+)
 create mode 100644 Documentation/cgroup-legacy/rdma.txt

diff --git a/Documentation/cgroup-legacy/rdma.txt 
b/Documentation/cgroup-legacy/rdma.txt
new file mode 100644
index 000..70626c5
--- /dev/null
+++ b/Documentation/cgroup-legacy/rdma.txt
@@ -0,0 +1,129 @@
+   RDMA Resource Controller
+   
+
+Contents
+
+
+1. Overview
+  1-1. What is RDMA resource controller?
+  1-2. Why RDMA resource controller needed?
+  1-3. How is RDMA resource controller implemented?
+2. Usage Examples
+
+1. Overview
+
+1-1. What is RDMA resource controller?
+-
+
+RDMA resource controller allows user to limit RDMA/IB specific resources
+that a given set of processes can use. These processes are grouped using
+RDMA resource controller.
+
+RDMA resource controller currently allows two different type of resource
+pools.
+(a) RDMA IB specification level verb resources defined by IB stack
+(b) HCA vendor device specific resources
+
+RDMA resource controller controller allows maximum of upto 64 resources in
+a resource pool which is the internal construct of rdma cgroup explained
+at later part of this document.
+
+1-2. Why RDMA resource controller needed?
+
+
+Currently user space applications can easily take away all the rdma device
+specific resources such as AH, CQ, QP, MR etc. Due to which other applications
+in other cgroup or kernel space ULPs may not even get chance to allocate any
+rdma resources. This leads to service unavailability.
+
+Therefore RDMA resource controller is needed through which resource consumption
+of processes can be limited. Through this controller various different rdma
+resources described by IB uverbs layer and any HCA vendor driver can be
+accounted.
+
+1-3. How is RDMA resource controller implemented?
+
+
+rdma cgroup allows limit configuration of resources. These resources are not
+defined by the rdma controller. Instead they are defined by the IB stack
+and HCA device drivers(optionally).
+This provides great flexibility to allow IB stack to define new resources,
+without any changes to rdma cgroup.
+Rdma cgroup maintains resource accounting per cgroup, per device, per resource
+type using resource pool structure. Each such resource pool is limited up to
+64 resources in given resource pool by rdma cgroup, which can be extended
+later if required.
+
+This resource pool object is linked to the cgroup css. Typically there
+are 0 to 4 resource pool instances per cgroup, per device in most use cases.
+But nothing limits to have it more. At present hundreds of RDMA devices per
+single cgroup may not be handled optimally, however there is no known use case
+for such configuration either.
+
+Since RDMA resources can be allocated from any process and can be freed by any
+of the child processes which shares the address space, rdma resources are
+always owned by the creator cgroup css. This allows process migration from one
+to other cgroup without major complexity of transferring resource ownership;
+because such ownership is not really present due to shared nature of
+rdma resources. Linking resources around css also ensures that cgroups can be
+deleted after processes migrated. This allow progress migration as well with
+active resources, even though that’s not the primary use case.
+
+Finally mapping of the resource owner pid to cgroup is maintained using
+simple hash table to perform quick look-up during resource charing/uncharging
+time.
+
+Resource pool object is created in following situations.
+(a) User sets the limit and no previous resource pool exist for the device
+of interest for the cgroup.
+(b) No resource limits were configured, but IB/RDMA stack tries to
+charge the resource. So that it correctly uncharge them when applications are
+running without limits and later on when limits are enforced during uncharging,
+otherwise usage count will drop to negative. This is done using default
+resource pool. Instead of implementing any sort of time markers, default pool
+simplifies the design.
+
+Resource pool is destroyed if it was of default type (not created
+by administrative operation) and it’s the last resource getting
+deallocated. Resource pool created as administrative operation is not
+deleted, as it’s expected to be used in near future.
+
+If user setting tries to delete all the resource limit
+with active resources per device, RDMA cgroup just marks the pool as
+default pool with maximum limits for each resource, otherwise it deletes the
+default 

Re: [PATCHv1 6/6] rdmacg: Added documentation for rdma controller.

2016-01-05 Thread Tejun Heo
Hello,

On Wed, Jan 06, 2016 at 12:28:06AM +0530, Parav Pandit wrote:
> +5-4-1. RDMA Interface Files
> +
> +  rdma.resource.verb.list
> +  rdma.resource.verb.limit
> +  rdma.resource.verb.usage
> +  rdma.resource.verb.failcnt
> +  rdma.resource.hw.list
> +  rdma.resource.hw.limit
> +  rdma.resource.hw.usage
> +  rdma.resource.hw.failcnt

Can you please read the rest of cgroup.txt and put the interface in
line with the common conventions followed by other controllers?

Thanks.

-- 
tejun
--
To unsubscribe from this list: send the line "unsubscribe 
linux-security-module" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html