On Mar 25, 2007, at 4:22 AM, Dotan Barak wrote:

Hi Troy.

I can only comment on the parts of your report that relate to the mthca devices.


Troy Benjegerdes wrote:
We have been getting some interesting failures with ibv_reg_mr.

gcc -ggdb -libverbs -o mr-test mr-test.c
/usr/src/ibv-mr-test/mr-test
mr-test: bufsize 1048576
device # 0 name="mthca0" guid="00066a0098000464"
        ibv_open_device() context=0x10012c98
        ibv_alloc_pd() pd=0x10013678
        alloc: 2482
        ibv_reg_mr failed:: Cannot allocate memory
        fw_ver: 3.3.2
        max_mr_size 0xffffffffffffffff
        max_mr: 131056, could only register 2482 regions
        sleep 5 sec
        free: 0
done
I wasn't able to reproduce this failure, but I noticed that you are using an old firmware version (the current version is 3.5.0).

with a 10MB buffer:

gcc -ggdb -libverbs -o mr-test mr-test.c
/usr/src/ibv-mr-test/mr-test
mr-test: bufsize 10485760
device # 0 name="mthca0" guid="00066a0098000464"
        ibv_open_device() context=0x10012c98
        ibv_alloc_pd() pd=0x10013678
        alloc: 2482
        ibv_reg_mr failed:: Cannot allocate memory
        fw_ver: 3.3.2
        max_mr_size 0xffffffffffffffff
        max_mr: 131056, could only register 2482 regions
        sleep 5 sec
        free: 0
done
On a 64-bit machine I got a kernel oops; bug number 490 has been opened in Bugzilla and we are analyzing this failure.
And, on a PCI Express Mellanox HCA:
/afs/scl.ameslab.gov/user/troy/src/ibv-mr-test/mr-test
mr-test: bufsize 10485760
device # 0 name="mthca0" guid="0002c9020040272c"
        ibv_open_device() context=0x504c00
        ibv_alloc_pd() pd=0x503f30
        alloc: 12277
        ibv_reg_mr failed:: Cannot allocate memory
        fw_ver: 5.1.0
        max_mr_size 0xffffffffffffffff
        max_mr: 131056, could only register 12277 regions
        sleep 5 sec
        free: 0
done
I'm checking this issue and will let you know what I find.

On the PCI Express HCA, it also looks like free memory, as reported by "free", goes down by about 300 MB once all these regions are allocated, but the process usage as reported by top is only 20 MB total virtual size. What's going on here?
Are you talking about the "free memory" which is reported by top?

Both the free memory reported by 'top' and the free memory reported by the 'free' command on Debian.
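
To put numbers on that comparison, something like the following could be called from the test program before and after the registration loop. This is only a rough sketch, not part of mr-test.c: the helper names (proc_field_kb, read_proc_file, report_memory) are made up for illustration, and it assumes a Linux /proc with the usual "Key:   value kB" layout in /proc/meminfo and /proc/self/status.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the number that follows `key` at the start of a line in `text`,
 * or -1 if no line starts with `key`. `key` must include the colon,
 * e.g. "MemFree:". The /proc files used below report these values in kB. */
long proc_field_kb(const char *text, const char *key)
{
    assert(text != NULL && key != NULL);
    size_t klen = strlen(key);
    const char *line = text;

    while (line && *line) {
        if (strncmp(line, key, klen) == 0)
            return strtol(line + klen, NULL, 10); /* skips spaces and tabs */
        line = strchr(line, '\n');
        if (line)
            line++; /* step past the newline to the next line */
    }
    return -1;
}

/* Read a small /proc file into buf as a NUL-terminated string.
 * Returns 0 on success, -1 if the file cannot be opened. */
int read_proc_file(const char *path, char *buf, size_t buflen)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, buflen - 1, f);
    buf[n] = '\0';
    fclose(f);
    return 0;
}

/* Print system-wide free memory next to this process's own counters,
 * so the two can be compared at different points in a run. */
void report_memory(const char *label)
{
    char meminfo[8192], status[8192];

    if (read_proc_file("/proc/meminfo", meminfo, sizeof meminfo) != 0 ||
        read_proc_file("/proc/self/status", status, sizeof status) != 0) {
        printf("%s: could not read /proc\n", label);
        return;
    }
    printf("%s: MemFree=%ld kB VmSize=%ld kB VmRSS=%ld kB\n", label,
           proc_field_kb(meminfo, "MemFree:"),
           proc_field_kb(status, "VmSize:"),
           proc_field_kb(status, "VmRSS:"));
}
```

Calling report_memory("before") ahead of the first ibv_reg_mr() and report_memory("after") once registration fails would show whether MemFree drops while VmSize stays flat. That would point at memory used on the process's behalf outside its own address space, though this sketch does not say where it goes.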


