On 24.02.2014 15:30, Sagi Grimberg wrote:
When unmapping request data, it is unsafe to automatically
decrement req->nfmr regardless of its value. This may
happen since the IO and reconnect flows may run concurrently,
resulting in req->nfmr = -1 and a false call to ib_fmr_pool_unmap().
Something is still
' description in 'srp_rport'
Signed-off-by: Bart Van Assche bvanass...@acm.org
Reported-by: Masanari Iida standby2...@gmail.com
Cc: Sagi Grimberg sa...@mellanox.com
Cc: Sebastian Riemer sebastian.rie...@profitbricks.com
Cc: James Bottomley jbottom...@parallels.com
Cc: Roland Dreier rol
Hi Sagi,
is that /mswg/git/mlnx_ofed/mlnx-ofed-2.x-kernel.git tree from the
MLNX_OFED public by any chance?
There are fixes included that are relevant for the mainline. It would be
strange if I sent the patches, since somebody at Mellanox discovered and
fixed the issues.
I've hit a kernel panic today
On 21.01.2014 11:03, Sagi Grimberg wrote:
On 1/20/2014 7:37 PM, Bart Van Assche wrote:
On 01/03/14 22:16, David Dillow wrote:
Today was my last day at ORNL, and my future endeavors will leave even
less time to maintain the SRP initiator.
My thanks especially go to Bart, for keeping the
Hi Hal,
we've encountered an issue with OpenSM 3.3.16 and the config option
console off.
OpenSM processes are at 100% CPU load.
From strace:
poll([{fd=0, events=POLLIN}], 1, 1000) = 1 ([{fd=0, revents=POLLIN}])
read(0, "", 4096) = 0
poll([{fd=0, events=POLLIN}], 1, 1000) =
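The strace pattern above (poll() reports the descriptor readable, read() returns 0, repeat) is what a loop looks like when it keeps polling a descriptor that has reached end-of-file. A minimal sketch of a loop that stops polling on EOF follows. The function name console_loop() and its max_iterations parameter are assumptions made for illustration; this is not OpenSM's console code.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Poll fd for input and read from it until EOF, error, timeout or
 * max_iterations wakeups. Returns the number of poll() wakeups seen.
 */
int console_loop(int fd, int max_iterations)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    char buf[4096];
    int wakeups = 0;

    assert(max_iterations > 0);
    while (wakeups < max_iterations) {
        int n = poll(&pfd, 1, 1000);
        ssize_t len;

        if (n <= 0)
            break;          /* timeout or error */
        wakeups++;
        len = read(pfd.fd, buf, sizeof(buf));
        if (len <= 0) {
            /* EOF (0) or error (-1): stop polling this descriptor */
            pfd.fd = -1;
            break;
        }
    }
    return wakeups;
}
```

Without the len <= 0 check, every poll() call on a descriptor at EOF returns immediately, so the 1000 ms timeout never throttles the loop.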
On 09.10.2013 15:30, David Dillow wrote:
On Wed, 2013-10-09 at 09:28 -0400, Hal Rosenstock wrote:
From strace:
poll([{fd=0, events=POLLIN}], 1, 1000) = 1 ([{fd=0, revents=POLLIN}])
read(0, "", 4096) = 0
poll([{fd=0, events=POLLIN}], 1, 1000) = 1 ([{fd=0,
On 09.10.2013 16:00, Hal Rosenstock wrote:
Do you recall the sequence to get to this ?
Was console option changed to off and then OpenSM SIGHUP'd ? Something
else ?
Is this reproducible ?
Yes, now I can reproduce it. The opensm has been initially started with
console off and I activate
On 09.10.2013 17:15, Hal Rosenstock wrote:
What does service restart do in terms of OpenSM ?
Note that the console parameter is _not_ changeable on the fly right
now so if OpenSM is being SIGHUP'd by service restart then this is a
current limitation (and is clearly not detected/protected
instead of SUCCESS.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Reported-by: Sebastian Riemer sebastian.rie...@profitbricks.com
Cc: David Dillow dillo...@ornl.gov
Cc: Roland Dreier rol...@purestorage.com
Cc: Vu Pham v...@mellanox.com
---
drivers/infiniband/ulp/srp/ib_srp.c |3
On 28.06.2013 14:49, Bart Van Assche wrote:
If reconnecting failed we know that no command completion will
be received anymore. Hence let the SCSI error handler fail such
commands immediately.
Acked-by: Sebastian Riemer sebastian.rie...@profitbricks.com
--
To unsubscribe from this list: send
: David Dillow dillo...@ornl.gov
Cc: Sebastian Riemer sebastian.rie...@profitbricks.com
Cc: Vu Pham v...@mellanox.com
---
drivers/infiniband/ulp/srp/ib_srp.c |2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c
b/drivers/infiniband/ulp/srp/ib_srp.c
: David Dillow dillo...@ornl.gov
Cc: Sebastian Riemer sebastian.rie...@profitbricks.com
Cc: Vu Pham v...@mellanox.com
---
drivers/infiniband/ulp/srp/ib_srp.c |2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c
b/drivers/infiniband/ulp/srp/ib_srp.c
On 01.07.2013 13:33, Bart Van Assche wrote:
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1755,6 +1755,8 @@ static int srp_abort(struct scsi_cmnd *scmnd)
if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
On 01.07.2013 13:38, Bart Van Assche wrote:
--- a/drivers/infiniband/ulp/srp/ib_srp.c
+++ b/drivers/infiniband/ulp/srp/ib_srp.c
@@ -1755,6 +1755,8 @@ static int srp_abort(struct scsi_cmnd *scmnd)
if (srp_send_tsk_mgmt(target, req->index, scmnd->device->lun,
On 28.06.2013 01:45, Roland Dreier wrote:
On Thu, Jun 27, 2013 at 2:01 PM, David Dillow dillo...@ornl.gov wrote:
On Wed, 2013-06-12 at 15:20 +0200, Bart Van Assche wrote:
If the add_one callback fails during driver load no resources are
allocated so there isn't a need to release any resources.
On 28.06.2013 14:48, Bart Van Assche wrote:
Avoid that srp_claim_command() can claim a command while
srp_queuecommand() is still busy queueing the same command.
Found this via source reading.
Nice, that's much less re-acquiring of the target lock in error case in
srp_queuecommand().
But if we
On 28.06.2013 16:51, Bart Van Assche wrote:
Nice, that's much less re-acquiring of the target lock in error case in
srp_queuecommand().
But if we have to change that many locations for srp_put_tx_iu() anyway,
wouldn't it make sense to rename it into __srp_put_tx_iu() as well?
Then we can
On 14.06.2013 19:07, Vu Pham wrote:
[...]
For what do you need the same target with multiple pkeys on the same
local SRP port?
There is no need, it's just a gray area that you can choose to have
multiple connections to same target using different pkeys (same as dgid)
Which other SRP
On 17.06.2013 09:29, Bart Van Assche wrote:
On 06/17/13 09:14, Hannes Reinecke wrote:
On 06/17/2013 09:04 AM, Bart Van Assche wrote:
I agree that the value of fast_io_fail_tmo should be kept small.
Although as you explained changing the SCSI device state into
SDEV_BLOCK doesn't help for I/O
On 14.06.2013 01:27, Vu Pham wrote:
Bart Van Assche wrote:
On 06/13/13 19:50, Vu Pham wrote:
Hello Bart,
+/**
+ * srp_conn_unique() - check whether the connection to a target is unique
+ */
+static bool srp_conn_unique(struct srp_host *host,
+
error handler skips the srp_reset_host() call after a transport
layer error.
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Roland Dreier rol...@purestorage.com
Cc: David Dillow dillo...@ornl.gov
Cc: Vu Pham v...@mellanox.com
Cc: Sebastian Riemer sebastian.rie...@profitbricks.com
Cc: Roland Dreier rol...@purestorage.com
Cc: David Dillow dillo...@ornl.gov
Cc: Vu Pham v...@mellanox.com
Cc: Sebastian Riemer sebastian.rie...@profitbricks.com
---
drivers/infiniband/ulp/srp/ib_srp.c |1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/infiniband/ulp/srp
Cc: Roland Dreier rol...@kernel.org
Cc: David Dillow dillo...@ornl.gov
Cc: Vu Pham v...@mellanox.com
Cc: Sebastian Riemer sebastian.rie...@profitbricks.com
---
drivers/infiniband/ulp/srp/ib_srp.c | 38
+++
1 file changed, 38 insertions(+)
diff --git
Bart's version also has the printing of the connection string if the
double login fails.
So forget about this version here.
On 12.06.2013 13:51, Sebastian Riemer wrote:
Hi all,
as proposed by Or, let's discuss this on the mailing list.
This is a fundamental change required for everything
On 13.06.2013 17:07, Bart Van Assche wrote:
[...]
The %.*s should only copy the data provided by the user, even if it
is not '\0' terminated. Stripping the trailing newline is probably
possible with something like the (untested) code below (will only work
if there is only one newline in the
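The two points made here, bounded copying with %.*s and dropping one trailing newline, can be sketched as follows. This is an illustration under assumptions, not the untested code from the message: format_user_input() and its parameters are hypothetical names, and it runs in user space with snprintf().

```c
#include <assert.h>
#include <stdio.h>

/*
 * Copy at most count bytes of buf into out as a C string, without
 * requiring buf to be '\0' terminated. A single trailing newline is
 * dropped. Returns the number of characters snprintf() would write.
 */
int format_user_input(char *out, size_t out_size,
                      const char *buf, size_t count)
{
    assert(out != NULL && buf != NULL);

    /* Drop one trailing newline, if present. */
    if (count > 0 && buf[count - 1] == '\n')
        count--;

    /* The precision limits how many bytes of buf are read. */
    return snprintf(out, out_size, "%.*s", (int)count, buf);
}
```

Because the precision caps the number of bytes read, anything stored after the first count bytes of buf is never touched.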
the srp-tools.
Please compare with Bart's version and let's discuss this here.
https://github.com/bvanassche/ib_srp-backport/commit/7d8774ff58d489858b1c046b2bf01b4e84e8dd9b
Cheers,
Sebastian
On 12.06.2013 13:29, Sebastian Riemer wrote:
The sysfs attribute 'add_target' may not be used for multiple
...@dev.mellanox.co.il
Reviewed-by: Eli Cohen e...@mellanox.co.il
Signed-off-by: Bart Van Assche bvanass...@acm.org
Cc: Roland Dreier rol...@purestorage.com
Cc: David Dillow dillo...@ornl.gov
Cc: Vu Pham v...@mellanox.com
Cc: Sebastian Riemer sebastian.rie...@profitbricks.com
---
drivers
...@mellanox.com
Cc: Sebastian Riemer sebastian.rie...@profitbricks.com
---
drivers/infiniband/ulp/srp/ib_srp.c |4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/infiniband/ulp/srp/ib_srp.c
b/drivers/infiniband/ulp/srp/ib_srp.c
index 368d160..9c638dd 100644
--- a/drivers
On 08.06.2013 04:31, Bruce McKenzie wrote:
Hi Bart.
Any advice on using this fix with MD RAID 1? A guide or site you know of?
I've compiled Ubuntu 13.04 to kernel 3.6.11 with OFED 2 from Mellanox, and it
works OK; performance is a little better with SRP. Some packages don't seem
to work,
On 10.06.2013 14:44, Bart Van Assche wrote:
On 06/10/13 14:05, Sebastian Riemer wrote:
Perhaps, I should collect all guys who require MD RAID-1 for remote
storage replication in order to put some pressure on Neil.
If I remember correctly one of the things Neil is trying to explain to
md
On 17.05.2013 16:16, Jack Wang wrote:
unable to handle kernel paging request
Hi Jack,
this should be related to the list corruption in IPoIB as list_del()
sets the LIST_POISON1 and LIST_POISON2 pointers.
Referencing these results in page faults according to the documentation
in the code.
On 15.05.2013 07:12, Vasiliy Tolstov wrote:
2013/5/14 Bart Van Assche bvanass...@acm.org:
The ability to close a session from the initiator side went upstream in
kernel 3.8 (/sys/class/srp_remote_ports/port-h:n/delete). Regarding
faster reconnects: please keep in mind that after a cable pull
On 14.05.2013 12:02, Vasiliy Tolstov wrote:
Sorry for bumping an old thread; I've solved my problems with new firmware.
I have Supermicro servers that use rebranded Mellanox firmware (recompiled
with some bits changed).
Now everything works fine: I have 40 Gb/s QDR instead of 10 Gb/s.
Thanks, sharing lesson learned
reconnects and the ability to close a session from the
initiator side on QLogic hardware, is that possible? Or do these
patches only cover Mellanox cards?
2013/5/8 Sebastian Riemer sebastian.rie...@profitbricks.com:
FYI: I've released version 0.6 of my SRP patches today.
The automatic reconnect is included now
Hi Gandalf,
just build up two separate fabrics. This means that you don't
interconnect both switches.
Otherwise, issues on one port also affect the other port.
What do you use for storage? SRP?
This requires dm-multipath and fast IO failing + automatic reconnect
patches from Bart or from me.
FYI: I've released version 0.6 of my SRP patches today.
The automatic reconnect is included now. The tests for that will follow
in the next version. But we already did quite intensive testing for that.
Hard reboot and also soft reboot of the target are possible with that
reconnect. It just
a technical talk there about SRP:
http://www.linuxtag.org/2013/en/program/thursday-may-23-2013.html?eventid=208
Cheers,
Sebastian
--
Sebastian Riemer
Linux Kernel Developer - Storage
ProfitBricks GmbH • Greifswalder Str. 207 • 10405 Berlin, Germany
www.profitbricks.com • sebastian.rie
On 09.04.2013 13:51, Vasiliy Tolstov wrote:
Something like this:
echo 4096 > /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu
After doing this, all SRP connections are down and the port is down. I need to
restart openibd
Sorry for that! It's much easier to set the IP MTU. Managed switches
support
On 09.04.2013 14:49, Hal Rosenstock wrote:
On 4/9/2013 7:12 AM, Vasiliy Tolstov wrote:
Hello. I have some servers, with mellanox ConnectX-3 and have some questions:
Why max_mtu differs with active_mtu?
What does peer port say for max MTU ?
How can i set active mtu?
SM sets active MTU
On 09.04.2013 15:34, Hal Rosenstock wrote:
On 4/9/2013 9:16 AM, Sebastian Riemer wrote:
On 09.04.2013 14:49, Hal Rosenstock wrote:
On 4/9/2013 7:12 AM, Vasiliy Tolstov wrote:
Hello. I have some servers, with mellanox ConnectX-3 and have some
questions:
Why max_mtu differs with active_mtu
On 09.04.2013 16:23, Hal Rosenstock wrote:
So these values are exactly the same as in ibv_devinfo and can be set
in /sys/class/infiniband/mlx4_0/device/mlx4_port1_mtu.
I've found the PortInfo with the command
smpquery portinfo -C mlx4_0 3 1
where I'm using the first HCA to contact the SM. I
,
Sebastian
Btw.: Before, I've hacked MD RAID-1 for high-performance replication as
DRBD is crap for our purposes. But that's worthless without a reliably
working transport.
From c101d00fe529d845192dd6d5930a1b9c16c99b81 Mon Sep 17 00:00:00 2001
From: Sebastian Riemer sebastian.rie...@profitbricks.com
On 19.03.2013 12:22, Or Gerlitz wrote:
On 19/03/2013 12:16, Sebastian Riemer wrote:
Hi Bart,
now I've got my priority on SRP again.
Hi Sebastian,
Are these patches targeted to upstream or backports to some OS/kernel?
if the former, can you please
send them inline so we can have proper
On 19.03.2013 12:45, Bart Van Assche wrote:
On 03/19/13 11:16, Sebastian Riemer wrote:
What are your thoughts regarding this?
Attached patches:
ib_srp: register srp_fail_rport_io as terminate_rport_io
ib_srp: be quiet when failing SCSI commands
scsi_transport_srp: disable
On 26.02.2013 17:55, Roland Dreier wrote:
[...]
In fact I bet this is why the bug has been there as long as it has
been: almost no one is using IPv6 on IPoIB seriously, and IPv4 should
work OK as you point out.
Thanks a lot, Unfortunately, we are using IPoIB with IPv6 in
production for the
On 08.02.2013 10:24, Sagi Grimberg wrote:
On 2/8/2013 12:42 AM, Vu Pham wrote:
Hello Bart,
Thank you for taking the initiative.
Mellanox think that this should be discussed. We'd be happy to attend.
We also would like to discuss:
* How and how fast does SRP detect a path failure besides RC
On 06.02.2013 10:22, Or Gerlitz wrote:
On 06/02/2013 11:17, Mathis GAVILLON wrote:
Ok. But what is it possible to do with Infiniband VFs if QP0 is not
available ?
EVERYTHING, e.g run IPoIB, iSER, RDS, MPI, etc, etc - except for what
requires QP0, such as running SM or issuing SMPs for
On 06.02.2013 11:20, Or Gerlitz wrote:
On 06/02/2013 12:04, Mathis GAVILLON wrote:
Just one last question: is it possible for the VFs' LID to be different
from the PF's?
NO, we've implemented a shared port model, so all functions on the
same IB port use the same lid, each function has its own
Hi Bart,
thanks for approaching this! We're not the best mainline developers so I
guess we won't be there. But we have the big SRP setups, and our
sysadmins really don't like reconnecting SRP hosts manually and laboriously
re-adding their devices to the related dm-multipath devices.
Think
Hi Vladimir,
why do you put OFED together for a kernel nobody uses? Perhaps SLES and
Red Hat do it like this but nobody else.
Have a look at http://en.wikipedia.org/wiki/Linux_kernel - 3.0, 3.2 and
3.4 are the long-term stable releases.
This approach is worse than the approach before IMHO.
Hi Bart,
we've triggered the WARN_ON() in srp_wait_last_send_wqe() by connecting
to a disabled SCST SRP target.
I would remove that one.
Cheers,
Sebastian
On 09.08.2012 17:53, Bart Van Assche wrote:
Modify srp_disconnect_target() such that it waits until it is
sure that no new IB
. ;-)
Cheers,
Sebastian
--
Sebastian Riemer
Linux Kernel Developer
ProfitBricks GmbH • Greifswalder Str. 207 • 10405 Berlin, Germany
www.profitbricks.com • sebastian.rie...@profitbricks.com
Tel.: +49 - 30 - 60 98 56 991 - 915
Sitz der Gesellschaft: Berlin
Registergericht: Amtsgericht
On 31.07.2012 13:08, Alex Netes wrote:
Congestion control isn't a credit based mechanism. While InfiniBand flow
control is defined between two ports of the same link, congestion control is
working across the fabric between a congestion point (a switch) and a reaction
point (source node).
On 19.07.2012 22:31, Roland Dreier wrote:
I have to think about the best way to fix this. We could just
convert to vmalloc() here but I'm not thrilled about consuming
vmalloc() space (on modern 64-bit architectures it's a non-issue
but it's going to cause issues for people on smaller
Cheers,
Sebastian
--
Sebastian Riemer
Linux Kernel Developer
ProfitBricks GmbH • Greifswalder Str. 207 • 10405 Berlin, Germany
www.profitbricks.com • sebastian.rie...@profitbricks.com
Sitz der Gesellschaft: Berlin
Registergericht: Amtsgericht Charlottenburg, HRB 125506 B
Geschäftsführer: Andreas
Hi Chet,
On 22/06/12 21:02, Chet Murthy wrote:
Sebastian,
Thank you for taking the time to explain these things! It's a little
confusing
Here is a simple list of matching code:
OFED-1.5.4 --- kernel 3.2.x
OFED-1.5.4.1 --- kernel 3.3.x
(1) Is there a more-exhaustive list of the
Hi Chet,
the trick is to check out the latest pkg-ofed source from debian SVN
(svn://svn.debian.org/svn/pkg-ofed/) and to update the upstream source
by merging the stuff by extracting the source RPMs or even better by
importing the source directly from the git repos of the OFED user space.
In the
On 17/01/12 15:56, Or Gerlitz wrote:
could you try and patch your 3.0.15 kernel with commit
52439540ea30396982b69662dd21aede6b336288 IB/iser: DMA unmap TX bufs
used for iSCSI/iSER headers from upstream, this could help here.
Hi Or,
unfortunately, just cherry-picking that commit didn't do the
On 19/01/12 13:18, Or Gerlitz wrote:
[...]
Or Gerlitz (4):
IB/iser: Fix wrong mask when sizeof (dma_addr_t) > sizeof
(unsigned long)
IB/iser: Support iSCSI PDU padding
IB/iser: Use separate buffers for the login request/response
IB/iser: DMA unmap TX bufs used for
On 16/01/12 22:16, Or Gerlitz wrote:
Sebastian, I asked for the **iser** (ib_iser) and not mlx4_core debug_level=2
Yes, I did! I've enabled that additionally. And I've checked these
settings in /sys/module/*/parameters. They were set. The libiscsi from
OFED had only the option debug_libiscsi
On 12/01/12 17:14, Or Gerlitz wrote:
you didn't send the kernel logs from the failure after opening the iser
(debug_level=2) and libiscsi (debug_libiscsi_session=1
debug_libiscsi_conn=1) debug prints
OK, I've also set mlx4_core debug_level=2 and have verified in
/sys/module that the
On 12/01/12 10:29, Or Gerlitz wrote:
If you have built the kernel IB user space support (uverbs) and the
IB libs, run ibv_devinfo; if not, just ossi cat
/sys/class/infiniband/mlx4_0/* and send the output. To be clear, iser
does work for you on the productive servers but not on this server?
On 12/01/12 11:16, Sebastian Riemer wrote:
On 12/01/12 10:29, Or Gerlitz wrote:
If you have built the kernel IB user space support (uverbs) and the
IB libs, run ibv_devinfo; if not, just ossi cat
/sys/class/infiniband/mlx4_0/* and send the output. To be clear, iser
does work for you
88402391f898 status 4 vend_err 57
Or, could you please investigate/explain?
It is a pain that we need both: working iSER and IPoIB traffic with good
performance.
Cheers,
Sebastian
On 19/12/11 10:14, Sebastian Riemer wrote:
Hi list,
I've already sent this to the open-iscsi mailing list
you wrote long emails, I'm asking for one concrete example for that enum
crunching of adding entries
not at the end, can you, please?
I've meant e.g. the iscsi tasks in libiscsi.h between 2.6.30 and
2.6.32. But I've meant this for OFED and not the mainline kernel.
2.6.30:
enum {
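The concern about adding enum entries anywhere but at the end can be shown with a small model. All names below are hypothetical and do not reproduce the libiscsi.h definitions; the sketch only shows how inserting an entry in the middle renumbers every entry after it, so code built against the old header misreads values produced with the new one.

```c
#include <assert.h>

/* Older header (hypothetical names). */
enum old_task_state {
    OLD_TASK_FREE,      /* 0 */
    OLD_TASK_PENDING,   /* 1 */
    OLD_TASK_RUNNING    /* 2 */
};

/* Newer header: one entry inserted in the middle shifts the rest. */
enum new_task_state {
    NEW_TASK_FREE,      /* 0 */
    NEW_TASK_COMPLETED, /* 1, newly inserted */
    NEW_TASK_PENDING,   /* 2, was 1 */
    NEW_TASK_RUNNING    /* 3, was 2 */
};

/* Code compiled against the old header interprets raw values like this. */
const char *old_task_state_name(int value)
{
    assert(value >= 0);
    switch (value) {
    case OLD_TASK_FREE:
        return "free";
    case OLD_TASK_PENDING:
        return "pending";
    case OLD_TASK_RUNNING:
        return "running";
    default:
        return "unknown";
    }
}
```

Here a value written as NEW_TASK_PENDING is read by the old code as "running", and NEW_TASK_RUNNING falls through to "unknown". Appending new entries at the end leaves the existing numbering unchanged.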
2011/12/21 Or Gerlitz ogerl...@mellanox.com:
I tested the upstream kernel iser against the upstream iscsi tools from
git://github.com/mikechristie/open-iscsi
(commit 4323e342d2c9fb8ed7233ce855001c189ec55b23), it works
To bring this to an end: I believe you. Most likely I had that much
2011/12/20 Or Gerlitz ogerl...@mellanox.com:
Beep, I'd like to better/understand the problem before looking on your
struggle for solution...
I understand that your Debian system runs kernel 3.0 - however, you didn't
say what version of the iscsi initiator utils is provided with that distro
2011/12/20 Or Gerlitz ogerl...@mellanox.com:
Beep(2), so your system has distro which is based on kernel 2.6.32 and iscsi
initiator tools version 2.0.871 and per your needs, you've booted it with
kernel 3.0 .
At this point you should have stopped and made sure that this combo works,
iscsi wise
Would it help, if we provide our patches for open-iscsi and IB/iSER
2.6.32 to bring that into mainline OFED?
As Or notes, OFED is providing the kernel modules more than the iscsi code
drop. Would be better for all (cough cough) to push changes back to the
iscsi initiator maintainer (Mike
2011/12/20 Or Gerlitz or.gerl...@gmail.com:
horses, please, stay at home, or at least run a little bit slower,
just for you - from 2 minutes
ago - iser works well with 3.2.0-rc5 (its say -dirty b/c its a
development system and the kernel has some patches, but not iser ones)
and
be found.
After fixing that, it worked for me.
Cheers,
Sebastian
--
Sebastian Riemer
Linux Kernel Developer
ProfitBricks GmbH
Greifswalder Str. 207
10405 Berlin, Germany
Tel.: +49 - 30 - 51 64 09 20
Fax: +49 - 30 - 51 64 09 22
Email: sebastian.rie...@profitbricks.com
Web: http
71 matches