[ANNOUNCE]: SCST 3.3 pre-release freeze

2017-08-31 Thread Vladislav Bolkhovitin
Hi All,

I'm glad to announce the SCST 3.3 pre-release code freeze in the SCST SVN branch 3.3.x.

You can get it with the following command:

$ svn co https://scst.svn.sourceforge.net/svnroot/scst/branches/3.3.x

It is going to be released after a few weeks of testing, if no significant issues are found.

SCST is an alternative SCSI target stack for Linux. SCST allows the creation of sophisticated storage devices that provide advanced functionality such as replication, thin provisioning, deduplication, high availability, automatic backup, etc. Many recently developed SAN appliances, especially higher-end ones, are based on SCST. It might well be that your favorite storage appliance is running SCST in its firmware.

More info about SCST and its modules can be found at:
http://scst.sourceforge.net

Thanks to all who made it happen, especially to Bart Van Assche and 
SanDisk/Western Digital!

Vlad



[ANNOUNCE]: SCST 3.2 released

2016-12-15 Thread Vladislav Bolkhovitin
Hi All,

I'm glad to announce that SCST 3.2 has just been released.

You can download it from http://scst.sourceforge.net/downloads.html

SCST is an alternative SCSI target stack for Linux. SCST allows the creation of sophisticated storage devices that provide advanced functionality such as replication, thin provisioning, deduplication, high availability, automatic backup, etc. Many modern SAN appliances, especially higher-end ones, are based on SCST. It might well be that your favorite storage appliance is running SCST in its firmware.

More info about SCST and its modules can be found at:
http://scst.sourceforge.net

Thanks to all who made it happen, especially to SanDisk/WDC for the great 
support!

Vlad



[ANNOUNCE]: SCST 3.2 pre-release freeze

2016-08-02 Thread Vladislav Bolkhovitin
Hi All,

I'm glad to announce the SCST 3.2 pre-release code freeze in the SCST SVN branch 3.2.x.

You can get it with the following command:

$ svn co https://scst.svn.sourceforge.net/svnroot/scst/branches/3.2.x

It is going to be released after a few weeks of testing, if no significant issues are found.

SCST is an alternative SCSI target stack for Linux. SCST allows the creation of sophisticated storage devices that provide advanced functionality such as replication, thin provisioning, deduplication, high availability, automatic backup, etc. The majority of recently developed SAN appliances, especially higher-end ones, are based on SCST. It might well be that your favorite storage appliance is running SCST in its firmware.

More info about SCST and its modules can be found at:
http://scst.sourceforge.net

Thanks to all who made it happen, especially to SanDisk/WDC for the great 
support!

Vlad



Re: [LSF/MM TOPIC] LIO/SCST Merger

2016-01-28 Thread Vladislav Bolkhovitin
Nicholas A. Bellinger wrote on 01/27/2016 10:36 PM:
> On Wed, 2016-01-27 at 09:54 -0800, Bart Van Assche wrote:
>> Last year, during the 2015 LSF/MM summit, it has been decided that the 
>> LIO/SCST merger project should proceed by sending the functionality 
>> upstream that is present in SCST but not yet in LIO. This will help to 
>> reduce the workload of target driver maintainers that maintain a version 
>> of their target driver for both LIO and SCST (QLogic FC and FCoE target 
>> drivers, Emulex FC and FCoE target drivers, RDMA iSER target driver, 
>> RDMA SRP target driver, ...). My proposal is to organize a session 
>> during which the following is discussed:
>> * Which patches are already upstream in the context of the LIO/SCST 
>> merger project.
>> * About which patches there is agreement but that are not yet upstream.
>> * To discuss how to proceed from here and what to address first.
> 
> No, just no.  If you've not been able to articulate the specifics of
> what you're talking about to the list by now, it's never going to
> happen.
> 
> You'll recall last year how things unfolded at LSF.  You started
> comparing data structure TMR member names of no consequence to a larger
> LSF audience, and quickly tried to pivot into a discussion about adding
> hooks to LIO fabric drivers for your own out-of-tree nastiness.
> 
> I really fail to see how that helps LIO or upstream.  To repeat.  I'll
> not allow SCST's out-of-tree legacy requirements to limit LIO's future
> in upstream, and if you or your employer is still trying to get
> enterprise distros to listen to that nonsense behind the scenes, then
> please stop wasting everybody's time.
> 
> Bart, I really want to believe you and your employer have good
> intentions for LIO.  However, being one of its largest detractors in
> the past means that you have to really put your best foot forward in
> your interaction with the LIO community.
> 
> However, your inability to ask questions before acting, refusing to
> answer to all feedback on reviews for changes of substance, and not
> following the expected patch review process without repeatedly leading
> yourself and others down the wrong path really makes me start to
> question your intentions, or at least your abilities as a kernel
> contributor.
> 
> Also, you've not managed to merge any of the outstanding ib_srpt fixes
> from the last year, which brings us to a grand total of 6 small patches
> since the original merge of ib_srpt in Oct 2011.
> 
> # git log --author=Bart --oneline -- drivers/infiniband/ulp/srpt/
> 19f5729 IB/srpt: Fix the RDMA completion handlers
> ba92999 target: Minimize SCSI header #include directives
> 2fe6e72 ib_srpt: Remove set-but-not-used variables
> 649ee05 target: Move task tag into struct se_cmd + support 64-bit tags
> afc1660 target: Remove first argument of target_{get,put}_sess_cmd()
> ab477c1 srp-target: Retry when QP creation fails with ENOMEM
> 
> That's really a terrible record.
> 
> So until you're able to demonstrate publicly to me and the LIO community
> that you do have good intentions, and not trying to rehash the same
> tired old nonsense and willful ignorance, please stop throwing out these
> generic topics as a branding exercise.
> 
> There are much more interesting and important topics at LSF to discuss.

While I'm generally refraining from feeding trolls, don't you think that a person who has contributed one of your major drivers and continues making such important contributions (for free!), trying to bring LIO reliability (eventually, after how many years?) to something you can compare to SCST, deserves a little more respect?

Vlad



[ANNOUNCE]: SCST 3.1 release

2016-01-21 Thread Vladislav Bolkhovitin
Hi All,

I'm glad to announce that SCST version 3.1 has just been released and is available for download from http://scst.sourceforge.net/downloads.html.

Highlights for this release:

 - Cluster support for SCSI reservations. This feature is essential for 
initiator-side
clustering approaches based on persistent reservations, e.g. the quorum disk
implementation in Windows Clustering.

 - Full support for VAAI (vStorage APIs for Array Integration): Extended Copy command support has been added, and the performance of WRITE SAME and of Atomic Test & Set, also known as COMPARE AND WRITE, has been improved.

 - T10-PI support has been added.

 - ALUA support has been improved: explicit ALUA (SET TARGET PORT GROUPS 
command) has
been added and DRBD compatibility has been improved.

 - SCST events user space infrastructure has been added, so now SCST can notify 
a user
space agent about important internal and fabric events.

 - QLogic target driver has been significantly improved.

SCST is an alternative SCSI target stack for Linux. SCST allows the creation of sophisticated storage devices that provide advanced functionality such as replication, thin provisioning, deduplication, high availability, automatic backup, etc. The majority of recently developed SAN appliances, especially higher-end ones, are based on SCST. It might well be that your favorite storage appliance is running SCST in its firmware.

More info about SCST and its modules can be found at:
http://scst.sourceforge.net

Thanks to all who made it happen, especially to SanDisk for the great support! Development of all the above highlights was supported by SanDisk.

Vlad



Re: [ANNOUNCE]: SCST 3.1 pre-release freeze

2015-11-06 Thread Vladislav Bolkhovitin
Hi,

Bike & Snow wrote on 11/06/2015 10:55 AM:
> Hello Vlad
> 
> Excellent news on all the updates.
> 
> Regarding this:
> - QLogic target driver has been significantly improved.
> 
> Does that mean I should stop building the QLogic target driver from here?
> git://git.qlogic.com/scst-qla2xxx.git
>
> Or are you saying the git.qlogic.com driver has been
> improved?

It is saying that qla2x00t was improved.

The ultimate goal is for the mainstream (git) QLogic target driver to be the main and only QLogic target driver, but, unfortunately, that driver has not yet reached the level of quality and maturity of qla2x00t. We are working with QLogic toward that.

> If I stop building the one from git.qlogic.com, does the 3.2.0
> one support NPIV?

Yes, it has full NPIV support.

Vlad



[ANNOUNCE]: SCST 3.1 pre-release freeze

2015-11-05 Thread Vladislav Bolkhovitin
Hi All,

I'm glad to announce the SCST 3.1 pre-release code freeze in the SCST SVN branch 3.1.x.

You can get it with the following command:

$ svn co https://scst.svn.sourceforge.net/svnroot/scst/branches/3.1.x

It is going to be released after a few weeks of testing, if no significant issues are found.

Highlights for this release:

 - Cluster support for SCSI reservations. This feature is essential for 
initiator-side
clustering approaches based on persistent reservations, e.g. the quorum disk
implementation in Windows Clustering.

 - Full support for VAAI (vStorage APIs for Array Integration): Extended Copy command support has been added, and the performance of WRITE SAME and of Atomic Test & Set, also known as COMPARE AND WRITE, has been improved.

 - T10-PI support has been added.

 - ALUA support has been improved: explicit ALUA (SET TARGET PORT GROUPS 
command) has
been added and DRBD compatibility has been improved.

 - SCST events user space infrastructure has been added, so now SCST can notify 
a user
space agent about important internal and fabric events.

 - QLogic target driver has been significantly improved.

SCST is an alternative SCSI target stack for Linux. SCST allows the creation of sophisticated storage devices that provide advanced functionality such as replication, thin provisioning, deduplication, high availability, automatic backup, etc. The majority of recently developed SAN appliances, especially higher-end ones, are based on SCST. It might well be that your favorite storage appliance is running SCST in its firmware.

More info about SCST and its modules can be found at:
http://scst.sourceforge.net

Thanks to all who made it happen, especially to SanDisk for the great support! Development of all the above highlights was supported by SanDisk.

Vlad


[ANNOUNCE]: SCST 3.0.1 released

2015-02-24 Thread Vladislav Bolkhovitin
I'm glad to announce that the 3.0.1 maintenance update for SCST and its drivers has just been released and is ready for download from http://scst.sourceforge.net/downloads.html. All SCST users are encouraged to update.

SCST is an alternative SCSI target stack for Linux. SCST allows the creation of sophisticated storage devices that provide advanced functionality such as replication, thin provisioning, deduplication, high availability, automatic backup, etc. The majority of recently developed SAN appliances, especially higher-end ones, are based on SCST. It might well be that your favorite storage appliance is running SCST in its firmware.

More info about SCST and its modules can be found at:
http://scst.sourceforge.net

Thanks to all who made it happen, especially Bart Van Assche!

Vlad


Re: [LSF/MM TOPIC] iSCSI MQ adoption via MCS discussion

2015-01-13 Thread Vladislav Bolkhovitin
Sagi Grimberg wrote on 01/08/2015 05:45 AM:
>> RFC 3720 namely requires that iSCSI numbering is
>> session-wide. This means maintaining a single counter for all MC/S
>> sessions. Such a counter would be a contention point. I'm afraid that
>> because of that counter performance on a multi-socket initiator system
>> with a scsi-mq implementation based on MC/S could be worse than with the
>> approach with multiple iSER targets. Hence my preference for an approach
>> based on multiple independent iSER connections instead of MC/S.
>
> So this comment is spot on the pros/cons of the discussion (we might want to
> leave something for LSF ;)).
> MCS would not allow a completely lockless data-path due to command
> ordering. On the other hand implementing some kind of multiple sessions
> solution feels somewhat like a mis-fit (at least in my view).
>
> One of my thoughts about how to overcome the contention on commands
> sequence numbering was to suggest some kind of negotiable relaxed
> ordering mode but of course I don't have anything figured out yet.

The Linux SCSI/block stack neither uses nor guarantees any command ordering. Applications requiring ordering enforce it by queue draining (i.e. waiting until all previous commands have finished). Hence the command ordering enforced by MC/S is overkill, and it additionally comes with a non-zero performance cost.
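
To make the queue-draining point concrete, here is a minimal user-space sketch using libaio (the file descriptor, buffers and offsets are placeholders, error handling trimmed): a dependent write is submitted only after everything previously submitted has completed.

#include <libaio.h>
#include <stddef.h>

/* Minimal sketch: enforce ordering of write B after write A by draining
 * the queue (waiting for A to complete) before submitting B. */
static int ordered_writes(io_context_t ctx, int fd,
                          void *buf_a, void *buf_b, size_t len)
{
        struct iocb cb, *cbs[1] = { &cb };
        struct io_event ev;

        io_prep_pwrite(&cb, fd, buf_a, len, 0);              /* write A */
        if (io_submit(ctx, 1, cbs) != 1)
                return -1;

        if (io_getevents(ctx, 1, 1, &ev, NULL) != 1)         /* drain: wait for A */
                return -1;

        io_prep_pwrite(&cb, fd, buf_b, len, (long long)len); /* dependent write B */
        return io_submit(ctx, 1, cbs) == 1 ? 0 : -1;
}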

Don't do MC/S; do independent connections. You know the KISS principle. The memory overhead of setting up the extra iSCSI sessions should be negligible.

Vlad



Re: T10-PI: Getting failed tag info

2014-12-12 Thread Vladislav Bolkhovitin
Martin K. Petersen wrote on 12/11/2014 07:12 PM:
> Vlad == Vladislav Bolkhovitin v...@vlnb.net writes:
>
> Vlad We are currently developing a SCSI target system with T10-PI. We
> Vlad are using block integrity interface and found a problem that this
> Vlad interface fundamentally can not pass Oracle T10-PI certification
> Vlad tests. Those tests require to receive on the initiator side
> Vlad information about which particular tag failed the target checks,
> Vlad but the block integrity interface does not preserve this
> Vlad information, hence the target can not deliver it to the initiator
> Vlad => certification failure. The storage provides the right sense,
> Vlad but then in scsi_io_completion() it is dropped and replaced by a
> Vlad single EILSEQ.
>
> Vlad What would be the best way to fix that? By making a patch
> Vlad introducing new -EXX error codes for the PI errors?
>
> I posted such a patch a while back. We use that in our qualification
> tooling to ensure that the right things are reported when a PI error is
> injected at various places in the stack.

Thanks, this is exactly what is needed.

Reviewed-by: Vladislav Bolkhovitin v...@vlnb.net

> One thing that needs to be done is to make returning these new errors to
> userland conditional on !BIP_BLOCK_INTEGRITY. I'll put that on my list.

Even without it, this patch is quite valuable.

Vlad




T10-PI: Getting failed tag info

2014-12-10 Thread Vladislav Bolkhovitin
Hi,

We are currently developing a SCSI target system with T10-PI. We are using the block integrity interface and found a problem: this interface fundamentally cannot pass the Oracle T10-PI certification tests. Those tests require the initiator side to receive information about which particular tag failed the target's checks, but the block integrity interface does not preserve this information, hence the target cannot deliver it to the initiator => certification failure. The storage provides the right sense data, but then in scsi_io_completion() it is dropped and replaced by a single EILSEQ.

What would be the best way to fix that? By making a patch introducing new 
-EXX error codes for the PI errors?
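
For illustration only, the kind of mapping being asked about could look roughly like the sketch below. EGUARD/EAPPTAG/EREFTAG are made-up placeholder names (not existing kernel error codes); the point is that the ASC/ASCQ of the PI sense selects a distinct error instead of collapsing everything to EILSEQ. Assumes the usual SCSI sense definitions (struct scsi_sense_hdr, ABORTED_COMMAND, ILLEGAL_REQUEST).

/* Hypothetical illustration only: these codes do not exist in the mainline
 * kernel; they just stand for "new -EXX codes, one per PI check". */
#define EGUARD   0x200   /* LOGICAL BLOCK GUARD CHECK FAILED */
#define EAPPTAG  0x201   /* LOGICAL BLOCK APPLICATION TAG CHECK FAILED */
#define EREFTAG  0x202   /* LOGICAL BLOCK REFERENCE TAG CHECK FAILED */

static int pi_sense_to_errno(const struct scsi_sense_hdr *sshdr)
{
	/* T10-PI check failures are reported with ASC 0x10, ASCQ 1/2/3 */
	if ((sshdr->sense_key == ABORTED_COMMAND ||
	     sshdr->sense_key == ILLEGAL_REQUEST) && sshdr->asc == 0x10) {
		switch (sshdr->ascq) {
		case 0x01: return -EGUARD;
		case 0x02: return -EAPPTAG;
		case 0x03: return -EREFTAG;
		}
	}
	return -EILSEQ;		/* current catch-all behaviour */
}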

Thanks,
Vlad


Re: [Scst-devel] New qla2x00tgt Driver Question

2014-12-04 Thread Vladislav Bolkhovitin
Dr. Greg Wettstein wrote on 12/03/2014 11:42 PM:
> On Dec 3,  8:59pm, Vladislav Bolkhovitin wrote:
> } Subject: Re: [Scst-devel] New qla2x00tgt Driver Question
>
>> Dr. Greg Wettstein wrote on 12/03/2014 12:46 PM:
>>> Secondly, Vlad, we have been running additional testing for the last
>>> two days and we have logs from the SCST core which I am including
>>> below which suggests that the SCST core target code excessively stalls
>>> or mishandles an ABORT while processing a NEXUS_LOSS_SESS TMF.
>>> Regardless of your feelings about the target driver code in the kernel
>>> we need to make sure there is not some subtle regression in the core
>>> SCST code paths during TMF processing.
>>
>> I don't see any problem on the SCST core level in the logs.
>
> Fair enough, thanks for taking a look.
>
> I thought it was somewhat strange to see deferred ABORTs on I/O being
> done to RAM based block devices as there is little or no I/O latency.
> In our testing, this regression always occurs on TMF function 6
> processing and this was also the case in Marc's report. The comment
> that one of the other posters made that this was secondary to slow
> backstorage didn't match the characteristics of our test environment.

For TM processing, the backend and frontend (target) sides are equal, so the processing time of the slower side is what defines your TM processing time.

Vlad


Re: [Scst-devel] New qla2x00tgt Driver Question

2014-12-03 Thread Vladislav Bolkhovitin
Dr. Greg Wettstein wrote on 12/03/2014 12:46 PM:
> Secondly, Vlad, we have been running additional testing for the last
> two days and we have logs from the SCST core which I am including
> below which suggests that the SCST core target code excessively stalls
> or mishandles an ABORT while processing a NEXUS_LOSS_SESS TMF.
> Regardless of your feelings about the target driver code in the kernel
> we need to make sure there is not some subtle regression in the core
> SCST code paths during TMF processing.

I don't see any problem on the SCST core level in the logs.

Vlad


Re: [ANNOUNCE]: SCST 3.0 released

2014-09-21 Thread Vladislav Bolkhovitin
No, because it's too new, but you can always get it from git. Or you can use the stable Emulex driver for 16Gb connectivity. It's not in the bundle only because of Emulex policy.


Thanks,
Vlad

On 9/19/2014 23:59, scst.n...@gmail.com wrote:

Is the 16Gb qla2x00t included?

Sent from my Xiaomi phone

Vladislav Bolkhovitin v...@vlnb.net wrote on 2014-9-20 at 2:39 PM:

Hi All,

I'm glad to announce that SCST 3.0 has just been released. This
release includes SCST
core, target drivers iSCSI-SCST for iSCSI, including iSER support
(thanks to
Mellanox!), qla2x00t for QLogic Fibre Channel adapters, ib_srpt for
InfiniBand SRP,
fcst for FCoE and scst_local for local loopback-like access as well
as SCST management
utility scstadmin. Also separately you can download from Emulex
development portal
stable and fully functional target driver for the current generation
of Emulex Fibre
Channel adapters.

SCST is alternative SCSI target stack for Linux. SCST allows
creation of sophisticated
storage devices, which provide advanced functionality, like
replication, thin
provisioning, deduplication, high availability, automatic backup,
etc. Majority of
recently developed SAN appliances, especially higher end ones, are
SCST based. It might
well be that your favorite storage appliance running SCST in the
firmware.

More info about SCST and its modules you can find on:
http://scst.sourceforge.net

Thanks to all who made it happen!

Vlad



[ANNOUNCE]: SCST 3.0 released

2014-09-20 Thread Vladislav Bolkhovitin

Hi All,

I'm glad to announce that SCST 3.0 has just been released. This release includes the SCST core, the target drivers iSCSI-SCST for iSCSI (including iSER support, thanks to Mellanox!), qla2x00t for QLogic Fibre Channel adapters, ib_srpt for InfiniBand SRP, fcst for FCoE and scst_local for local loopback-like access, as well as the SCST management utility scstadmin. Additionally, a stable and fully functional target driver for the current generation of Emulex Fibre Channel adapters can be downloaded separately from the Emulex development portal.


SCST is an alternative SCSI target stack for Linux. SCST allows the creation of sophisticated storage devices that provide advanced functionality such as replication, thin provisioning, deduplication, high availability, automatic backup, etc. The majority of recently developed SAN appliances, especially higher-end ones, are based on SCST. It might well be that your favorite storage appliance is running SCST in its firmware.


More info about SCST and its modules can be found at:
http://scst.sourceforge.net

Thanks to all who made it happen!

Vlad


Re: [PATCH v2 1/3] scsi_cmnd: Introduce scsi_transfer_length helper

2014-06-24 Thread Vladislav Bolkhovitin

Martin K. Petersen, on 06/23/2014 06:58 PM wrote:

Mike == Mike Christie micha...@cs.wisc.edu writes:

+ unsigned int xfer_len = blk_rq_bytes(scmd->request);


Mike Can you do bidi and dif/dix?

Nope.


Correction: at the moment.

There is a proposal for a READ GATHERED command, which is bidirectional and potentially DIF/DIX-capable.


Vlad




[ANNOUNCE]: SCST 3.0 pre-release freeze

2014-05-21 Thread Vladislav Bolkhovitin
Hi All,

I'm glad to announce the SCST 3.0 pre-release code freeze in the SCST SVN branch 3.0.x.

You can get it with the following command:

$ svn co https://scst.svn.sourceforge.net/svnroot/scst/branches/3.0.x

It is going to be released after a few weeks of testing, if nothing bad is found.

SCST is an alternative SCSI target stack for Linux. SCST allows the creation of sophisticated storage devices that provide advanced functionality such as replication, thin provisioning, deduplication, high availability, automatic backup, etc. The majority of recently developed SAN appliances, especially higher-end ones, are based on SCST. It might well be that your favorite storage appliance is running SCST in its firmware.

More info about SCST and its modules can be found at:
http://scst.sourceforge.net

Thanks to all who made it happen!

Vlad


[ANNOUNCE]: SCST iSER target driver is available for testing

2014-01-29 Thread Vladislav Bolkhovitin
I'm glad to announce that the SCST iSER target driver is available for testing from the SCST SVN iser branch. You can download it either with the command:

$ svn checkout svn://svn.code.sf.net/p/scst/svn/branches/iser iser-scst-branch

or by clicking the Download Snapshot button on the
http://sourceforge.net/p/scst/svn/HEAD/tree/branches/iser page.

Big thanks to Yan Burman and Mellanox Technologies who developed it!

SCST is a SCSI target mode stack for Linux. SCST allows the creation of sophisticated storage devices that provide advanced functionality such as replication, thin provisioning, deduplication, high availability, automatic backup, etc. The majority of recently developed SAN appliances, especially higher-end ones, are based on SCST. It might well be that your favorite storage appliance is running SCST in its firmware.

More info about SCST and its modules can be found at:
http://scst.sourceforge.net

Vlad


Re: Is there any plan to support 64bit lun in mainline?

2013-10-15 Thread Vladislav Bolkhovitin
Hannes Reinecke, on 10/14/2013 11:01 PM wrote:
> And HBAs like lpfc or qla2xxx even have a fast command abort built
> into the firmware, where the firmware will not even wait for a
> command abort to hit the wire but rather just disable the exchange
> internally and return.

Doing so is asking for data corruption. Aborts are intended to clean up commands on the target so that they cannot interact badly with future commands. Otherwise it is possible that an old WRITE command, stuck in some deep corner inside the target, is bypassed by such a pseudo-abort and then gets released AFTER its LBAs have been written with newer data, hence overwriting new data with old data.

Vlad


Re: Bypass block layer and Fill SCSI lower layer driver queue

2013-09-27 Thread Vladislav Bolkhovitin
Douglas Gilbert, on 09/18/2013 07:07 AM wrote:
 On 13-09-18 03:58 AM, Jack Wang wrote:
 On 09/18/2013 08:41 AM, Alireza Haghdoost wrote:
 Hi

 I am working on a high throughput and low latency application which
 does not tolerate block layer overhead to send IO request directly to
 fiber channel lower layer SCSI driver. I used to work with libaio but
 currently I am looking for a way to by pass the block layer and send
 SCSI commands from the application layer directly to the SCSI driver
 using /dev/sgX device and ioctl() system call.

 I have noticed that sending IO request through sg device even with
 nonblocking and direct IO flags is quite slow and does not fill up
 lower layer SCSI driver TCQ queue. i.e IO depth or
 /sys/block/sdX/in_flight is always ZERO. Therefore the application
 throughput is even lower that sending IO request through block layer
 with libaio and io_submit() system call. In both cases I used only one
 IO context (or fd) and single threaded.

 Hi Alireza,

 I think what you want is in_flight command scsi dispatch to low level
 device.
 I submit a simple patch to export device_busy

 http://www.spinics.net/lists/linux-scsi/msg68697.html

 I also notice fio sg engine will not fill queue properly, but haven't
 look into deeper.

 Cheers
 Jack

 I have noticed that some well known benchmarking tools like fio does
 not support IO depth for sg devices as well. Therefore, I was
 wondering if it is feasible to bypass block layer and achieve higher
 throughput and lower latency (for sending IO request only).


 Any comment on my issue is highly appreciated.
 
 I'm not sure if this is relevant to your problem but by
 default both the bsg and sg drivers queue at head
 when they inject SCSI commands into the block layer.
 
 The bsg driver has a BSG_FLAG_Q_AT_TAIL flag to change
 that queueing to what may be preferable for your purposes.
 The sg driver could, but does not, support that flag.

Just curious: for how long is this counterproductive insert-at-head going to stay? I guess by now (almost) nobody can recall why it is so. This behavior makes the sg interface basically unusable for anything bigger than sg-utils.
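
For reference, the per-command flag Doug mentions is set in struct sg_io_v4; a minimal user-space sketch (the bsg file descriptor and CDB are placeholders) might look like this:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>        /* SG_IO */
#include <linux/bsg.h>      /* struct sg_io_v4, BSG_FLAG_Q_AT_TAIL */

/* Issue a TEST UNIT READY through an already opened bsg fd, asking the
 * block layer to queue it at the tail instead of the head. */
static int tur_at_tail(int bsg_fd)
{
	unsigned char cdb[6] = { 0 };           /* TEST UNIT READY */
	unsigned char sense[32];
	struct sg_io_v4 hdr;

	memset(&hdr, 0, sizeof(hdr));
	hdr.guard = 'Q';
	hdr.protocol = BSG_PROTOCOL_SCSI;
	hdr.subprotocol = BSG_SUB_PROTOCOL_SCSI_CMD;
	hdr.request = (uintptr_t)cdb;
	hdr.request_len = sizeof(cdb);
	hdr.response = (uintptr_t)sense;
	hdr.max_response_len = sizeof(sense);
	hdr.timeout = 30000;                    /* ms */
	hdr.flags = BSG_FLAG_Q_AT_TAIL;         /* do not insert at head */

	return ioctl(bsg_fd, SG_IO, &hdr);
}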

Vlad


[ANNOUNCE]: Emulex SCST support for 16Gb/s FC and FCoE CNAs

2013-09-04 Thread Vladislav Bolkhovitin
I'm glad to announce that SCST support for 16Gb/s FC and FCoE Emulex CNAs is now
available as part of the Emulex OneCore Storage SDK tool set based on the 
Emulex SLI-4
API. Support for 16Gb/s Fibre Channel LPe16000 series and FCoE hardware using 
target
mode versions of the OneConnect FCoE CNAs is included. Documented for use with
RHEL/CentOS 6.x based distributions, ocs_fc_scst works with the stable SCST 
2.2.1 as
well as the development versions of 2.2.x and 3.0.x. The driver code and 
documentation
are available on the Emulex web site at:
http://www.emulex.com/products/onecore-storage-software-development-kit/overview.html

Registration is required on the Developer Portal, but this is free.

Questions regarding this driver are better asked via the Developer Portal.

SCST is a SCSI target mode stack for Linux. SCST allows the creation of sophisticated storage devices that provide advanced functionality such as replication, thin provisioning, deduplication, high availability, automatic backup, etc. The majority of recently developed SAN appliances, especially higher-end ones, are based on SCST. It might well be that your favorite storage appliance is running SCST in its firmware.

More info about SCST and its modules can be found at:
http://scst.sourceforge.net

Vlad


Re: atomic write T10 standards

2013-07-03 Thread Vladislav Bolkhovitin
Ric Wheeler, on 07/03/2013 11:31 AM wrote:
 Journals are normally big (128MB or so?) - I don't think that this is 
 unique to xfs.
 We're mixing a bunch of concepts here.  The filesystems have a lot of
 different requirements, and atomics are just one small part.

 Creating a new file often uses resources freed by past files.  So
 deleting the old must be ordered against allocating the new.  They are
 really separate atomic units but you can't handle them completely
 independently.

 If our existing journal commit is:

 * write the data blocks for a transaction
 * flush
 * write the commit block for the transaction
 * flush

 Which part of this does and atomic write help?

 We would still need at least:

 * atomic write of data blocks  commit blocks
 * flush

Not necessary.

Consider a case where you are creating many small files in a big directory. Every such operation needs 3 actions: add a new directory entry, get free space, and write data there. If one atomic (scattered) write command is used for each operation, and you order the commands between each other where needed, e.g. by using the ORDERED SCSI attribute or queue draining, you don't need any intermediate flushes. One final flush would be sufficient. In case of a crash some of the new files would simply disappear, but everything would remain fully consistent, so the only recovery needed would be to recreate them.
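
A purely hypothetical sketch of that pattern; submit_atomic_scattered_write() and synchronize_cache() are made-up stand-ins for an atomic scattered-write facility and a SYNCHRONIZE CACHE, not an existing API:

#include <stdint.h>
#include <stdbool.h>

struct extent { uint64_t lba; const void *data; uint32_t blocks; };

struct file_create {
	uint64_t dirent_lba, bitmap_lba, data_lba;
	const void *dirent_block, *bitmap_block, *data_block;
	uint32_t data_blocks;
};

/* Hypothetical helpers: one atomic scattered write covering several extents,
 * and a cache flush to the backing device. */
extern int submit_atomic_scattered_write(const struct extent *ext, int n, bool ordered);
extern int synchronize_cache(void);

static int create_files(const struct file_create *f, int nfiles)
{
	for (int i = 0; i < nfiles; i++) {
		struct extent ext[3] = {
			{ f[i].dirent_lba, f[i].dirent_block, 1 },
			{ f[i].bitmap_lba, f[i].bitmap_block, 1 },
			{ f[i].data_lba,   f[i].data_block,   f[i].data_blocks },
		};
		/* Each create is one atomic unit; no intermediate flush needed. */
		if (submit_atomic_scattered_write(ext, 3, false))
			return -1;
	}
	return synchronize_cache();	/* one final flush for the whole batch */
}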

 The catch is that our current flush mechanisms are still pretty brute force 
 and 
 act across either the whole device or in a temporal (everything flushed 
 before 
 this is acked) way.
 
 I still see it would be useful to have the atomic write really be atomic and 
 durable just for that IO - no flush needed.
 
 Can you give a sequence for the use case for the non-durable atomic write 
 that 
 would not need a sync?

See above.

 Can we really trust all devices to make something atomic 
 that is not durable :) ?

Sure, if the application allows it and the atomicity property itself is durable, why not?

Vlad

P.S. With atomic writes there's no need for a journal, no?


Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))

2013-05-29 Thread Vladislav Bolkhovitin
Martin K. Petersen, on 05/28/2013 01:25 PM wrote:
 Vladislav Linux block layer is purely artificial creature slowly
 Vladislav reinventing wheel creating more problems, than solving.
 
 On the contrary. I do think we solve a whole bunch of problems.
 
 
 Vladislav It enforces approach, where often impossible means
 Vladislav impossible in this interface.
 
 I agree we have limitations. I do not agree that all limitations are
 bad. Sometimes it's OK to say no.
 
 
 Vladislav For instance, how about copy offload?  How about atomic
 Vladislav writes?
 
 I'm actively working on copy offload. Nobody appears to be interested in
 atomic writes. Otherwise I'd work on those as well.
 
 
 Vladislav Why was it needed to create special blk integrity interface
 Vladislav with the only end user - SCSI?
 
 Simple. Because we did not want to interleave data and PI 512+8+512+8
 neither in memory, nor at DMA time.

It can similarly be done in a SCSI-like interface without the need for any middleman.

 Furthermore, the ATA EPP proposal
 was still on the table so I also needed to support ATA.
 
 And finally, NVM Express uses the blk_integrity interface as well.
 
 
 Vladislav The block layer keeps repeating SCSI. So, maybe, after all,
 Vladislav it's better to acknowledge that direct usage of SCSI without
 Vladislav any intermediate layers and translations is more productive?
 Vladislav And for those minors not using SCSI internally, translate
 Vladislav from SCSI to their internal commands? Creating and filling
 Vladislav CDB fields for most cases isn't anyhow harder, than creating
 Vladislav and feeling bio fields.
 
 This is quite possibly the worst idea I have heard all week.
 
 As it stands it's a headache for the disk ULD driver to figure out which
 of the bazillion READ/WRITE variants to send to a SCSI/ATA device. What
 makes you think that an application or filesystem would be better
 equipped to make that call?
 
 See also: WRITE SAME w/ zeroes vs. WRITE SAME w/ UNMAP vs. UNMAP 
 
 See also: EXTENDED COPY vs. the PROXY command set
 
 See also: USB-ATA bridge chips
 
 You make it sound like all the block layer does is filling out
 CDBs. Which it doesn't in fact have anything to do with at all.
 
 When you are talking about CDBs we're down in the SBC/SSC territory.
 Which is such a tiny bit of what's going on. We have transports, we have
 SAM, we have HBA controller DMA constraints, system DMA constraints,
 buffer bouncing, etc. There's a ton of stuff that needs to happen before
 the CDB and the data physically reach the storage.
 
 You seem to be advocating that everything up to the point where the
 device receives the command is in the way. Well, by all means. Why limit
 ourselves to the confines of SCSI? Why not get rid of POSIX
 read()/write(), page cache, filesystems and let applications speak
 ST-506 directly?
 
 I know we're doing different things. My job is to make a general purpose
 operating system with interfaces that make sense to normal applications.
 That does not preclude special cases where it may make sense to poke at
 the device directly. For testing purposes, for instance. But I consider
 it a failure when we start having applications that know about hardware
 intricacies, cylinders/heads/sectors, etc. That road leads straight to
 the 1980s...

What you mean is true, but my point is that this abstraction is better done in a SCSI, i.e. SAM, manner. No need to write fields inside CDBs, that would be pretty inconvenient ;). But CDB fields can be fields in some scsi_io structure, and the exact opcodes can easily be abstracted and filled in at the last stage, where the final CDB is constructed from those fields.
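
Purely as an illustration of that argument, such a scsi_io descriptor might look roughly like this; none of these names exist in the kernel, and the concrete opcode would be chosen only at the final, transport-facing stage:

#include <linux/types.h>
#include <linux/scatterlist.h>

/* Hypothetical SAM-level descriptor, illustration only. */
enum scsi_io_op { SIO_READ, SIO_WRITE, SIO_WRITE_SAME, SIO_UNMAP, SIO_XCOPY };

struct scsi_io {
	enum scsi_io_op op;		/* abstract operation, not a CDB opcode */
	u64 lba;
	u32 len;			/* in logical blocks */
	unsigned int fua:1;		/* force unit access */
	unsigned int ordered:1;		/* SAM ORDERED task attribute */
	struct sg_table *sgt;		/* data buffer */
	/* protection info, priority, ... */
};

/* A final stage would pick READ(10)/(16)/(32), WRITE SAME with or without
 * UNMAP, etc., and build the actual CDB from these fields. */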

The problem with the block abstraction is that it is the least common denominator of all block devices' capabilities, hence advanced capabilities available only on some classes of devices automatically become impossible. It would be more productive instead to use the most capable abstraction, which is SAM. With that abstraction there's no need to reinvent complex interfaces and write complex middleman code for every advanced capability. All advanced capabilities are available by definition, if supported by the underlying hardware. That's my point.

POSIX is for simple applications, for which read()/write() calls are sufficient. They are outside of our discussion. But advanced applications need more. I know plenty of applications issuing direct SCSI commands, but how many applications can you name that use the block interface (bsg)? I can recall only one relatively widely used Linux-specific library. That's all. This interface is not in demand by applications.
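
For comparison, this is roughly what issuing a direct SCSI command from an application looks like through the long-standing sg SG_IO interface (the /dev/sg0 path is a placeholder):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

int main(void)
{
	unsigned char cdb[6] = { 0x12, 0, 0, 0, 96, 0 };   /* INQUIRY, 96 bytes */
	unsigned char buf[96], sense[32];
	struct sg_io_hdr io;
	int fd = open("/dev/sg0", O_RDWR);                 /* placeholder device */

	if (fd < 0)
		return 1;
	memset(&io, 0, sizeof(io));
	io.interface_id = 'S';
	io.cmdp = cdb;
	io.cmd_len = sizeof(cdb);
	io.dxfer_direction = SG_DXFER_FROM_DEV;
	io.dxferp = buf;
	io.dxfer_len = sizeof(buf);
	io.sbp = sense;
	io.mx_sb_len = sizeof(sense);
	io.timeout = 5000;                                 /* ms */
	if (ioctl(fd, SG_IO, &io) < 0)
		return 1;
	printf("Vendor: %.8s  Product: %.16s\n",
	       (char *)(buf + 8), (char *)(buf + 16));
	return 0;
}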

Vlad


Re: PING^7 (was Re: [PATCH v2 00/14] Corrections and customization of the SG_IO command whitelist (CVE-2012-4542))

2013-05-24 Thread Vladislav Bolkhovitin
Martin K. Petersen, on 05/22/2013 09:32 AM wrote:
 Paolo First of all, I'll note that SG_IO and block-device-specific
 Paolo ioctls both have their place.  My usecase for SG_IO is
 Paolo virtualization, where I need to pass information from the LUN to
 Paolo the virtual machine with as much fidelity as possible if I choose
 Paolo to virtualize at the SCSI level.  
 
 Now there's your problem! Several people told you way back that the SCSI
 virt approach was a really poor choice. The SG_IO permissions problem is
 a classic Doctor, it hurts when I do this.
 
 The kernel's fundamental task is to provide abstraction between
 applications and intricacies of hardware. The right way to solve the
 problem would have been to provide a better device abstraction built on
 top of the block/SCSI infrastructure we already have in place. If you
 need more fidelity, add fidelity to the block layer instead of punching
 a giant hole through it.
 
 I seem to recall that reservations were part of your motivation for
 going the SCSI route in the first place. A better approach would have
 been to create a generic reservations mechanism that could be exposed to
 the guest. And then let the baremetal kernel worry about the appropriate
 way to communicate with the physical hardware. Just like we've done with
 reads and writes, discard, write same, etc.

Well, any abstraction is good only if it isn't artificial, i.e. if it solves more problems than it creates.

The reality is that, de facto, _SCSI_ is the industry abstraction for block/direct access to data. Look around: how many of the systems around you, after all the layers, end up issuing SCSI commands to their storage devices?

The Linux block layer is a purely artificial creature slowly reinventing the wheel, creating more problems than it solves. It enforces an approach where "often impossible" means "impossible in this interface". For instance, how about copy offload? How about reservations? How about atomic writes? Look at the history of barriers and compare it with what can be done in SCSI. It's still worse, because it doesn't allow usage of all devices' capabilities. Why was it necessary to create a special blk integrity interface with the only end user being SCSI? An artificial task was created, then well solved. Etc., etc.

The block layer keeps repeating SCSI. So maybe, after all, it's better to acknowledge that direct usage of SCSI without any intermediate layers and translations is more productive? And for the minority not using SCSI internally, translate from SCSI to their internal commands? Creating and filling CDB fields in most cases isn't any harder than creating and filling bio fields.

So, I appreciate the work Paolo is doing in this direction. At least the right thing will be done at the virtualization level.

I do understand that, with all the existing baggage, replacing the block layer with SCSI isn't practical, and I am not proposing it, but let's at least acknowledge the limitations of the academic block abstraction. Let's not make those limitations global walls. Many things are better done using direct SCSI, so let's do them the better way.

Vlad


Re: [PATCH 1/6] target/file: Re-enable optional fd_buffered_io=1 operation

2012-10-02 Thread Vladislav Bolkhovitin

Christoph Hellwig, on 10/01/2012 04:46 AM wrote:

On Sun, Sep 30, 2012 at 05:58:11AM +, Nicholas A. Bellinger wrote:

From: Nicholas Bellinger n...@linux-iscsi.org

This patch re-adds the ability to optionally run in buffered FILEIO mode
(eg: w/o O_DSYNC) for device backends in order to once again use the
Linux buffered cache as a write-back storage mechanism.

This difference with this patch is that fd_create_virtdevice() now
forces the explicit setting of emulate_write_cache=1 when buffered FILEIO
operation has been enabled.


What this lacks is a clear reason why you would enable this inherently
unsafe mode.  While there is some clear precedence to allow people doing
stupid thing I'd least like a rationale for it, and it being documented
as unsafe.


Nowadays nearly all serious applications are transactional and know how to flush the storage cache between transactions. That means that write-back caching is absolutely safe for them. No data can be lost under any circumstances.


Welcome to the 21st century, Christoph!
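
A minimal sketch of the flush-between-transactions pattern referred to above (file descriptors and buffers are placeholders; real applications do this through their journal/WAL layer):

#include <stddef.h>
#include <unistd.h>

/* Write the transaction data, flush, then write the commit record, flush. */
static int commit_transaction(int data_fd, const void *data, size_t len,
			      int log_fd, const void *rec, size_t rec_len)
{
	if (pwrite(data_fd, data, len, 0) != (ssize_t)len)
		return -1;
	if (fdatasync(data_fd))		/* flush the write-back cache for the data */
		return -1;
	if (pwrite(log_fd, rec, rec_len, 0) != (ssize_t)rec_len)
		return -1;
	return fdatasync(log_fd);	/* make the commit record durable */
}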

Vlad


Re: Integration of SCST in the mainstream Linux kernel

2008-02-11 Thread Vladislav Bolkhovitin

Luben Tuikov wrote:

>>> Is there an open iSCSI Target implementation which does NOT
>>> issue commands to sub-target devices via the SCSI mid-layer, but
>>> bypasses it completely?
>>
>> What do you mean? To call directly low level backstorage SCSI drivers'
>> queuecommand() routine? What are the advantages of it?
>
> Yes, that's what I meant.  Just curious.

What's the advantage of it?

> Thanks,
>    Luben



Re: Integration of SCST in the mainstream Linux kernel

2008-02-08 Thread Vladislav Bolkhovitin

[EMAIL PROTECTED] wrote:

On Thu, 7 Feb 2008, Vladislav Bolkhovitin wrote:


Bart Van Assche wrote:


- It has been discussed which iSCSI target implementation should be in
the mainstream Linux kernel. There is no agreement on this subject
yet. The short-term options are as follows:
1) Do not integrate any new iSCSI target implementation in the
mainstream Linux kernel.
2) Add one of the existing in-kernel iSCSI target implementations to
the kernel, e.g. SCST or PyX/LIO.
3) Create a new in-kernel iSCSI target implementation that combines
the advantages of the existing iSCSI kernel target implementations
(iETD, STGT, SCST and PyX/LIO).

As an iSCSI user, I prefer option (3). The big question is whether the
various storage target authors agree with this ?



I tend to agree with some important notes:

1. IET should be excluded from this list, iSCSI-SCST is IET updated 
for SCST framework with a lot of bugfixes and improvements.


2. I think, everybody will agree that Linux iSCSI target should work 
over some standard SCSI target framework. Hence the choice gets 
narrower: SCST vs STGT. I don't think there's a way for a dedicated 
iSCSI target (i.e. PyX/LIO) in the mainline, because of a lot of code 
duplication. Nicholas could decide to move to either existing 
framework (although, frankly, I don't think there's a possibility for 
in-kernel iSCSI target and user space SCSI target framework) and if he 
decide to go with SCST, I'll be glad to offer my help and support and 
wouldn't care if LIO-SCST eventually replaced iSCSI-SCST. The better 
one should win.



why should linux as an iSCSI target be limited to passthrough to a SCSI 
device.


the most common use of this sort of thing that I would see is to load up 
a bunch of 1TB SATA drives in a commodity PC, run software RAID, and 
then export the resulting volume to other servers via iSCSI. not a 
'real' SCSI device in sight.


As far as how good a standard iSCSI is, at this point I don't think it 
really matters. There are too many devices and manufacturers out there 
that implement iSCSI as their storage protocol (from both sides, 
offering storage to other systems, and using external storage). 
Sometimes the best technology doesn't win, but Linux should be 
interoperable with as much as possible and be ready to support the 
winners and the loosers in technology options, for as long as anyone 
chooses to use the old equipment (after all, we support things like 
Arcnet networking, which lost to Ethernet many years ago)


David, your question surprises me a lot. Where did you get the idea that SCST supports only pass-through backstorage? Does the RAM disk, which Bart has been using for performance tests, look like a SCSI device?

SCST supports all backstorage types you can imagine and the Linux kernel supports.



David Lang


Re: Integration of SCST in the mainstream Linux kernel

2008-02-08 Thread Vladislav Bolkhovitin

Luben Tuikov wrote:

> Is there an open iSCSI Target implementation which does NOT
> issue commands to sub-target devices via the SCSI mid-layer, but
> bypasses it completely?

What do you mean? To call directly low level backstorage SCSI drivers'
queuecommand() routine? What are the advantages of it?

>    Luben



Re: Integration of SCST in the mainstream Linux kernel

2008-02-08 Thread Vladislav Bolkhovitin

Nicholas A. Bellinger wrote:

On Thu, 2008-02-07 at 12:37 -0800, Luben Tuikov wrote:


Is there an open iSCSI Target implementation which does NOT
issue commands to sub-target devices via the SCSI mid-layer, but
bypasses it completely?

  Luben




Hi Luben,

I am guessing you mean futher down the stack, which I don't know this to
be the case.  Going futher up the layers is the design of v2.9 LIO-SE.
There is a diagram explaining the basic concepts from a 10,000 foot
level.

http://linux-iscsi.org/builds/user/nab/storage-engine-concept.pdf

Note that only traditional iSCSI target is currently implemented in v2.9
LIO-SE codebase in the list of target mode fabrics on left side of the
layout.  The API between the protocol headers that does
encoding/decoding target mode storage packets is probably the least
mature area of the LIO stack (because it has always been iSCSI looking
towards iSER :).  I don't know who has the most mature API between the
storage engine and target storage protocol for doing this between SCST
and STGT, I am guessing SCST because of the difference in age of the
projects.  Could someone be so kind to fill me in on this..?


SCST uses the scsi_execute_async_fifo() function to submit commands to SCSI 
devices in pass-through mode. This function is a slightly modified version of 
scsi_execute_async() that submits requests in FIFO order instead of LIFO as 
scsi_execute_async() does (so with scsi_execute_async() they are executed in 
the reverse order). scsi_execute_async_fifo() is added to the kernel as a 
separate patch.



Also note, the storage engine plugin for doing userspace passthrough on
the right is also currently not implemented.  Userspace passthrough in
this context is an target engine I/O that is enforcing max_sector and
sector_size limitiations, and encodes/decodes target storage protocol
packets all out of view of userspace.  The addressing will be completely
different if we are pointing SE target packets at non SCSI target ports
in userspace.

--nab



Re: Integration of SCST in the mainstream Linux kernel

2008-02-08 Thread Vladislav Bolkhovitin

Nicholas A. Bellinger wrote:

- It has been discussed which iSCSI target implementation should be in
the mainstream Linux kernel. There is no agreement on this subject
yet. The short-term options are as follows:
1) Do not integrate any new iSCSI target implementation in the
mainstream Linux kernel.
2) Add one of the existing in-kernel iSCSI target implementations to
the kernel, e.g. SCST or PyX/LIO.
3) Create a new in-kernel iSCSI target implementation that combines
the advantages of the existing iSCSI kernel target implementations
(iETD, STGT, SCST and PyX/LIO).

As an iSCSI user, I prefer option (3). The big question is whether the
various storage target authors agree with this ?


I tend to agree with some important notes:

1. IET should be excluded from this list, iSCSI-SCST is IET updated for SCST 
framework with a lot of bugfixes and improvements.


2. I think, everybody will agree that Linux iSCSI target should work over 
some standard SCSI target framework. Hence the choice gets narrower: SCST vs 
STGT. I don't think there's a way for a dedicated iSCSI target (i.e. PyX/LIO) 
in the mainline, because of a lot of code duplication. Nicholas could decide 
to move to either existing framework (although, frankly, I don't think 
there's a possibility for in-kernel iSCSI target and user space SCSI target 
framework) and if he decide to go with SCST, I'll be glad to offer my help 
and support and wouldn't care if LIO-SCST eventually replaced iSCSI-SCST. The 
better one should win.


why should linux as an iSCSI target be limited to passthrough to a SCSI 
device.


nod

I don't think anyone is saying it should be.  It makes sense that the
more mature SCSI engines that have working code will be providing alot
of the foundation as we talk about options..


From comparing the designs of SCST and LIO-SE, we know that SCST has

supports very SCSI specific target mode hardware, including software
target mode forks of other kernel code.  This code for the target mode
pSCSI, FC and SAS control paths (more for the state machines, that CDB
emulation) that will most likely never need to be emulated on non SCSI
target engine.


...but required for SCSI. So it must be there anyway.


SCST has support for the most SCSI fabric protocols of
the group (although it is lacking iSER) while the LIO-SE only supports
traditional iSCSI using Linux/IP (this means TCP, SCTP and IPv6).  The
design of LIO-SE was to make every iSCSI initiator that sends SCSI CDBs
and data to talk to every potential device in the Linux storage stack on
the largest amount of hardware architectures possible.

Most of the iSCSI Initiators I know (including non Linux) do not rely on
heavy SCSI task management, and I think this would be a lower priority
item to get real SCSI specific recovery in the traditional iSCSI target
for users.  Espically things like SCSI target mode queue locking
(affectionally called Auto Contingent Allegiance) make no sense for
traditional iSCSI or iSER, because CmdSN rules are doing this for us.


Sorry, that isn't correct. ACA provides the possibility to lock the command 
queue in case of a CHECK CONDITION, so it allows keeping the command execution 
order in case of errors. CmdSN keeps the command execution order only in the 
success case; in case of an error the next queued command will be executed 
immediately after the failed one, although the application might require all 
commands subsequent to the failed one to be aborted. Think about journaled 
file systems, for instance. ACA also allows retrying the failed command and 
then resuming the queue.


Vlad


Re: Integration of SCST in the mainstream Linux kernel

2008-02-07 Thread Vladislav Bolkhovitin

Bart Van Assche wrote:

Since the focus of this thread shifted somewhat in the last few
messages, I'll try to summarize what has been discussed so far:
- There was a number of participants who joined this discussion
spontaneously. This suggests that there is considerable interest in
networked storage and iSCSI.
- It has been motivated why iSCSI makes sense as a storage protocol
(compared to ATA over Ethernet and Fibre Channel over Ethernet).
- The direct I/O performance results for block transfer sizes below 64
KB are a meaningful benchmark for storage target implementations.
- It has been discussed whether an iSCSI target should be implemented
in user space or in kernel space. It is clear now that an
implementation in the kernel can be made faster than a user space
implementation (http://kerneltrap.org/mailarchive/linux-kernel/2008/2/4/714804).
Regarding existing implementations, measurements have a.o. shown that
SCST is faster than STGT (30% with the following setup: iSCSI via
IPoIB and direct I/O block transfers with a size of 512 bytes).
- It has been discussed which iSCSI target implementation should be in
the mainstream Linux kernel. There is no agreement on this subject
yet. The short-term options are as follows:
1) Do not integrate any new iSCSI target implementation in the
mainstream Linux kernel.
2) Add one of the existing in-kernel iSCSI target implementations to
the kernel, e.g. SCST or PyX/LIO.
3) Create a new in-kernel iSCSI target implementation that combines
the advantages of the existing iSCSI kernel target implementations
(iETD, STGT, SCST and PyX/LIO).

As an iSCSI user, I prefer option (3). The big question is whether the
various storage target authors agree with this ?


I tend to agree, with some important notes:

1. IET should be excluded from this list; iSCSI-SCST is IET updated for the 
SCST framework with a lot of bugfixes and improvements.

2. I think everybody will agree that a Linux iSCSI target should work over 
some standard SCSI target framework. Hence the choice gets narrower: SCST vs. 
STGT. I don't think there's a place for a dedicated iSCSI target (i.e. PyX/LIO) 
in the mainline, because of a lot of code duplication. Nicholas could decide 
to move to either existing framework (although, frankly, I don't think there's 
a possibility of combining an in-kernel iSCSI target with a user space SCSI 
target framework), and if he decides to go with SCST, I'll be glad to offer my 
help and support, and wouldn't care if LIO-SCST eventually replaced 
iSCSI-SCST. The better one should win.


Vlad


Re: Integration of SCST in the mainstream Linux kernel

2008-02-06 Thread Vladislav Bolkhovitin

James Bottomley wrote:

On Tue, 2008-02-05 at 21:59 +0300, Vladislav Bolkhovitin wrote:


Hmm, how can one write to an mmapped page and not touch it?


I meant from user space ... the writes are done inside the kernel.


Sure, the mmap() approach was agreed to be impractical, but could you 
elaborate on this anyway, please? I'm just curious. Are you thinking about 
implementing a new syscall that would put pages with data into the mmap'ed 
area?


No, it has to do with the way invalidation occurs.  When you mmap a
region from a device or file, the kernel places page translations for
that region into your vm_area.  The regions themselves aren't backed
until faulted.  For write (i.e. incoming command to target) you specify
the write flag and send the area off to receive the data.  The gather,
expecting the pages to be overwritten, backs them with pages marked
dirty but doesn't fault in the contents (unless it already exists in the
page cache).  The kernel writes the data to the pages and the dirty
pages go back to the user.  msync() flushes them to the device.

The disadvantage of all this is that the handle for the I/O if you will
is a virtual address in a user process that doesn't actually care to see
the data. non-x86 architectures will do flushes/invalidates on this
address space as the I/O occurs.


I more or less see, thanks. But (1) the pages still need to be mmapped into 
the user space process before the data transmission, i.e. they must be zeroed 
before being mmapped, which isn't much faster than a data copy, and (2) I 
suspect it would be hard to make it race free, e.g. if another process wanted 
to write to the same area simultaneously.



However, as Linus has pointed out, this discussion is getting a bit off
topic. 


No, that isn't off topic. We've just proved that there is no good way to 
implement zero-copy cached I/O for STGT. I see only one practical way for 
that, proposed by FUJITA Tomonori some time ago: duplicating the Linux page 
cache in user space. But would you like it?


Well, there's no real evidence that zero copy or lack of it is a problem
yet.


The performance improvement from zero copy can be easily estimated, knowing 
the link throughput and the data copy throughput, which are about the same for 
20 Gbps links (I did that a few e-mails ago).
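
For a rough estimate (illustrative numbers only, assuming the copy and the 
wire transfer of a buffer are serialized):

#include <stdio.h>

int main(void)
{
	double link_mb_s = 2500.0;	/* ~20 Gbps link payload rate, assumed */
	double copy_mb_s = 2500.0;	/* assumed memcpy() throughput */

	/* per-byte cost = 1/link + 1/copy => effective = link*copy/(link+copy) */
	double with_copy = link_mb_s * copy_mb_s / (link_mb_s + copy_mb_s);

	printf("zero copy: %.0f MB/s\n", link_mb_s);
	printf("one copy:  %.0f MB/s (%.0f%% of the link)\n",
	       with_copy, 100.0 * with_copy / link_mb_s);
	return 0;
}

With copy throughput about equal to link throughput, the single copy roughly 
halves the effective rate, i.e. zero copy roughly doubles it.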


Vlad
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integration of SCST in the mainstream Linux kernel

2008-02-05 Thread Vladislav Bolkhovitin

Erez Zilber wrote:

Bart Van Assche wrote:


As you probably know there is a trend in enterprise computing towards
networked storage. This is illustrated by the emergence during the
past few years of standards like SRP (SCSI RDMA Protocol), iSCSI
(Internet SCSI) and iSER (iSCSI Extensions for RDMA). Two different
pieces of software are necessary to make networked storage possible:
initiator software and target software. As far as I know there exist
three different SCSI target implementations for Linux:
- The iSCSI Enterprise Target Daemon (IETD,
http://iscsitarget.sourceforge.net/);
- The Linux SCSI Target Framework (STGT, http://stgt.berlios.de/);
- The Generic SCSI Target Middle Level for Linux project (SCST,
http://scst.sourceforge.net/).
Since I was wondering which SCSI target software would be best suited
for an InfiniBand network, I started evaluating the STGT and SCST SCSI
target implementations. Apparently the performance difference between
STGT and SCST is small on 100 Mbit/s and 1 Gbit/s Ethernet networks,
but the SCST target software outperforms the STGT software on an
InfiniBand network. See also the following thread for the details:
http://sourceforge.net/mailarchive/forum.php?thread_name=e2e108260801170127w2937b2afg9bef324efa945e43%40mail.gmail.com&forum_name=scst-devel.

 


Sorry for the late response (but better late than never).

One may claim that STGT should have lower performance than SCST because
its data path is from userspace. However, your results show that for
non-IB transports, they both show the same numbers. Furthermore, with IB
there shouldn't be any additional difference between the 2 targets
because data transfer from userspace is as efficient as data transfer
from kernel space.


And now consider if one target has zero-copy cached I/O. How much will that 
improve its performance?



The only explanation that I see is that fine tuning for iSCSI & iSER is
required. As was already mentioned in this thread, with SDR you can get
~900 MB/sec with iSER (on STGT).

Erez

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integration of SCST in the mainstream Linux kernel

2008-02-05 Thread Vladislav Bolkhovitin

Jeff Garzik wrote:
iSCSI is way, way too complicated. 


I fully agree. On the one hand, all that complexity is unavoidable for the 
case of multiple connections per session, but for the regular case of one 
connection per session it could be a lot simpler.


Actually, think about those multiple connections...  we already had to 
implement fast-failover (and load bal) SCSI multi-pathing at a higher 
level.  IMO that portion of the protocol is redundant:   You need the 
same capability elsewhere in the OS _anyway_, if you are to support 
multi-pathing.


I'm thinking about MC/S as a way to improve performance using several 
physical links. There's no way other than MC/S to keep the command processing 
order in that case. So it's a really valuable property of iSCSI, although one 
with limited application.


Vlad
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integration of SCST in the mainstream Linux kernel

2008-02-05 Thread Vladislav Bolkhovitin

Jeff Garzik wrote:

Alan Cox wrote:

better. So for example, I personally suspect that ATA-over-ethernet is way 
better than some crazy SCSI-over-TCP crap, but I'm biased for simple and 
low-level, and against those crazy SCSI people to begin with.


Current ATAoE isn't. It can't support NCQ. A variant that did NCQ and IP
would probably trash iSCSI for latency if nothing else.



AoE is truly a thing of beauty.  It has a two/three page RFC (say no more!).

But quite so...  AoE is limited to MTU size, which really hurts.  Can't 
really do tagged queueing, etc.



iSCSI is way, way too complicated. 


I fully agree. On the one hand, all that complexity is unavoidable for the 
case of multiple connections per session, but for the regular case of one 
connection per session it could be a lot simpler.


And now think about iSER, which brings iSCSI to a whole new complexity level ;)


It's an Internet protocol designed 
by storage designers, what do you expect?


For years I have been hoping that someone will invent a simple protocol 
(w/ strong auth) that can transit ATA and SCSI commands and responses. 
Heck, it would be almost trivial if the kernel had a TLS/SSL implementation.


Jeff

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integration of SCST in the mainstream Linux kernel

2008-02-05 Thread Vladislav Bolkhovitin

Linus Torvalds wrote:

I'd assumed the move was primarily because of the difficulty of getting
correct semantics on a shared filesystem



.. not even shared. It was hard to get correct semantics full stop. 

Which is a traditional problem. The thing is, the kernel always has some 
internal state, and it's hard to expose all the semantics that the kernel 
knows about to user space.


So no, performance is not the only reason to move to kernel space. It can 
easily be things like needing direct access to internal data queues (for a 
iSCSI target, this could be things like barriers or just tagged commands - 
yes, you can probably emulate things like that without access to the 
actual IO queues, but are you sure the semantics will be entirely right?


The kernel/userland boundary is not just a performance boundary, it's an 
abstraction boundary too, and these kinds of protocols tend to break 
abstractions. NFS broke it by having file handles (which is not 
something that really exists in user space, and is almost impossible to 
emulate correctly), and I bet the same thing happens when emulating a SCSI 
target in user space.


Yes, there is something like that for a SCSI target as well. It's a local 
initiator, or local nexus; see 
http://thread.gmane.org/gmane.linux.scsi/31288 and 
http://news.gmane.org/find-root.php?message_id=%3c463F36AC.3010207%40vlnb.net%3e 
for more info about that.


In fact, the existence of the local nexus is one more point in favor of SCST 
over STGT, because it is pretty hard for STGT to support it (all locally 
generated commands would have to be passed through its daemon, which would be 
a total disaster for performance), while in SCST it can be done relatively 
simply.


Vlad
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integration of SCST in the mainstream Linux kernel

2008-02-05 Thread Vladislav Bolkhovitin

James Bottomley wrote:

On Mon, 2008-02-04 at 21:38 +0300, Vladislav Bolkhovitin wrote:


James Bottomley wrote:


On Mon, 2008-02-04 at 20:56 +0300, Vladislav Bolkhovitin wrote:



James Bottomley wrote:



On Mon, 2008-02-04 at 20:16 +0300, Vladislav Bolkhovitin wrote:




James Bottomley wrote:



So, James, what is your opinion on the above? Or the overall SCSI target 
project simplicity doesn't matter much for you and you think it's fine 
to duplicate Linux page cache in the user space to keep the in-kernel 
part of the project as small as possible?



The answers were pretty much contained here

http://marc.info/?l=linux-scsi&m=120164008302435

and here:

http://marc.info/?l=linux-scsi&m=120171067107293

Weren't they?


No, sorry, it doesn't look that way to me. They are about performance, but 
I'm asking about the overall project's architecture, namely about one part of 
it: simplicity. In particular, what do you think about duplicating the Linux 
page cache in user space to get zero-copy cached I/O? Or can you suggest 
another architectural solution for that problem within STGT's approach?



Isn't that an advantage of a user space solution?  It simply uses the
backing store of whatever device supplies the data.  That means it takes
advantage of the existing mechanisms for caching.


No, please reread this thread, especially this message: 
http://marc.info/?l=linux-kernel&m=120169189504361&w=2. This is one of the 
advantages of the kernel space implementation. The user space implementation 
has to copy data between the cache and a user space buffer, while the kernel 
space one can use the pages in the cache directly, without an extra copy.



Well, you've said it thrice (the bellman cried) but that doesn't make it
true.

The way a user space solution should work is to schedule mmapped I/O from the 
backing store and then send this mmapped region off for target I/O.  For 
reads, the page gather will ensure that the pages are up to date from the 
backing store to the cache before sending the I/O out. For writes, you 
actually have to do an msync on the region to get the data secured to the 
backing store. 


James, have you checked how fast mmapped I/O is if work size > size of RAM? 
It's several times slower compared to buffered I/O. It was discussed many 
times on LKML and, it seems, the VM people consider it unavoidable. 



Erm, but if you're using the case of work size > size of RAM, you'll
find buffered I/O won't help because you don't have the memory for
buffers either.


James, just check and you will see, buffered I/O is a lot faster.


So in an out of memory situation the buffers you don't have are a lot
faster than the pages I don't have?


There isn't an OOM situation in either case. It's just that page reclamation 
and readahead work much better in the buffered case.


So, using mmapped I/O isn't an option for high performance. Plus, mmapped I/O 
isn't an option where there are high reliability requirements, since it 
doesn't provide a practical way to handle I/O errors.


I think you'll find it does ... the page gather returns -EFAULT if
there's an I/O error in the gathered region. 


Err, returned to whom? If you try to read from an mmapped page which can't be 
populated due to an I/O error, you will get SIGBUS or SIGSEGV, I don't 
remember exactly. It's quite tricky to get back to the faulted command from 
the signal handler.


Or do you mean mmap(MAP_POPULATE)/munmap() for each command? Do you 
think that such mapping/unmapping is good for performance?




msync does something
similar if there's a write failure.



You also have to pull tricks with
the mmap region in the case of writes to prevent useless data being read
in from the backing store.


Can you be more exact and specify what kind of tricks should be done for 
that?


Actually, just avoid touching it seems to do the trick with a recent
kernel.


Hmm, how can one write to an mmapped page and not touch it?


I meant from user space ... the writes are done inside the kernel.


Sure, the mmap() approach was agreed to be impractical, but could you 
elaborate on this anyway, please? I'm just curious. Are you thinking about 
implementing a new syscall that would put pages with data into the mmap'ed 
area?



However, as Linus has pointed out, this discussion is getting a bit off
topic. 


No, that isn't off topic. We've just proved that there is no good way to 
implement zero-copy cached I/O for STGT. I see only one practical way for 
that, proposed by FUJITA Tomonori some time ago: duplicating the Linux page 
cache in user space. But would you like it?



There's no actual evidence that copy problems are causing any
performance issues for STGT.  In fact, there's evidence that
they're not for everything except IB networks.


Zero-copy cached I/O has not been implemented in SCST yet; I simply have not 
had time for it so far. Currently SCST performs better than STGT because of a 
simpler processing path and fewer context switches per command. Memcpy() speed 
on modern systems is about

Re: Integration of SCST in the mainstream Linux kernel

2008-02-05 Thread Vladislav Bolkhovitin

Linus Torvalds wrote:
So just going by what has happened in the past, I'd assume that iSCSI 
would eventually turn into connecting/authentication in user space with 
data transfers in kernel space.


This is exactly how iSCSI-SCST (the iSCSI target driver for SCST) is 
implemented; credit to the IET and Ardis target developers.


Vlad
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integration of SCST in the mainstream Linux kernel

2008-02-04 Thread Vladislav Bolkhovitin

Bart Van Assche wrote:

On Feb 4, 2008 1:27 PM, Vladislav Bolkhovitin [EMAIL PROTECTED] wrote:


So, James, what is your opinion on the above? Or the overall SCSI target
project simplicity doesn't matter much for you and you think it's fine
to duplicate Linux page cache in the user space to keep the in-kernel
part of the project as small as possible?



It's too early to draw conclusions about performance. I'm currently
performing more measurements, and the results are not easy to
interpret. My plan is to measure the following:
* Setup: target with RAM disk of 2 GB as backing storage.
* Throughput reported by dd and xdd (direct I/O).
* Transfers with dd/xdd in units of 1 KB to 1 GB (the smallest
transfer size that can be specified to xdd is 1 KB).
* Target SCSI software to be tested: IETD iSCSI via IPoIB, STGT iSCSI
via IPoIB, STGT iSER, SCST iSCSI via IPoIB, SCST SRP, LIO iSCSI via
IPoIB.

The reason I chose dd/xdd for these tests is that I want to measure
the performance of the communication protocols, and that I am assuming
that this performance can be modeled by the following formula:
(transfer time in s) = (transfer setup latency in s) + (transfer size
in MB) / (bandwidth in MB/s).


It isn't fully correct: you forgot about link latency. A more correct one is:

(transfer time) = (transfer setup latency on both initiator and target, 
consisting of software processing time, including a memory copy if necessary, 
and PCI setup/transfer time) + (transfer size)/(bandwidth) + (link latency to 
deliver the request for READs or the status for WRITEs) + (2*(link latency) to 
deliver the R2T/XFER_READY request in case of WRITEs, if necessary (e.g. iSER 
for small transfers might not need it, but SRP most likely always needs it)).

Also, note that this is correct only in the case of single-threaded workloads 
with one outstanding command at a time. For other workloads it depends on how 
well they manage to keep the link full, somewhere in the interval from 
(transfer size)/(transfer time) up to the full bandwidth.
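
Written out as code, so the terms are explicit (a sketch only; the parameter 
names are mine, units are seconds, bytes and bytes/s):

/* Sketch only: the model above as a function. Valid for a single-threaded
 * workload with one outstanding command at a time. */
struct xfer_params {
	double setup_latency;	/* initiator + target software and PCI setup time */
	double bandwidth;	/* link payload bandwidth */
	double link_latency;	/* one-way link latency */
	int    needs_r2t;	/* 1 if an R2T/XFER_READY round trip is needed (WRITEs) */
};

double transfer_time(double size, const struct xfer_params *p)
{
	double t = p->setup_latency + size / p->bandwidth;

	t += p->link_latency;		/* deliver the request (READ) or status (WRITE) */
	if (p->needs_r2t)
		t += 2.0 * p->link_latency;	/* extra round trip for R2T/XFER_READY */
	return t;
}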



Measuring the time needed for transfers
with varying block size allows to compute the constants in the above
formula via linear regression.


Unfortunately, it isn't so easy, see above.


One difficulty I already encountered is that the performance of the
Linux IPoIB implementation varies a lot under high load
(http://bugzilla.kernel.org/show_bug.cgi?id=9883).

Another issue I have to look further into is that dd and xdd report
different results for very large block sizes ( 1 MB).


Look at /proc/scsi_tgt/sgv (for SCST) and you will see which transfer sizes 
are actually used. Initiators don't like sending big requests and often split 
them into smaller ones.


Look at this message as well, it might be helpful: 
http://lkml.org/lkml/2007/5/16/223



Bart Van Assche.



-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integration of SCST in the mainstream Linux kernel

2008-02-04 Thread Vladislav Bolkhovitin

James Bottomley wrote:
So, James, what is your opinion on the above? Or the overall SCSI target 
project simplicity doesn't matter much for you and you think it's fine 
to duplicate Linux page cache in the user space to keep the in-kernel 
part of the project as small as possible?



The answers were pretty much contained here

http://marc.info/?l=linux-scsi&m=120164008302435

and here:

http://marc.info/?l=linux-scsi&m=120171067107293

Weren't they?


No, sorry, it doesn't look that way to me. They are about performance, but 
I'm asking about the overall project's architecture, namely about one part of 
it: simplicity. In particular, what do you think about duplicating the Linux 
page cache in user space to get zero-copy cached I/O? Or can you suggest 
another architectural solution for that problem within STGT's approach?



Isn't that an advantage of a user space solution?  It simply uses the
backing store of whatever device supplies the data.  That means it takes
advantage of the existing mechanisms for caching.


No, please reread this thread, especially this message: 
http://marc.info/?l=linux-kernel&m=120169189504361&w=2. This is one of the 
advantages of the kernel space implementation. The user space implementation 
has to copy data between the cache and a user space buffer, while the kernel 
space one can use the pages in the cache directly, without an extra copy.


Vlad
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integration of SCST in the mainstream Linux kernel

2008-02-04 Thread Vladislav Bolkhovitin

James Bottomley wrote:

On Mon, 2008-02-04 at 20:16 +0300, Vladislav Bolkhovitin wrote:


James Bottomley wrote:

So, James, what is your opinion on the above? Or the overall SCSI target 
project simplicity doesn't matter much for you and you think it's fine 
to duplicate Linux page cache in the user space to keep the in-kernel 
part of the project as small as possible?



The answers were pretty much contained here

http://marc.info/?l=linux-scsi&m=120164008302435

and here:

http://marc.info/?l=linux-scsi&m=120171067107293

Weren't they?


No, sorry, it doesn't look that way to me. They are about performance, but 
I'm asking about the overall project's architecture, namely about one part of 
it: simplicity. In particular, what do you think about duplicating the Linux 
page cache in user space to get zero-copy cached I/O? Or can you suggest 
another architectural solution for that problem within STGT's approach?



Isn't that an advantage of a user space solution?  It simply uses the
backing store of whatever device supplies the data.  That means it takes
advantage of the existing mechanisms for caching.


No, please reread this thread, especially this message: 
http://marc.info/?l=linux-kernel&m=120169189504361&w=2. This is one of the 
advantages of the kernel space implementation. The user space implementation 
has to copy data between the cache and a user space buffer, while the kernel 
space one can use the pages in the cache directly, without an extra copy.



Well, you've said it thrice (the bellman cried) but that doesn't make it
true.

The way a user space solution should work is to schedule mmapped I/O
from the backing store and then send this mmapped region off for target
I/O.  For reads, the page gather will ensure that the pages are up to
date from the backing store to the cache before sending the I/O out.
For writes, You actually have to do a msync on the region to get the
data secured to the backing store. 


James, have you checked how fast mmapped I/O is if work size > size of RAM? 
It's several times slower compared to buffered I/O. It was discussed many 
times on LKML and, it seems, the VM people consider it unavoidable. So, using 
mmapped I/O isn't an option for high performance. Plus, mmapped I/O isn't an 
option where there are high reliability requirements, since it doesn't provide 
a practical way to handle I/O errors.
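
For reference, here is roughly the path you are describing, as I understand it 
(a sketch with made-up transport helpers, offset and length assumed page 
aligned, error handling trimmed; this is not STGT code):

/* Sketch only; transfer_to_initiator()/transfer_from_initiator() are
 * hypothetical transport hooks. */
#include <sys/types.h>
#include <sys/mman.h>

extern int transfer_to_initiator(void *buf, size_t len);	/* hypothetical */
extern int transfer_from_initiator(void *buf, size_t len);	/* hypothetical */

static int handle_read(int backing_fd, off_t offset, size_t len)
{
	/* faulting the pages in ("page gather") pulls the data through the
	 * page cache before it goes out on the wire */
	void *buf = mmap(NULL, len, PROT_READ, MAP_SHARED, backing_fd, offset);

	if (buf == MAP_FAILED)
		return -1;
	int ret = transfer_to_initiator(buf, len);
	munmap(buf, len);
	return ret;
}

static int handle_write(int backing_fd, off_t offset, size_t len)
{
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
			 backing_fd, offset);

	if (buf == MAP_FAILED)
		return -1;
	int ret = transfer_from_initiator(buf, len);
	if (ret == 0)
		ret = msync(buf, len, MS_SYNC);	/* secure data to the backing store */
	munmap(buf, len);
	return ret;
}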



You also have to pull tricks with
the mmap region in the case of writes to prevent useless data being read
in from the backing store.


Can you be more exact and specify what kind of tricks should be done for 
that?



 However, none of this involves data copies.

James





-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integration of SCST in the mainstream Linux kernel

2008-02-04 Thread Vladislav Bolkhovitin

Vladislav Bolkhovitin wrote:

James Bottomley wrote:


The two target architectures perform essentially identical functions, so
there's only really room for one in the kernel.  Right at the moment,
it's STGT.  Problems in STGT come from the user-kernel boundary which
can be mitigated in a variety of ways.  The fact that the figures are
pretty much comparable on non IB networks shows this.

I really need a whole lot more evidence than at worst a 20% performance
difference on IB to pull one implementation out and replace it with
another.  Particularly as there's no real evidence that STGT can't be
tweaked to recover the 20% even on IB.



James,

Although the performance difference between STGT and SCST is apparent, this 
isn't the only reason why SCST is better. I've already written about it many 
times on various mailing lists, but let me summarize it one more time here.


As you know, almost all kernel parts can be done in user space, including all 
the drivers, networking, and I/O management with the block/SCSI initiator 
subsystem and the disk cache manager. But does that mean the current Linux 
kernel is bad and all of the above should be (re)done in user space instead? I 
believe not. Linux isn't a microkernel for very pragmatic reasons: simplicity 
and performance. So an additional important point in SCST's favor is 
simplicity.


For a SCSI target, especially with a hardware target card, data come from the 
kernel and are eventually served by the kernel, which does the actual I/O or 
gets/puts data from/to the cache. Dividing request processing between user and 
kernel space creates unnecessary interface layer(s) and effectively turns 
request processing into a distributed job, with all its complexity and 
reliability problems. From my point of view, such a division, where user space 
is the master side and the kernel is the slave, is rather wrong, because:


1. It makes the kernel depend on a user program, which services it and 
provides routines for it, while the regular paradigm is the opposite: the 
kernel services user space applications. A direct consequence is that there is 
no real protection for the kernel from faults in the STGT core code without 
excessive effort, which, no surprise, hasn't been done so far and, it seems, 
is never going to be done. So, in practice, debugging and developing under 
STGT isn't easier than if the whole code were in kernel space, but actually 
harder (see below why).


2. It requires a new, complicated interface between kernel and user space that 
creates additional maintenance and debugging headaches, which don't exist for 
kernel-only code. Linus Torvalds some time ago perfectly described why this is 
bad, see http://lkml.org/lkml/2007/4/24/451, 
http://lkml.org/lkml/2006/7/1/41 and http://lkml.org/lkml/2007/4/24/364.


3. It makes it impossible for the SCSI target to use (at least in a simple and 
sane way) many effective optimizations: zero-copy cached I/O, more control 
over read-ahead, device queue plugging/unplugging, etc. One example of such a 
feature, already implemented, is zero-copy network data transmission, done in 
a simple 260-line put_page_callback patch. This optimization is especially 
important for the user space gate (the scst_user module), see below for 
details.


The whole notion that development for the kernel is harder than for user space 
is total nonsense nowadays. It's different, yes; in some ways more limited, 
yes; but not harder. For those who need gdb (I haven't for many years), the 
kernel has kgdb, plus many debug facilities that are either not available in 
user space or are more limited there, like lockdep, lockup detection, 
oprofile, etc. (not to mention a wider choice of more effectively implemented 
synchronization primitives, and more).


For people who need complicated target device emulation, like, e.g., in the 
case of a VTL (Virtual Tape Library), where there is a need to operate on 
large mmap'ed memory areas, SCST provides a gateway to user space (the 
scst_user module), but, in contrast with STGT, it's done in the regular 
kernel-is-master, user-application-is-slave paradigm, so it's reliable and no 
fault in a user space device emulator can break the kernel or other user space 
applications. Plus, since the SCSI target state machine and memory management 
are in the kernel, it's very efficient and needs only one kernel-user space 
switch per SCSI command.
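
For illustration, the control flow of such a user space emulator looks roughly 
like this (hypothetical ioctl and structure names, not the real scst_user 
interface; the point is only the single combined "reply + get next command" 
switch):

#include <sys/ioctl.h>

#define EMU_IOC_REPLY_AND_GET	0xC0DE	/* made-up ioctl number */

struct emu_cmd {
	unsigned int  cmd_id;
	unsigned char cdb[16];
	void         *buf;
	unsigned int  bufflen;
	int           status;	/* filled in by the emulator before the next call */
};

static int emulator_loop(int dev_fd)
{
	struct emu_cmd cmd = { 0 };

	for (;;) {
		/* one syscall per command: deliver the previous reply and
		 * receive the next command from the kernel */
		if (ioctl(dev_fd, EMU_IOC_REPLY_AND_GET, &cmd) < 0)
			return -1;

		/* emulate the device here (e.g. a VTL working on a large
		 * mmap'ed area), then set cmd.status for the next iteration */
		cmd.status = 0;
	}
}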


Also, I should note here that in its current state STGT in many aspects 
doesn't fully conform to the SCSI specifications, especially in the area of 
management events, like Unit Attention generation and processing, and it 
doesn't look like anybody cares about that. At the same time, SCST pays close 
attention to fully conforming to the SCSI specifications, because the price of 
non-conformance is possible corruption of the user's data.


Returning to performance: modern SCSI transports, e.g. InfiniBand, have a link 
latency as low as 1(!) microsecond. For comparison, the inter-thread context 
switch time on a modern system is about the same, syscall time

Re: Integration of SCST in the mainstream Linux kernel

2008-02-04 Thread Vladislav Bolkhovitin

James Bottomley wrote:

On Mon, 2008-02-04 at 20:56 +0300, Vladislav Bolkhovitin wrote:


James Bottomley wrote:


On Mon, 2008-02-04 at 20:16 +0300, Vladislav Bolkhovitin wrote:



James Bottomley wrote:


So, James, what is your opinion on the above? Or the overall SCSI target 
project simplicity doesn't matter much for you and you think it's fine 
to duplicate Linux page cache in the user space to keep the in-kernel 
part of the project as small as possible?



The answers were pretty much contained here

http://marc.info/?l=linux-scsi&m=120164008302435

and here:

http://marc.info/?l=linux-scsi&m=120171067107293

Weren't they?


No, sorry, it doesn't look that way to me. They are about performance, but 
I'm asking about the overall project's architecture, namely about one part of 
it: simplicity. In particular, what do you think about duplicating the Linux 
page cache in user space to get zero-copy cached I/O? Or can you suggest 
another architectural solution for that problem within STGT's approach?



Isn't that an advantage of a user space solution?  It simply uses the
backing store of whatever device supplies the data.  That means it takes
advantage of the existing mechanisms for caching.


No, please reread this thread, especially this message: 
http://marc.info/?l=linux-kernel&m=120169189504361&w=2. This is one of the 
advantages of the kernel space implementation. The user space implementation 
has to copy data between the cache and a user space buffer, while the kernel 
space one can use the pages in the cache directly, without an extra copy.



Well, you've said it thrice (the bellman cried) but that doesn't make it
true.

The way a user space solution should work is to schedule mmapped I/O
from the backing store and then send this mmapped region off for target
I/O.  For reads, the page gather will ensure that the pages are up to
date from the backing store to the cache before sending the I/O out.
For writes, You actually have to do a msync on the region to get the
data secured to the backing store. 


James, have you checked how fast mmapped I/O is if work size > size of RAM? 
It's several times slower compared to buffered I/O. It was discussed many 
times on LKML and, it seems, the VM people consider it unavoidable. 



Erm, but if you're using the case of work size > size of RAM, you'll
find buffered I/O won't help because you don't have the memory for
buffers either.


James, just check and you will see, buffered I/O is a lot faster.

So, using mmapped I/O isn't an option for high performance. Plus, mmapped I/O 
isn't an option where there are high reliability requirements, since it 
doesn't provide a practical way to handle I/O errors.


I think you'll find it does ... the page gather returns -EFAULT if
there's an I/O error in the gathered region. 


Err, returned to whom? If you try to read from an mmapped page which can't be 
populated due to an I/O error, you will get SIGBUS or SIGSEGV, I don't 
remember exactly. It's quite tricky to get back to the faulted command from 
the signal handler.


Or do you mean mmap(MAP_POPULATE)/munmap() for each command? Do you 
think that such mapping/unmapping is good for performance?



msync does something
similar if there's a write failure.


You also have to pull tricks with
the mmap region in the case of writes to prevent useless data being read
in from the backing store.


Can you be more exact and specify what kind of tricks should be done for 
that?


Actually, just avoid touching it seems to do the trick with a recent
kernel.


Hmm, how can one write to an mmapped page and not touch it?


James





-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Scst-devel] Integration of SCST in the mainstream Linux kernel

2008-02-01 Thread Vladislav Bolkhovitin

Bart Van Assche wrote:

On Jan 31, 2008 5:25 PM, Joe Landman [EMAIL PROTECTED] wrote:


Vladislav Bolkhovitin wrote:


Actually, I don't know what kind of conclusions it is possible to make
from disktest's results (maybe only how throughput gets bigger or slower
with increasing number of threads?), it's a good stress test tool, but
not more.


Unfortunately, I agree.  Bonnie++, dd tests, and a few others seem to
bear far closer to real world tests than disktest and iozone, the
latter of which does more to test the speed of RAM cache and system call
performance than actual IO.



I have run some tests with Bonnie++, but found out that on a fast
network like IB the filesystem used for the test has a really big
impact on the test results.

If anyone has a suggestion for a better test than dd to compare the
performance of SCSI storage protocols, please let me know.


I would suggest you try something from real life, like:

 - Copying a large file tree over a single or multiple IB links

 - Measuring some DB engine's TPC result

 - etc.


Bart Van Assche.




-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Scst-devel] Integration of SCST in the mainstream Linux kernel

2008-02-01 Thread Vladislav Bolkhovitin

Vladislav Bolkhovitin wrote:

Bart Van Assche wrote:

On Jan 31, 2008 5:25 PM, Joe Landman [EMAIL PROTECTED] 
wrote:



Vladislav Bolkhovitin wrote:


Actually, I don't know what kind of conclusions it is possible to make
from disktest's results (maybe only how throughput gets bigger or 
slower

with increasing number of threads?), it's a good stress test tool, but
not more.



Unfortunately, I agree.  Bonnie++, dd tests, and a few others seem to
bear far closer to real world tests than disktest and iozone, the
latter of which does more to test the speed of RAM cache and system call
performance than actual IO.




I have ran some tests with Bonnie++, but found out that on a fast
network like IB the filesystem used for the test has a really big
impact on the test results.

If anyone has a suggestion for a better test than dd to compare the
performance of SCSI storage protocols, please let it know.



I would suggest you to try something from real life, like:

 - Copying large file tree over a single or multiple IB links

 - Measure of some DB engine's TPC

 - etc.


I forgot to mention: during those tests make sure that the devices imported 
from both SCST and STGT report the same write cache and FUA capabilities in 
the kernel log, since these significantly affect the initiator's behavior. 
Like:


sd 4:0:0:5: [sdf] Write cache: enabled, read cache: enabled, supports 
DPO and FUA


For SCST the fastest mode is NV_CACHE; refer to its README file for details.


Bart Van Assche.




-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Scst-devel] Integration of SCST in the mainstream Linux kernel

2008-02-01 Thread Vladislav Bolkhovitin

David Dillow wrote:

On Thu, 2008-01-31 at 18:08 +0100, Bart Van Assche wrote:


If anyone has a suggestion for a better test than dd to compare the
performance of SCSI storage protocols, please let it know.



xdd on /dev/sda, sdb, etc. using -dio to do direct IO seems to work
decently, though it is hard (ie, impossible) to get a repeatable
sequence of IO when using higher queue depths, as it uses threads to
generate multiple requests.


This utility seems to be a good one, but it's basically the same as 
disktest, although much more advanced.



You may also look at sgpdd_survey from Lustre's iokit, but I've not done
much with that -- it uses the sg devices to send lowlevel SCSI commands.


Yes, it might be worth trying. Since fundamentally it's the same as O_DIRECT 
dd, but with a bit less overhead on the initiator side (hence less 
initiator-side latency), most likely it will show an even bigger difference 
than dd does.



I've been playing around with some benchmark code using libaio, but it's
not in generally usable shape.

xdd:
http://www.ioperformance.com/products.htm

Lustre IO Kit:
http://manual.lustre.org/manual/LustreManual16_HTML/DynamicHTML-20-1.html


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integration of SCST in the mainstream Linux kernel

2008-01-31 Thread Vladislav Bolkhovitin

Bart Van Assche wrote:

On Jan 31, 2008 2:25 PM, Nicholas A. Bellinger [EMAIL PROTECTED] wrote:


Since this particular code is located in a non-data path critical
section, the kernel vs. user discussion is a wash.  If we are talking
about data path, yes, the relevance of DD tests in kernel designs are
suspect :p.  For those IB testers who are interested, perhaps having a
look with disktest from the Linux Test Project would give a better
comparison between the two implementations on an RDMA capable fabric
like IB for best case performance.  I think everyone is interested in
seeing just how much data path overhead exists between userspace and
kernel space in typical and heavy workloads, and if this overhead can be
minimized to make userspace a better option for some of this very
complex code.


I can run disktest on the same setups I ran dd on. This will take some
time however.


Disktest was already referenced at the beginning of the performance 
comparison thread, but its results are not very interesting if we want to find 
out which implementation is more effective, because in the modes in which 
people usually run this utility it produces a latency-insensitive workload 
(multiple threads working in parallel). So, such multithreaded disktest 
results will differ between STGT and SCST only if STGT's implementation 
becomes CPU bound on the target. If the CPU on the target is powerful enough, 
even extra busy loops in the STGT or SCST hot path code will change nothing.


Additionally, multithreaded disktest over a RAM disk is a good example of a 
synthetic benchmark which has almost no relation to real-life workloads. But 
people like it, because it produces nice-looking results.


Actually, I don't know what kind of conclusions can be drawn from disktest's 
results (maybe only how throughput scales with an increasing number of 
threads?); it's a good stress test tool, but not more.



Disktest is new to me -- any hints with regard to suitable
combinations of command line parameters are welcome. The most recent
version I could find on http://ltp.sourceforge.net/ is ltp-20071231.

Bart Van Assche.



-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integration of SCST in the mainstream Linux kernel

2008-01-30 Thread Vladislav Bolkhovitin

James Bottomley wrote:

The two target architectures perform essentially identical functions, so
there's only really room for one in the kernel.  Right at the moment,
it's STGT.  Problems in STGT come from the user-kernel boundary which
can be mitigated in a variety of ways.  The fact that the figures are
pretty much comparable on non IB networks shows this.

I really need a whole lot more evidence than at worst a 20% performance
difference on IB to pull one implementation out and replace it with
another.  Particularly as there's no real evidence that STGT can't be
tweaked to recover the 20% even on IB.


James,

Although the performance difference between STGT and SCST is apparent, this 
isn't the only reason why SCST is better. I've already written about it many 
times on various mailing lists, but let me summarize it one more time here.


As you know, almost all kernel parts can be done in user space, including all 
the drivers, networking, and I/O management with the block/SCSI initiator 
subsystem and the disk cache manager. But does that mean the current Linux 
kernel is bad and all of the above should be (re)done in user space instead? I 
believe not. Linux isn't a microkernel for very pragmatic reasons: simplicity 
and performance. So an additional important point in SCST's favor is 
simplicity.


For a SCSI target, especially with a hardware target card, data come from the 
kernel and are eventually served by the kernel, which does the actual I/O or 
gets/puts data from/to the cache. Dividing request processing between user and 
kernel space creates unnecessary interface layer(s) and effectively turns 
request processing into a distributed job, with all its complexity and 
reliability problems. From my point of view, such a division, where user space 
is the master side and the kernel is the slave, is rather wrong, because:


1. It makes the kernel depend on a user program, which services it and 
provides routines for it, while the regular paradigm is the opposite: the 
kernel services user space applications. A direct consequence is that there is 
no real protection for the kernel from faults in the STGT core code without 
excessive effort, which, no surprise, hasn't been done so far and, it seems, 
is never going to be done. So, in practice, debugging and developing under 
STGT isn't easier than if the whole code were in kernel space, but actually 
harder (see below why).


2. It requires a new, complicated interface between kernel and user space that 
creates additional maintenance and debugging headaches, which don't exist for 
kernel-only code. Linus Torvalds some time ago perfectly described why this is 
bad, see http://lkml.org/lkml/2007/4/24/451, 
http://lkml.org/lkml/2006/7/1/41 and http://lkml.org/lkml/2007/4/24/364.


3. It makes it impossible for the SCSI target to use (at least in a simple and 
sane way) many effective optimizations: zero-copy cached I/O, more control 
over read-ahead, device queue plugging/unplugging, etc. One example of such a 
feature, already implemented, is zero-copy network data transmission, done in 
a simple 260-line put_page_callback patch. This optimization is especially 
important for the user space gate (the scst_user module), see below for 
details.


The whole notion that development for the kernel is harder than for user space 
is total nonsense nowadays. It's different, yes; in some ways more limited, 
yes; but not harder. For those who need gdb (I haven't for many years), the 
kernel has kgdb, plus many debug facilities that are either not available in 
user space or are more limited there, like lockdep, lockup detection, 
oprofile, etc. (not to mention a wider choice of more effectively implemented 
synchronization primitives, and more).


For people who need complicated target device emulation, like, e.g., in the 
case of a VTL (Virtual Tape Library), where there is a need to operate on 
large mmap'ed memory areas, SCST provides a gateway to user space (the 
scst_user module), but, in contrast with STGT, it's done in the regular 
kernel-is-master, user-application-is-slave paradigm, so it's reliable and no 
fault in a user space device emulator can break the kernel or other user space 
applications. Plus, since the SCSI target state machine and memory management 
are in the kernel, it's very efficient and needs only one kernel-user space 
switch per SCSI command.


Also, I should note here that in its current state STGT in many aspects 
doesn't fully conform to the SCSI specifications, especially in the area of 
management events, like Unit Attention generation and processing, and it 
doesn't look like anybody cares about that. At the same time, SCST pays close 
attention to fully conforming to the SCSI specifications, because the price of 
non-conformance is possible corruption of the user's data.


Returning to performance: modern SCSI transports, e.g. InfiniBand, have a link 
latency as low as 1(!) microsecond. For comparison, the inter-thread context 
switch time on a modern system is about the same, syscall time - about 0.1 
microsecond. So, 

Re: Integration of SCST in the mainstream Linux kernel

2008-01-30 Thread Vladislav Bolkhovitin

FUJITA Tomonori wrote:

On Tue, 29 Jan 2008 13:31:52 -0800
Roland Dreier [EMAIL PROTECTED] wrote:



 .                          .  STGT read     SCST read     .  STGT read     SCST read     .
 .                          .  performance   performance   .  performance   performance   .
 .                          .  (0.5K, MB/s)  (0.5K, MB/s)  .  (1 MB, MB/s)  (1 MB, MB/s)  .
 . iSER (8 Gb/s network)    .  250           N/A           .  360           N/A           .
 . SRP  (8 Gb/s network)    .  N/A           421           .  N/A           683           .

 On the comparable figures, which only seem to be IPoIB they're showing a
 13-18% variance, aren't they?  Which isn't an incredible difference.

Maybe I'm all wet, but I think iSER vs. SRP should be roughly
comparable.  The exact formatting of various messages etc. is
different but the data path using RDMA is pretty much identical.  So
the big difference between STGT iSER and SCST SRP hints at some big
difference in the efficiency of the two implementations.



iSER has parameters to limit the maximum size of RDMA (it needs to
repeat RDMA with a poor configuration)?


Anyway, here's the results from Robin Humble:

iSER to 7G ramfs, x86_64, centos4.6, 2.6.22 kernels, git tgtd,
initiator end booted with mem=512M, target with 8G ram

 direct i/o dd
  write/read  800/751 MB/s
dd if=/dev/zero of=/dev/sdc bs=1M count=5000 oflag=direct
dd of=/dev/null if=/dev/sdc bs=1M count=5000 iflag=direct

http://www.mail-archive.com/linux-scsi@vger.kernel.org/msg13502.html

I think that STGT is pretty fast with the fast backing storage. 


How fast will SCST be on the same hardware?


I don't think that there is a notable performance difference between
kernel-space and user-space SRP (or iSER) implementations in moving
data between hosts. IB is expected to enable user-space applications
to move data between hosts quickly (if not, what can IB provide us?).

I think that the question is how fast user-space applications can do
I/Os compared with I/Os in kernel space. STGT is eager for the advent
of good asynchronous I/O and event notification interfaces.

One more possible optimization for STGT is zero-copy data
transfer. STGT uses pre-registered buffers and moves data between the page
cache and these buffers, and then does the RDMA transfer. If we implement
our own caching mechanism to use pre-registered buffers directly (with AIO
and O_DIRECT), then STGT can move data without data copies.


Great! So, you are going to duplicate the Linux page cache in user space. You 
will keep the in-kernel code as small as possible and its maintainership 
effort as low as possible, at the cost that the user space part's code size 
and complexity (and, hence, its maintainership effort) will skyrocket. This 
doesn't look like a good design decision to me.






-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Integration of SCST in the mainstream Linux kernel

2008-01-30 Thread Vladislav Bolkhovitin

FUJITA Tomonori wrote:

On Wed, 30 Jan 2008 09:38:04 +0100
Bart Van Assche [EMAIL PROTECTED] wrote:



On Jan 30, 2008 12:32 AM, FUJITA Tomonori [EMAIL PROTECTED] wrote:


iSER has parameters to limit the maximum size of RDMA (it needs to
repeat RDMA with a poor configuration)?


Please specify which parameters you are referring to. As you know I



Sorry, I can't say. I don't know much about iSER. But it seems that Pete
and Robin can get a better I/O performance to line speed ratio with
STGT.

The version of OpenIB might matter too. For example, Pete said that
STGT reads lose about 100 MB/s for some transfer sizes due to the
OpenIB version difference or other unclear reasons.

http://article.gmane.org/gmane.linux.iscsi.tgt.devel/135

It's fair to say that it takes a long time and needs lots of knowledge to
get the maximum performance out of a SAN, I think.

I think that it would be easier to convince James with the detailed
analysis (e.g. where does it take so long, like Pete did), not just
'dd' performance results.

Pushing iSCSI target code into mainline has failed four times: IET, SCST,
STGT (which did I/O in the kernel in the past), and PyX's one (*1). iSCSI
target code is huge. You said SCST comprises 14,000 lines, but that's
not iSCSI target code. The SCSI engine code comprises 14,000
lines. You need another 10,000 lines for the iSCSI driver. Note that
SCST's iSCSI driver provides only basic iSCSI features. PyX's iSCSI
target code implements more iSCSI features (like MC/S, ERL2, etc.),
comprises about 60,000 lines, and still lacks some features like
iSER, bidi, etc.

I think it's reasonable to say that we need more than 'dd'
results before pushing possibly more than 60,000 lines into
mainline.


Tomo, please stop counting only in-kernel lines (see 
http://lkml.org/lkml/2007/4/24/364). The overall number of project lines for 
the same feature set is a lot more important.



(*1) http://linux-iscsi.org/



-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Performance of SCST versus STGT

2008-01-24 Thread Vladislav Bolkhovitin

Bart Van Assche wrote:

On Jan 24, 2008 8:06 AM, Robin Humble [EMAIL PROTECTED] wrote:


On Tue, Jan 22, 2008 at 01:32:08PM +0100, Bart Van Assche wrote:


.............................................................................................
.                           .  STGT read     SCST read     .  STGT read     SCST read     .
.                           .  performance   performance   .  performance   performance   .
.                           .  (0.5K, MB/s)  (0.5K, MB/s)  .  (1 MB, MB/s)  (1 MB, MB/s)  .
.............................................................................................
. Ethernet (1 Gb/s network) .   77            78           .   77            89           .
. IPoIB    (8 Gb/s network) .  163           185           .  201           239           .
. iSER     (8 Gb/s network) .  250           N/A           .  360           N/A           .
. SRP      (8 Gb/s network) .  N/A           421           .  N/A           683           .



how are write speeds with SCST SRP?
for some kernels and tests tgt writes at 2x the read speed.


Robin,

There is a fundamental difference between regular dd-like reads and writes: 
reads are synchronous, i.e. latency sensitive, while writes are asynchronous, 
i.e. latency insensitive. You should use O_DIRECT dd writes for a fair 
comparison.


Vlad
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Stgt-devel] Performance of SCST versus STGT

2008-01-24 Thread Vladislav Bolkhovitin

Robin Humble wrote:

On Thu, Jan 24, 2008 at 11:36:45AM +0100, Bart Van Assche wrote:


On Jan 24, 2008 8:06 AM, Robin Humble [EMAIL PROTECTED] wrote:


On Tue, Jan 22, 2008 at 01:32:08PM +0100, Bart Van Assche wrote:


.............................................................................................
.                           .  STGT read     SCST read     .  STGT read     SCST read     .
.                           .  performance   performance   .  performance   performance   .
.                           .  (0.5K, MB/s)  (0.5K, MB/s)  .  (1 MB, MB/s)  (1 MB, MB/s)  .
.............................................................................................
. Ethernet (1 Gb/s network) .   77            78           .   77            89           .
. IPoIB    (8 Gb/s network) .  163           185           .  201           239           .
. iSER     (8 Gb/s network) .  250           N/A           .  360           N/A           .
. SRP      (8 Gb/s network) .  N/A           421           .  N/A           683           .



how are write speeds with SCST SRP?
for some kernels and tests tgt writes at 2x the read speed.

also I see much higher speeds than what you report in my DDR 4x IB tgt
testing... which could be taken as inferring that tgt is scaling quite
nicely on the faster fabric?
 ib_write_bw of 1473 MB/s
 ib_read_bw  of 1378 MB/s

iSER to 7G ramfs, x86_64, centos4.6, 2.6.22 kernels, git tgtd,
initiator end booted with mem=512M, target with 8G ram

direct i/o dd
 write/read  800/751 MB/s
   dd if=/dev/zero of=/dev/sdc bs=1M count=5000 oflag=direct
   dd of=/dev/null if=/dev/sdc bs=1M count=5000 iflag=direct

buffered i/o dd
 write/read 1109/350 MB/s
   dd if=/dev/zero of=/dev/sdc bs=1M count=5000
   dd of=/dev/null if=/dev/sdc bs=1M count=5000

buffered i/o lmdd
write/read  682/438 MB/s
  lmdd if=internal of=/dev/sdc bs=1M count=5000
  lmdd of=internal if=/dev/sdc bs=1M count=5000




The tests I performed were read performance tests with dd and with
buffered I/O. For this test you obtained 350 MB/s with STGT on a DDR



... and 1.1GB/s writes :)
presumably because buffer aggregation works well.



4x InfiniBand network, while I obtained 360 MB/s on a SDR 4x
InfiniBand network. I don't think that we can call this scaling up
...



the direct i/o read speed being twice the buffered i/o speed would seem
to imply that Linux's page cache is being slow and confused with this
particular set of kernel + OS + OFED versions.
I doubt that this result actually says that much about tgt really.


Buffered dd read is, actually, one of the best benchmarks if you want to 
compare STGT vs. SCST, because it's single threaded with one outstanding 
command most of the time, i.e. it's a latency-bound workload. Plus, most 
applications reading files do exactly what dd does.


Both SCST and STGT suffer equally from possible problems on the initiator, 
but SCST bears them much better, because it has much lower processing latency 
(e.g., because there are no extra user-kernel space switches and other 
related overhead).



Regarding write performance: the write tests were performed with a
real target (three disks in RAID-0, write bandwidth about 100 MB/s). I



I'd be interested to see ramdisk writes.

cheers,
robin



-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Performance of SCST versus STGT

2008-01-24 Thread Vladislav Bolkhovitin

Robin Humble wrote:

On Thu, Jan 24, 2008 at 02:10:06PM +0300, Vladislav Bolkhovitin wrote:


On Jan 24, 2008 8:06 AM, Robin Humble [EMAIL PROTECTED] wrote:


how are write speeds with SCST SRP?
for some kernels and tests tgt writes at 2x the read speed.


There is a fundamental difference between regular dd-like reads and writes: 
reads are sync, i.e. latency sensitive, but writes are async, i.e. latency 
insensitive. You should use O_DIRECT dd writes for the fair comparison.


I agree, although the vast majority of applications don't use O_DIRECT.


Sorry, it isn't about O_DIRECT usage. It's about whether the workload is 
latency bound or not.



anyway, the direct i/o results were in the email:

  direct i/o dd
   write/read  800/751 MB/s
 dd if=/dev/zero of=/dev/sdc bs=1M count=5000 oflag=direct
 dd of=/dev/null if=/dev/sdc bs=1M count=5000 iflag=direct

I couldn't find a direct i/o option for lmdd.

cheers,
robin



-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Performance of SCST versus STGT

2008-01-24 Thread Vladislav Bolkhovitin

Bart Van Assche wrote:

On Jan 24, 2008 8:06 AM, Robin Humble [EMAIL PROTECTED] wrote:


On Tue, Jan 22, 2008 at 01:32:08PM +0100, Bart Van Assche wrote:


.............................................................................................
.                           .  STGT read     SCST read     .  STGT read     SCST read     .
.                           .  performance   performance   .  performance   performance   .
.                           .  (0.5K, MB/s)  (0.5K, MB/s)  .  (1 MB, MB/s)  (1 MB, MB/s)  .
.............................................................................................
. Ethernet (1 Gb/s network) .   77            78           .   77            89           .
. IPoIB    (8 Gb/s network) .  163           185           .  201           239           .
. iSER     (8 Gb/s network) .  250           N/A           .  360           N/A           .
. SRP      (8 Gb/s network) .  N/A           421           .  N/A           683           .





Results with /dev/ram0 configured as backing store on the target (buffered I/O):
                 Read           Write          Read           Write
                 performance    performance    performance    performance
                 (0.5K, MB/s)   (0.5K, MB/s)   (1 MB, MB/s)   (1 MB, MB/s)
STGT + iSER          250            48             349            781
SCST + SRP           411            66             659            746


Ib_rdma_bw now reports 933 MB/s on the same system, correct? That 
~250 MB/s difference is what you will gain once zero-copy I/O is implemented, 
and what STGT with its current architecture has no chance to achieve.



Results with /dev/ram0 configured as backing store on the target (direct I/O):
                 Read           Write          Read           Write
                 performance    performance    performance    performance
                 (0.5K, MB/s)   (0.5K, MB/s)   (1 MB, MB/s)   (1 MB, MB/s)
STGT + iSER          7.9            9.8            589            647
SCST + SRP          12.3            9.7            811            794

Bart.





Re: Integration of SCST in the mainstream Linux kernel

2008-01-23 Thread Vladislav Bolkhovitin

Bart Van Assche wrote:

As you probably know there is a trend in enterprise computing towards
networked storage. This is illustrated by the emergence during the
past few years of standards like SRP (SCSI RDMA Protocol), iSCSI
(Internet SCSI) and iSER (iSCSI Extensions for RDMA). Two different
pieces of software are necessary to make networked storage possible:
initiator software and target software. As far as I know there exist
three different SCSI target implementations for Linux:
- The iSCSI Enterprise Target Daemon (IETD,
http://iscsitarget.sourceforge.net/);
- The Linux SCSI Target Framework (STGT, http://stgt.berlios.de/);
- The Generic SCSI Target Middle Level for Linux project (SCST,
http://scst.sourceforge.net/).
Since I was wondering which SCSI target software would be best suited
for an InfiniBand network, I started evaluating the STGT and SCST SCSI
target implementations. Apparently the performance difference between
STGT and SCST is small on 100 Mbit/s and 1 Gbit/s Ethernet networks,
but the SCST target software outperforms the STGT software on an
InfiniBand network. See also the following thread for the details:
http://sourceforge.net/mailarchive/forum.php?thread_name=e2e108260801170127w2937b2afg9bef324efa945e43%40mail.gmail.comforum_name=scst-devel.

About the design of the SCST software: while one of the goals of the
STGT project was to keep the in-kernel code minimal, the SCST project
implements the whole SCSI target in kernel space. SCST is implemented
as a set of new kernel modules, only minimal changes to the existing
kernel are necessary before the SCST kernel modules can be used. This
is the same approach that will be followed in the very near future in
the OpenSolaris kernel (see also
http://opensolaris.org/os/project/comstar/). More information about
the design of SCST can be found here:
http://scst.sourceforge.net/doc/scst_pg.html.

My impression is that both the STGT and SCST projects are well
designed, well maintained and have a considerable user base. According
to the SCST maintainer (Vladislav Bolkhovitin), SCST is superior to
STGT with respect to features, performance, maturity, stability, and
number of existing target drivers. Unfortunately the SCST kernel code
lives outside the kernel tree, which makes SCST harder to use than
STGT.

As an SCST user, I would like to see the SCST kernel code integrated
in the mainstream kernel because of its excellent performance on an
InfiniBand network. Since the SCST project comprises about 14 KLOC,
reviewing the SCST code will take considerable time. Who will do this
reviewing work ? And with regard to the comments made by the
reviewers: Vladislav, do you have the time to carry out the
modifications requested by the reviewers ? I expect a.o. that
reviewers will ask to move SCST's configuration pseudofiles from
procfs to sysfs.


Sure, I do, although I personally don't see much sense in such a move.


Bart Van Assche.


Re: Performance of SCST versus STGT

2008-01-22 Thread Vladislav Bolkhovitin

FUJITA Tomonori wrote:

The big problem of stgt iSER is disk I/Os (move data between disk and
page cache). We need a proper asynchronous I/O mechanism, however,
Linux doesn't provide such and we use a workaround, which incurs large
latency. I guess, we cannot solve this until syslets is merged into
mainline.


Hmm, SCST also doesn't have the ability to use asynchronous I/O, but that 
doesn't prevent it from showing good performance.


Vlad


Re: Performance of SCST versus STGT

2008-01-22 Thread Vladislav Bolkhovitin

Bart Van Assche wrote:

On Jan 17, 2008 6:45 PM, Pete Wyckoff [EMAIL PROTECTED] wrote:


There's nothing particularly stunning here.  Suspect Bart has
configuration issues if not even IPoIB will do > 100 MB/s.



By this time I found out that the BIOS of the test systems (Intel
Server Board S5000PAL) set the PCI-e parameter MaxReadReq to 128
bytes, which explains the low InfiniBand performance. After changing
this parameter to 4096 bytes the InfiniBand throughput was as
expected: ib_rdma_bw now reports a
bandwidth of 933 MB/s.
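
For reference, both things can be checked with generic tools (the PCI address 
and hostname below are just examples, not the exact ones from this setup):

  # PCIe device control settings of the HCA, including MaxReadReq
  lspci -vv -s 0a:00.0 | grep -E 'MaxPayload|MaxReadReq'

  # raw RDMA bandwidth check with the perftest tool mentioned above
  ib_rdma_bw              # on the target
  ib_rdma_bw target-host  # on the initiator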


What are the new SRPT/iSER numbers?


Bart.





Re: Performance of SCST versus STGT

2008-01-22 Thread Vladislav Bolkhovitin

FUJITA Tomonori wrote:

On Tue, 22 Jan 2008 14:33:13 +0300
Vladislav Bolkhovitin [EMAIL PROTECTED] wrote:



FUJITA Tomonori wrote:


The big problem of stgt iSER is disk I/Os (move data between disk and
page cache). We need a proper asynchronous I/O mechanism, however,
Linux doesn't provide such and we use a workaround, which incurs large
latency. I guess, we cannot solve this until syslets is merged into
mainline.


Hmm, SCST also doesn't have ability to use asynchronous I/O, but that 
doesn't prevent it from showing good performance.



I don't know how SCST performs I/Os, but surely, in kernel space, you
can perform I/Os asynchronously.


Sure, but currently it is all synchronous.


Or you use an event notification
mechanism with multiple kernel threads performing I/Os synchronously.

Xen blktap has the same problem as stgt. IIRC, Xen mainline uses a
kernel patch to add a proper event notification to AIO though redhat
uses the same workaround as stgt instead of applying the kernel patch.


Re: Performance of SCST versus STGT

2008-01-22 Thread Vladislav Bolkhovitin

Bart Van Assche wrote:
On Jan 22, 2008 12:33 PM, Vladislav Bolkhovitin [EMAIL PROTECTED] 
mailto:[EMAIL PROTECTED] wrote:


 What are the new SRPT/iSER numbers?


You can find the new performance numbers below. These are all numbers 
for reading from the remote buffer cache, no actual disk reads were 
performed. The read tests have been performed with dd, both for a block 
size of 512 bytes and of 1 MB. The tests with a small block size say 
more about latency, while the tests with a large block size say more 
about the maximum possible throughput.


If you want to compare the performance of 512-byte vs 1 MB blocks, your 
experiment isn't fully correct. You should use dd's iflag=direct option 
for that.
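Something like the following, with /dev/sde standing in for the remote LUN:

  # latency-dominated case: small blocks, initiator cache bypassed
  dd if=/dev/sde of=/dev/null bs=512 count=100000 iflag=direct

  # throughput-dominated case: large blocks, initiator cache bypassed
  dd if=/dev/sde of=/dev/null bs=1M count=2000 iflag=direct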


                            STGT read      SCST read       STGT read      SCST read
                            performance    performance     performance    performance
                            (0.5K, MB/s)   (0.5K, MB/s)    (1 MB, MB/s)   (1 MB, MB/s)
Ethernet (1 Gb/s network)        77             78              77             89
IPoIB    (8 Gb/s network)       163            185             201            239
iSER     (8 Gb/s network)       250            N/A             360            N/A
SRP      (8 Gb/s network)       N/A            421             N/A            683

My conclusion from the above numbers: the performance difference between 
STGT and SCST is small for a Gigabit Ethernet network. The faster the 
network technology, the larger the difference between SCST and STGT.


This is what I expected


Bart.




Re: Performance of SCST versus STGT

2008-01-21 Thread Vladislav Bolkhovitin

Bart Van Assche wrote:

On Jan 18, 2008 1:08 PM, Vladislav Bolkhovitin [EMAIL PROTECTED] wrote:


[ ... ]
So, seems I understood your slides correctly: the more valuable data for
our SCST SRP vs STGT iSER comparison should be on page 26 for 1 command
read (~480MB/s, i.e. ~60% from Bart's result on the equivalent hardware).



At least in my tests SCST performed significantly better than STGT.
These tests were performed with the currently available
implementations of SCST and STGT. Which performance improvements are
possible for these projects (e.g. zero-copying), and by how much is it
expected that these performance improvements will increase throughput
and will decrease latency ?


Sure, zero-copy cache support is quite possible for SCST and hopefully 
will be available soon. The performance (throughput) improvement will 
depend on the hardware used and the data access pattern, but an upper-bound 
estimate can be made knowing the memory copy throughput of your system 
(1.6 GB/s according to your measurements). For a 10 Gbps link with 0.9 GB/s 
wire speed it should be up to 30%; for a 20 Gbps link with 1.5 GB/s wire speed 
(the PCI-E 8x limitation) - something up to 70-80%.


Vlad


Re: Performance of SCST versus STGT

2008-01-18 Thread Vladislav Bolkhovitin

Pete Wyckoff wrote:

I have performed a test to compare the performance of SCST and STGT.
Apparently the SCST target implementation performed far better than
the STGT target implementation. This makes me wonder whether this is
due to the design of SCST or whether STGT's performance can be
improved to the level of SCST ?

Test performed: read 2 GB of data in blocks of 1 MB from a target (hot
cache -- no disk reads were performed, all reads were from the cache).
Test command: time dd if=/dev/sde of=/dev/null bs=1M count=2000

                            STGT read            SCST read
                            performance (MB/s)   performance (MB/s)
Ethernet (1 Gb/s network)        77                   89
IPoIB    (8 Gb/s network)        82                  229
SRP      (8 Gb/s network)       N/A                  600
iSER     (8 Gb/s network)        80                  N/A

These results show that SCST uses the InfiniBand network very well
(effectivity of about 88% via SRP), but that the current STGT version
is unable to transfer data faster than 82 MB/s. Does this mean that
there is a severe bottleneck  present in the current STGT
implementation ?



I don't know about the details but Pete said that he can achieve more
than 900MB/s read performance with tgt iSER target using ramdisk.

http://www.mail-archive.com/[EMAIL PROTECTED]/msg4.html


Please don't confuse a multithreaded, latency-insensitive workload with 
a single-threaded, hence latency-sensitive, one.


Seems that he can get good performance with single threaded workload:

http://www.osc.edu/~pw/papers/wyckoff-iser-snapi07-talk.pdf

But I don't know about the details so let's wait for Pete to comment
on this.


Page 16 is pretty straight forward.  One command outstanding from
the client.  It is an OSD read command.  Data on tmpfs. 


Hmm, I wouldn't say it's pretty straightforward. It has data for 
InfiniBand and it's unclear whether it's using iSER or some IB performance 
test tool. I would rather interpret those data as being for IB, not iSER.



500 MB/s is
pretty easy to get on IB.

The other graph on page 23 is for block commands.  600 MB/s ish.
Still single command; so essentially a latency test.  Dominated by
the memcpy time from tmpfs to pinned IB buffer, as per page 24.

Erez said:



We didn't run any real performance test with tgt, so I don't have
numbers yet. I know that Pete got ~900 MB/sec by hacking sgp_dd, so all
data was read/written to the same block (so it was all done in the
cache). Pete - am I right?


Yes (actually just 1 thread in sg_dd).  This is obviously cheating.
Take the pread time to zero in SCSI Read analysis on page 24 to show
max theoretical.  It's IB theoretical minus some initiator and stgt
overheads.


Yes, that's obviously cheating and its result can't be compared with 
what Bart had. The full data footprint on the target fits in the CPU cache, 
so you effectively got results for NULLIO (in SCST terms).


So, it seems I understood your slides correctly: the more valuable data for 
our SCST SRP vs STGT iSER comparison should be on page 26 for the 1-command 
read (~480 MB/s, i.e. ~60% of Bart's result on equivalent hardware).



The other way to get more read throughput is to throw multiple
simultaneous commands at the server.

There's nothing particularly stunning here.  Suspect Bart has
configuration issues if not even IPoIB will do > 100 MB/s.

-- Pete






Re: Performance of SCST versus STGT

2008-01-17 Thread Vladislav Bolkhovitin

FUJITA Tomonori wrote:

On Thu, 17 Jan 2008 10:27:08 +0100
Bart Van Assche [EMAIL PROTECTED] wrote:



Hello,

I have performed a test to compare the performance of SCST and STGT.
Apparently the SCST target implementation performed far better than
the STGT target implementation. This makes me wonder whether this is
due to the design of SCST or whether STGT's performance can be
improved to the level of SCST ?

Test performed: read 2 GB of data in blocks of 1 MB from a target (hot
cache -- no disk reads were performed, all reads were from the cache).
Test command: time dd if=/dev/sde of=/dev/null bs=1M count=2000

                            STGT read            SCST read
                            performance (MB/s)   performance (MB/s)
Ethernet (1 Gb/s network)        77                   89
IPoIB    (8 Gb/s network)        82                  229
SRP      (8 Gb/s network)       N/A                  600
iSER     (8 Gb/s network)        80                  N/A

These results show that SCST uses the InfiniBand network very well
(effectivity of about 88% via SRP), but that the current STGT version
is unable to transfer data faster than 82 MB/s. Does this mean that
there is a severe bottleneck  present in the current STGT
implementation ?



I don't know about the details but Pete said that he can achieve more
than 900MB/s read performance with tgt iSER target using ramdisk.

http://www.mail-archive.com/[EMAIL PROTECTED]/msg4.html


Please don't confuse a multithreaded, latency-insensitive workload with 
a single-threaded, hence latency-sensitive, one.





Re: Performance of SCST versus STGT

2008-01-17 Thread Vladislav Bolkhovitin

FUJITA Tomonori wrote:

On Thu, 17 Jan 2008 12:48:28 +0300
Vladislav Bolkhovitin [EMAIL PROTECTED] wrote:



FUJITA Tomonori wrote:


On Thu, 17 Jan 2008 10:27:08 +0100
Bart Van Assche [EMAIL PROTECTED] wrote:




Hello,

I have performed a test to compare the performance of SCST and STGT.
Apparently the SCST target implementation performed far better than
the STGT target implementation. This makes me wonder whether this is
due to the design of SCST or whether STGT's performance can be
improved to the level of SCST ?

Test performed: read 2 GB of data in blocks of 1 MB from a target (hot
cache -- no disk reads were performed, all reads were from the cache).
Test command: time dd if=/dev/sde of=/dev/null bs=1M count=2000

                            STGT read            SCST read
                            performance (MB/s)   performance (MB/s)
Ethernet (1 Gb/s network)        77                   89
IPoIB    (8 Gb/s network)        82                  229
SRP      (8 Gb/s network)       N/A                  600
iSER     (8 Gb/s network)        80                  N/A

These results show that SCST uses the InfiniBand network very well
(effectivity of about 88% via SRP), but that the current STGT version
is unable to transfer data faster than 82 MB/s. Does this mean that
there is a severe bottleneck  present in the current STGT
implementation ?



I don't know about the details but Pete said that he can achieve more
than 900MB/s read performance with tgt iSER target using ramdisk.

http://www.mail-archive.com/[EMAIL PROTECTED]/msg4.html


Please don't confuse multithreaded latency insensitive workload with 
single threaded, hence latency sensitive one.



Seems that he can get good performance with single threaded workload:

http://www.osc.edu/~pw/papers/wyckoff-iser-snapi07-talk.pdf


Hmm, I can't find which IB hardware he used or its declared Gbps 
speed. He mentioned only Mellanox 4X SDR, switch. What does that mean?



But I don't know about the details so let's wait for Pete to comment
on this.


I added him on CC


Perhaps Voltaire people could comment on the tgt iSER performances.





Re: [Scst-devel] [Stgt-devel] Performance of SCST versus STGT

2008-01-17 Thread Vladislav Bolkhovitin

Robin Humble wrote:

On Thu, Jan 17, 2008 at 01:34:46PM +0300, Vladislav Bolkhovitin wrote:

Hmm, I can't find which IB hardware did he use and it's declared Gbps 
speed. He declared only Mellanox 4X SDR, switch. What does it mean?



SDR is 10Gbit carrier, at most about  ~900MB/s data rate.
DDR is 20Gbit carrier, at most about ~1400MB/s data rate.
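
Back-of-the-envelope derivation of those figures:

  4X SDR: 4 lanes x 2.5 Gbit/s = 10 Gbit/s signalling
          x 8/10 (8b/10b encoding) = 8 Gbit/s = 1 GB/s payload
          => ~900 MB/s after protocol overhead
  4X DDR: 4 lanes x 5 Gbit/s = 20 Gbit/s signalling
          x 8/10 = 16 Gbit/s = 2 GB/s payload
          => ~1400 MB/s in practice, limited by PCIe and protocol overhead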


Thanks. Then the single-threaded rate with one outstanding command, 
SCST SRP on an 8 Gbps link vs STGT iSER on a 10 Gbps link (according to 
that paper), is 600 MB/s vs ~480 MB/s (page 26). The SCST-based target is 
still about 60% faster.



On Thu, 17 Jan 2008 10:27:08 +0100 Bart Van Assche [EMAIL PROTECTED] wrote:

Test performed: read 2 GB of data in blocks of 1 MB from a target (hot   
cache -- no disk reads were performed, all reads were from the cache).   
Test command: time dd if=/dev/sde of=/dev/null bs=1M count=2000  

                            STGT read            SCST read
                            performance (MB/s)   performance (MB/s)
Ethernet (1 Gb/s network)        77                   89
IPoIB    (8 Gb/s network)        82                  229
SRP      (8 Gb/s network)       N/A                  600
iSER     (8 Gb/s network)        80                  N/A



it kinda looks to me like the tgt iSER tests were waaay too slow to be
using RDMA :-/
I use tgt to get 500MB/s writes over iSER DDR IB to real files (not
ramdisk). Reads are a little slower, but that changes a bit with distro
vs. mainline kernels.



Re: Performance of SCST versus STGT

2008-01-17 Thread Vladislav Bolkhovitin

Erez Zilber wrote:

FUJITA Tomonori wrote:


On Thu, 17 Jan 2008 12:48:28 +0300
Vladislav Bolkhovitin [EMAIL PROTECTED] wrote:

 


FUJITA Tomonori wrote:
   


On Thu, 17 Jan 2008 10:27:08 +0100
Bart Van Assche [EMAIL PROTECTED] wrote:


 


Hello,

I have performed a test to compare the performance of SCST and STGT.
Apparently the SCST target implementation performed far better than
the STGT target implementation. This makes me wonder whether this is
due to the design of SCST or whether STGT's performance can be
improved to the level of SCST ?

Test performed: read 2 GB of data in blocks of 1 MB from a target (hot
cache -- no disk reads were performed, all reads were from the cache).
Test command: time dd if=/dev/sde of=/dev/null bs=1M count=2000

                            STGT read            SCST read
                            performance (MB/s)   performance (MB/s)
Ethernet (1 Gb/s network)        77                   89
IPoIB    (8 Gb/s network)        82                  229
SRP      (8 Gb/s network)       N/A                  600
iSER     (8 Gb/s network)        80                  N/A

These results show that SCST uses the InfiniBand network very well
(effectivity of about 88% via SRP), but that the current STGT version
is unable to transfer data faster than 82 MB/s. Does this mean that
there is a severe bottleneck  present in the current STGT
implementation ?
   


I don't know about the details but Pete said that he can achieve more
than 900MB/s read performance with tgt iSER target using ramdisk.

http://www.mail-archive.com/[EMAIL PROTECTED]/msg4.html
 


Please don't confuse multithreaded latency insensitive workload with 
single threaded, hence latency sensitive one.
   


Seems that he can get good performance with single threaded workload:

http://www.osc.edu/~pw/papers/wyckoff-iser-snapi07-talk.pdf


But I don't know about the details so let's wait for Pete to comment
on this.

Perhaps Voltaire people could comment on the tgt iSER performances.


We didn't run any real performance test with tgt, so I don't have
numbers yet. I know that Pete got ~900 MB/sec by hacking sgp_dd, so all
data was read/written to the same block (so it was all done in the
cache). Pete - am I right?

As already mentioned, he got that with IB SDR cards that are 10 Gb/sec
cards in theory (actual speed is ~900 MB/sec). With DDR cards (20
Gb/sec), you can get even more. I plan to test that in the near future.


Are you writing about the maximum possible speed he got, including 
multithreaded tests with many outstanding commands, or about the speed he 
got on single-threaded reads with one outstanding command? This thread is 
about the latter.



Erez


Re: Open-FCoE on linux-scsi

2008-01-05 Thread Vladislav Bolkhovitin

FUJITA Tomonori wrote:

What's the general opinion on this? Duplicate code vs. more kernel code?
I can see that you're already starting to clean up the code that you
ported. Does that mean the duplicate code isn't an issue to you? When we
fix bugs in the initiator they're not going to make it into your tree
unless you're diligent about watching the list.


It's hard to convince the kernel maintainers to merge something into
mainline that can be implemented in user space. I failed twice
(with two iSCSI target implementations).


Tomonori and the kernel maintainers,

In fact, almost all of the kernel could be done in user space, including 
all the drivers, networking, I/O management with the block/SCSI initiator 
subsystem, and the disk cache manager. But does that mean the current kernel 
is bad and all of the above should be (re)done in user space instead? I 
think not. Linux isn't a microkernel for very pragmatic reasons: 
simplicity and performance.


1. Simplicity.

For a SCSI target, especially with a hardware target card, data come 
from the kernel and are eventually served by the kernel, which does the 
actual I/O or gets/puts data from/to the cache. Dividing the request 
processing job between user and kernel space creates unnecessary interface 
layer(s) and effectively makes request processing distributed, with all the 
complexity and reliability problems that brings. As an example, what 
currently happens in STGT if the user space part suddenly dies? Will the 
kernel part gracefully recover from it? How much effort will be needed to 
implement that?


Another example is the code duplication mentioned above. Is it good? 
What will it bring? Or do you care only about the amount of kernel code 
and not about the overall amount of code? If so, you should 
(re)read what Linus Torvalds thinks about that: 
http://lkml.org/lkml/2007/4/24/364 (I don't consider myself an 
authority on this question).


I agree that some of the processing, where it can be clearly separated, can 
and should be done in user space. A good example of such an approach is 
connection negotiation and management as it's done in open-iscsi. But I 
don't agree that this idea should be taken to the extreme. It might look 
good, but it's impractical; it will only make things more complicated and 
harder to maintain.


2. Performance.

Modern SCSI transports, e.g. InfiniBand, have link latency as low as 
1(!) microsecond. For comparison, the inter-thread context switch time 
on a modern system is about the same, and syscall time is about 0.1 
microsecond. So just ten empty syscalls or one context switch add the 
same latency as the link itself. Even 1 Gbps Ethernet has less than 100 
microseconds of round-trip latency.
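
A quick way to see the latency side of this for yourself (the hostname is a 
placeholder):

  # IP round-trip latency between initiator and target (or the IPoIB address)
  ping -c 100 target-host

  # for raw IB latency the OFED perftest tools, e.g. ib_send_lat, report
  # microsecond-level latencies directly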


You most likely know that the QLogic target driver for SCST allows 
commands to be executed either directly from soft IRQ or from the 
corresponding thread. There is a steady 5% difference in IOPS between 
those modes for 512-byte reads on nullio over a 4 Gbps link. So a single 
additional inter-kernel-thread context switch costs 5% of IOPS.


Another source of latency, unavoidable with the user space approach, is 
the data copy to/from the cache. With a fully kernel-space approach the 
cache can be used directly, so no extra copy is needed.


So, by putting code in user space you have to accept the extra latency 
it adds. Many, if not most, real-life workloads are more or less latency 
bound, not throughput bound, so you shouldn't be surprised that a 
single-stream dd if=/dev/sdX of=/dev/null on the initiator gives low 
values. Such a benchmark is no less important or practical than all the 
multithreaded, latency-insensitive benchmarks people like running.


You may object that the backstorage latency is a lot more than 1 
microsecond, but that is true only if data are read from or written to the 
actual backstorage media, not served from cache (even the backstorage 
device's own cache). Nothing prevents a target from having 8 or even 64 GB 
of cache, so most accesses, even random ones, could be served from it. This 
is especially important for sync writes.


Thus, I believe that a partially user-space, partially kernel-space 
approach to building SCSI targets is a move in the wrong direction, because 
it brings practically nothing but costs a lot.


Vlad


Re: [Bugme-new] [Bug 9405] New: iSCSI does not implement ordering guarantees required by e.g. journaling filesystems

2007-11-21 Thread Vladislav Bolkhovitin

James Bottomley wrote:

if you specifically set TAS=1 you're giving up the right to know what
caused the command termination.  With insufficient information, it's
really unsafe to simply retry, which is why the mid layer just returns
TASK ABORTED as an error.  If you set TAS=0 we'll get a check
condition/unit attention explaining what happened (usually commands
cleared by another initiator) and we'll explicitly do the right thing
based on the sense data.


Actually, having TAS=1 is a considerable advantage over TAS=0 from 
the error recovery point of view. With TAS=1 all aborted commands are 
supposed to be returned immediately to all affected initiators. With 
TAS=0 the affected initiators will not receive any notification about 
aborted commands; only the COMMANDS CLEARED BY ANOTHER INITIATOR UA will 
be established. So they will learn about it only after their commands 
time out.


Thus, with TAS=1 almost immediate error recovery is possible, but with 
TAS=0 error recovery is possible only after a timeout, which for SSC devices 
can be hours.
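
For reference, the TAS bit lives in the control mode page; a generic way to 
inspect it on Linux, assuming sdparm is installed and /dev/sdb is the device 
in question:

  # dump the control mode page; TAS is one of its fields
  sdparm --page=co /dev/sdb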


But having TAS=1 is legal, right? So it should be handled well. If 
TAS=0, TASK ABORTED can't be returned, it would be illegal. So, TASK 
ABORTED status can only be returned with TAS=1.


Driving with your handbrake on is legal too ... that doesn't mean you
should do it ... and it certainly doesn't give you a legitimate
complaint against the manufacturer of your car for excessive brake pad
wear.

We handle TASK ABORTED as well as we can (by failing it).  For better
handling set TAS=0 and we'll handle the individual cases according to
the sense codes.


After some digging in SAM/SPC I've figured out that TASK ABORTED status 
can be returned in exactly the same circumstances as the COMMANDS CLEARED BY 
ANOTHER INITIATOR UA; the TAS bit only selects which form of notification 
is used. So TASK ABORTED status carries the same information as the 
COMMANDS CLEARED BY ANOTHER INITIATOR UA and should be handled the same 
way. I.e., if the affected commands are restarted on COMMANDS CLEARED BY 
ANOTHER INITIATOR UA, they should be restarted on TASK ABORTED status as 
well.


Vlad


Re: [Bugme-new] [Bug 9405] New: iSCSI does not implement ordering guarantees required by e.g. journaling filesystems

2007-11-20 Thread Vladislav Bolkhovitin

James Bottomley wrote:

On Tue, 2007-11-20 at 19:15 +0300, Vladislav Bolkhovitin wrote:


James Bottomley wrote:


I'm not sure your conclusions necessarily follow your data.  What was
the reason for the TASK ABORTED (I'd guess QErr settings, right)?


It came from my curiosity during tests of SCST (http://scst.sf.net), 
when it was working with several initiators over different transports 
against the same set of devices, each of them with the TAS bit set in the 
control mode page. According to SAM, in this case TASK ABORTED status can 
be returned at any time, similarly to QUEUE FULL, i.e. IMHO such a command 
should simply be retried. QUEUE FULL status is handled well, but TASK 
ABORTED leads to filesystem corruption.


So this is with a soft target implementation ... so it could be an
ordering issue inside the target that's causing the filesystem
corruption on error.


The target offers no ordering guarantees for SIMPLE commands and frankly 
says so to the initiator via QUEUE ALGORITHM MODIFIER value 1 in the 
control mode page. As we know, the initiator doesn't use ORDERED tags (and 
it really doesn't use them according to the logs), so if it's an 
ordering issue, it's on the initiator's side.



if you specifically set TAS=1 you're giving up the right to know what
caused the command termination.  With insufficient information, it's
really unsafe to simply retry, which is why the mid layer just returns
TASK ABORTED as an error.  If you set TAS=0 we'll get a check
condition/unit attention explaining what happened (usually commands
cleared by another initiator) and we'll explicitly do the right thing
based on the sense data.


But having TAS=1 is legal, right? So it should be handled well. If 
TAS=0, TASK ABORTED can't be returned, it would be illegal. So, TASK 
ABORTED status can only be returned with TAS=1.



One of my test suites has an initiator which randomly spits errors.
I've yet to see it cause an error that an ext3 journal can't recover
from.  So, if there's a genuine problem we need a nice test case to pass
to the filesystem people.


If you need a clear testcase (IMHO, in this case it isn't needed, 
because it's clear without it), I can prepare a patch for SCST to 
randomly return TASK ABORTED status.


You can get the latest version of SCST and the target drivers using SVN:

$ svn co https://scst.svn.sourceforge.net/svnroot/scst


James




Re: [Bugme-new] [Bug 9405] New: iSCSI does not implement ordering guarantees required by e.g. journaling filesystems

2007-11-20 Thread Vladislav Bolkhovitin

James Bottomley wrote:

I'm not sure your conclusions necessarily follow your data.  What was
the reason for the TASK ABORTED (I'd guess QErr settings, right)?


It came from my curiosity during tests of SCST (http://scst.sf.net), 
when it was working with several initiators over different transports 
against the same set of devices, each of them with the TAS bit set in the 
control mode page. According to SAM, in this case TASK ABORTED status can 
be returned at any time, similarly to QUEUE FULL, i.e. IMHO such a command 
should simply be retried. QUEUE FULL status is handled well, but TASK 
ABORTED leads to filesystem corruption.


So this is with a soft target implementation ... so it could be an
ordering issue inside the target that's causing the filesystem
corruption on error.


The target offers no ordering guarantees for SIMPLE commands and frankly 
says so to the initiator via QUEUE ALGORITHM MODIFIER value 1 in the 
control mode page. As we know, the initiator doesn't use ORDERED tags (and 
it really doesn't use them according to the logs), so if it's an 
ordering issue, it's on the initiator's side.




if you specifically set TAS=1 you're giving up the right to know what
caused the command termination.  With insufficient information, it's
really unsafe to simply retry, which is why the mid layer just returns
TASK ABORTED as an error.  If you set TAS=0 we'll get a check
condition/unit attention explaining what happened (usually commands
cleared by another initiator) and we'll explicitly do the right thing
based on the sense data.


But having TAS=1 is legal, right? So it should be handled well. If 
TAS=0, TASK ABORTED can't be returned, it would be illegal. So, TASK 
ABORTED status can only be returned with TAS=1.


Driving with your handbrake on is legal too ... that doesn't mean you
should do it ... and it certainly doesn't give you a legitimate
complaint against the manufacturer of your car for excessive brake pad
wear.

We handle TASK ABORTED as well as we can (by failing it).  For better
handling set TAS=0 and we'll handle the individual cases according to
the sense codes.


So, should I take your words to mean that you think it's perfectly fine 
to corrupt the file system for devices with TAS=1? Absolutely legal devices, 
I repeat. Hence, in your opinion, no further investigation should be done?



One of my test suites has an initiator which randomly spits errors.
I've yet to see it cause an error that an ext3 journal can't recover
from.  So, if there's a genuine problem we need a nice test case to pass
to the filesystem people.


If you need a clear testcase (IMHO, in this case it isn't needed, 
because it's clear without it), I can prepare a patch for SCST to 
randomly return TASK ABORTED status.


You can get the latest version of SCST and the target drivers using SVN:

$ svn co https://scst.svn.sourceforge.net/svnroot/scst


There's no real need to bother with setting all this up ... a simple
initiator modification randomly to return TASK ABORTED should suffice.


Yes, you're right. Then, I suppose, Mike Christie should be the best 
person to do it?


Vlad


Re: [Bugme-new] [Bug 9405] New: iSCSI does not implement ordering guarantees required by e.g. journaling filesystems

2007-11-20 Thread Vladislav Bolkhovitin

James Bottomley wrote:

I'm not sure your conclusions necessarily follow your data.  What was
the reason for the TASK ABORTED (I'd guess QErr settings, right)?


It came from my curiosity during tests of SCST (http://scst.sf.net), 
when it was working with several initiators over different transports 
against the same set of devices, each of them with the TAS bit set in the 
control mode page. According to SAM, in this case TASK ABORTED status can 
be returned at any time, similarly to QUEUE FULL, i.e. IMHO such a command 
should simply be retried. QUEUE FULL status is handled well, but TASK 
ABORTED leads to filesystem corruption.


So this is with a soft target implementation ... so it could be an
ordering issue inside the target that's causing the filesystem
corruption on error.


The target offers no ordering guarantees for SIMPLE commands and frankly 
says so to the initiator via QUEUE ALGORITHM MODIFIER value 1 in the 
control mode page. As we know, the initiator doesn't use ORDERED tags (and 
it really doesn't use them according to the logs), so if it's an 
ordering issue, it's on the initiator's side.



if you specifically set TAS=1 you're giving up the right to know what
caused the command termination.  With insufficient information, it's
really unsafe to simply retry, which is why the mid layer just returns
TASK ABORTED as an error.  If you set TAS=0 we'll get a check
condition/unit attention explaining what happened (usually commands
cleared by another initiator) and we'll explicitly do the right thing
based on the sense data.


But having TAS=1 is legal, right? So it should be handled well. If 
TAS=0, TASK ABORTED can't be returned, it would be illegal. So, TASK 
ABORTED status can only be returned with TAS=1.


Driving with your handbrake on is legal too ... that doesn't mean you
should do it ... and it certainly doesn't give you a legitimate
complaint against the manufacturer of your car for excessive brake pad
wear.

We handle TASK ABORTED as well as we can (by failing it).  For better
handling set TAS=0 and we'll handle the individual cases according to
the sense codes.


So, should I take your words to mean that you think it's perfectly fine 
to corrupt the file system for devices with TAS=1? Absolutely legal devices, 
I repeat. Hence, in your opinion, no further investigation should be done?


Logic wouldn't support such a conclusion.


Sorry, lately I've been getting too many "I won't bother, this is your 
problem" style answers.



You have intertwined two issues

 1. How should the mid layer handle TASK ABORTED.  I think we've
reached the point where returning I/O error is the best we can
do, but if TAS=0 we could have used the sense data to do better.
 2. Should a request I/O error cause corruption in ext3 that can't
be recovered by a journal replay. I think the answer here is
no, so there needs to be an easily reproducible test case to
pass to the filesystem people.


OK, I see your point. As I already wrote, I can only assist with testing here.


James




Re: Target mode support for qlogic chipsets isp2422/2432/5422/5432

2007-10-23 Thread Vladislav Bolkhovitin

FUJITA Tomonori wrote:

On Tue, 23 Oct 2007 13:47:20 +0530
Thayumanavar Sachithanantham [EMAIL PROTECTED] wrote:



Hi All,

Does the recent target mode support added for tgt support target mode
for qla chipset (qla24xx series)?



We've been trying:

http://marc.info/?t=11885798674r=1w=2

But I heard that the qla24xx firmware doesn't support target mode (I
use QLA2340).


Standard QLogic QLA24xx firmware supports target mode.

Vlad


Re: qla2xxx behavior with changing volumes

2007-09-21 Thread Vladislav Bolkhovitin

Sean Bruno wrote:

What is the expected behavior when volumes on a SAN change size and LUN
ID order?

I've noticed that if a volume changes size, leaves the SAN or changes
target ID it isn't auto-magically picked up by a 2.6.18 based
system(running CentOS 5).

If a new target appears on the SAN however, it is noticed and assigned a
new drive letter.


For changes in the volume size the target (SAN) should generate 
CAPACITY DATA HAS CHANGED Unit Attention.


For changes in the LUN ID order the target should generate REPORTED 
LUNS DATA HAS CHANGED Unit Attention.


On these notifications the initiator is supposed to take the appropriate 
actions, like rescanning the SAN in the case of REPORTED LUNS DATA HAS 
CHANGED. Unfortunately, Linux just ignores them, as well as the majority 
of other Unit Attentions, hence you have to restart the system or, at 
least, reload the corresponding driver to see the changes.


Vlad


Re: qla2xxx behavior with changing volumes

2007-09-21 Thread Vladislav Bolkhovitin

Vladislav Bolkhovitin wrote:

Sean Bruno wrote:


What is the expected behavior when volumes on a SAN change size and LUN
ID order?

I've noticed that if a volume changes size, leaves the SAN or changes
target ID it isn't auto-magically picked up by a 2.6.18 based
system(running CentOS 5).

If a new target appears on the SAN however, it is noticed and assigned a
new drive letter.



For changes in the volume size the target (SAN) should generate 
CAPACITY DATA HAS CHANGED Unit Attention.


For changes in the LUN ID order the target should generate REPORTED 
LUNS DATA HAS CHANGED Unit Attention.


On these notifications initiator is supposed to make the appropriate 
actions, like rescan the SAN in case of REPORTED LUNS DATA HAS 
CHANGED. Unfortunately, Linux just ignores them as well as the majority 
of other Unit Attentions, hence you have to restart the system or, at 
least, the corresponding driver to see the changes.


Or, I forgot: you can also do a manual rescan via the sysfs rescan 
interface. Sometimes that helps too.
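
For completeness, the usual sysfs knobs (the host number and device address 
are just examples):

  # scan a host for new targets/LUNs
  echo "- - -" > /sys/class/scsi_host/host0/scan

  # re-read the capacity/parameters of an already known device
  echo 1 > /sys/class/scsi_device/0:0:0:0/device/rescan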



Vlad





Re: [Stgt-devel] Question for pass-through target design

2007-06-01 Thread Vladislav Bolkhovitin

Vladislav Bolkhovitin wrote:

So, if you need in-kernel pass-through I would suggest you look at the
SCST project (http://scst.sf.net), which is currently stable and mature,
although also not fully finished yet. It was designed from the very
beginning for full-featured in-kernel pass-through, not only for
stateless SCSI devices, like disks, but also for stateful SCSI devices
(like SSC ones, a.k.a. tapes), where the correct handling of all of the
above is essential. In addition to considerably better performance, the
complete in-kernel approach makes the code simpler, smaller and cleaner,
and allows such things as zero-copy buffered file I/O, i.e. data sent to
remote initiators or received from them directly from/to the page cache
(currently under development). For those who need to implement SCSI
devices in user space, the scst_user module is about to be added. Since
the SCSI state machine is in the kernel, the interface provided by
scst_user is very simple: it essentially consists of only a single IOCTL
and allows overhead as low as a single syscall per SCSI command without
any additional context switches. It is already implemented and works.
For some legal reasons I can't publish it at the moment, but you can see
its full description in the project's SVN docs (you can get them using
the command svn co https://svn.sourceforge.net/svnroot/scst/trunk/doc).


Now I have released the scst_user module and it is available from the SCST 
SVN, so you can check how simple it makes writing SCSI devices, like a VTL, 
in user space.


Vlad


Re: [Stgt-devel] Question for pass-through target design

2007-05-25 Thread Vladislav Bolkhovitin

Robert Jennings wrote:

* Vladislav Bolkhovitin ([EMAIL PROTECTED]) wrote:


Robert Jennings wrote:


What I meant that is that the kernel tgt code (scsi_tgt*) receives
SCSI commands from one lld and send them to another lld instead of
sending them to user space.


Although the approach of passing SCSI commands from a target LLD to an
initiator one without any significant interventions from the target
software looks to be nice and simple, you should realize how limited,
unsafe and illegal it is, since it badly violates SCSI specs.


I think that 'implemented cleanly' means that one scsi_host is assigned
to only one initiator.


Vladislav listed a number of issues that are inherent in an implementation
that does not have a 1:1 relationship of initiators to targets.  The vscsi
architecture defines the 1:1 relationship; it's impossible to have more
than one initiator per target.


Just a few small notes:

1. As I already wrote, a complete 1:1 relationship isn't possible in 
practice, because there is always local access on the target (i.e. one 
more initiator) and you can't disable it in practice.


I was proposing a 1:1 relationship of initiator to target within the
target framework for in-kernel pass-through.  We would still have the
case that local access on the target is possible; an administrator with
privileges necessary to create a target would have the responsibility
to not then access the device locally.  


This is no different than if I create my root file system on /dev/sda1,
I should not also 'dd' data to /dev/sda1 while the system is running.
It's a bad idea, but nothing stops me; however this is something that
only a root level user can do.  This would be the same, these targets in
pass-through have permissions by default that do not allow local access
by non-root users.


In principle, yes, but, as usual, in practice it's not so easy. In 
your file system example the device is accessed via the FS, which 
provides a shared mode, and nobody needs to do anything directly with 
the device. But non-disk devices are always accessed directly, so to 
explain your limitation you would have to write it in HUGE letters 
everywhere. Once an SCST user cleared a Unit Attention on his exported 
tape device using the st driver and then asked me why it wasn't 
delivered to his remote initiator.


2. A 1:1 relationship is a serious limitation for use cases like an SPI 
tape library serving backups for several servers on an FC network.


Restricting the relationship to 1:1 would be for pass-through devices
only, this would not necessarily dictate other target types which could
be used for such cases.


The tape library from my example is the pass-through device. You can't 
access a parallel SCSI (SPI) device over Fibre Channel (FC) in any 
other mode, right?


Vlad


Re: [Scst-devel] Problems with SCST and QLA 2432 FC Cards

2007-05-18 Thread Vladislav Bolkhovitin

sandip shete wrote:

Hi,

I am working with the SCST 0.9.4 version on linux-2.6.15 with the
linux-2.6-qla2xxx-target.patch patch applied.
I was using a QLA2312 card on this setup and things were just fine when
i used this system as a Target.

Now I have switched to a qla2432 card and even though i do enable
Target Mode (echo 1 > /sys/class/scsi_host/host../target_mode_enabled) on the corresponding
port, this port fails to work as a target, and none of the Fileio Luns
are exported to the initiator.
Also, on the initiator side the /sys/class/fc_remote_port//role
file should show as FC Target, which it used to with QLA 2312, but
with QLA 2432 the initiator side shows the role of the remote port as
FC Initiator

The initiator has 2312 cards and 2.6.15 kernel compiled on it.

Also note that, i have the corresponding ql2400_fw.bin firmware binary
at the right location and it gets loaded when i load the modules.
To check if the qla2432 card was working fine, i connected this to a
different 2312 based target system and had it work as a Initiator, this
worked fine and i could see all the luns exported on this box.

Now the only problem that i can think of in target mode is, maybe, scst
doesn't support the qla 24xx series.


Yes, that's correct. Unfortunately, 24xx+ series are not supported yet.


But i fail to see any part of the code pointing towards that.


You can see in the README for the driver that only 22xx and 23xx series 
are currently supported.



When i  did some debugging on the initiator side i see that the
qla2x00_get_port_database does return the status
of the remote port as FCT_INITIATOR, i couldn't actually figure out the
code wherein the target returns the response to these mbox_commands. I
was wondering if SCST plays a part here and sends a different response
when 24xx cards are used.


Unfortunately, 24xx+ cards have a very different interface, so adding 
support for them is almost the same as writing another driver.



I saw some posts regarding the problems that people were facing with qla24xx
series. If this has been fixed in a different verison of Linux/SCST 
that what i am using, please let me know.


Thanks and Regards.
Sandip S



Re: [Scst-devel] Problems with SCST and QLA 2432 FC Cards

2007-05-18 Thread Vladislav Bolkhovitin

sandip shete wrote:

Hi,

I wish to develop support for QLA 24xx series. If you already have a 
partial implementaion of the same, i would like to take it forward.
And if there isn't, i would appreciate if you could give me some 
pointers in that direction.


Most probably, the driver at the link sent by Matthew Jacob will be a good 
starting point, where you can see examples of how to work with the card, so 
you can add that to the qla2x00t driver. You will also need the firmware 
interface specification manual for the 2400 series of cards. It is 
under NDA, but you may be lucky enough to get one from QLogic. Feel free to 
ask me any SCST or qla2x00t driver related questions.


I have adequate experience of programming in the SCSI domain, however i 
am not much conversant with the QLA driver code.


Thanks and Regards,
Sandip S



Re: [Stgt-devel] Question for pass-through target design

2007-05-07 Thread Vladislav Bolkhovitin

FUJITA Tomonori wrote:

It looks like the pass-through target support is currently broken, at
least as I've checked for ibmvstgt, but I think it's a general problem.
I wanted to check my assumptions and get ideas.


Yeah, unfortunately, it works only with the iSCSI target driver (which
runs in user space).




The code isn't allocating any memory to pass along to the sg code to store
the result of a read or data for a write.  Currently, dxferp for sg_io_hdr
or dout_xferp/din_xferp for sg_io_v4 are assigned to the value of uaddr,
which is set to 0 in kern_queue_cmd.  With the pointer set to NULL,
the pass-through target isn't going to function.  Even if we had memory
allocated, there isn't a means of getting data to be written via sg down
this code path.

What ideas are there as to how the data will get to user-space so that
we can use sg?


For kernel-space drivers, we don't need to go to user-space. We can do
the pass-through in kernel space. I talked with James about this last
year and he said that if the code is implemented cleanly, he would
merges it into mainline.


We already have a pass-through in the kernel space for
kernel space drivers. It is the scsi_tgt* code.



Could you elaborate more?

What I meant that is that the kernel tgt code (scsi_tgt*) receives
SCSI commands from one lld and send them to another lld instead of
sending them to user space.


Although the approach of passing SCSI commands from a target LLD to an
initiator one without any significant intervention from the target
software looks nice and simple, you should realize how limited,
unsafe and illegal it is, since it badly violates the SCSI specs.

Before I elaborate, let's establish the following terminology in addition to
the one described in SAM:

 - Target system - the overall system containing target and initiator
devices (and their LLDs). The target system exports one or more initiator
devices via the target device(s).

 - Target device - a SCSI device on the target system in target mode.

 - Initiator device - a SCSI device on the target system in
initiator mode. It actually serves commands that come from remote
initiators via the target device(s).

 - Remote initiator - a SCSI initiator device connected to a target
device on the target system, which uses (i.e. sends SCSI commands to) the
devices exported by it.

 - Target software - software that runs on the target system and
implements the necessary pass-through functionality.

Let's consider the simplest case, when a target system has one target
device and one initiator device, and it exports the initiator device via the
target device as pass-through. The problem is that the target
system then creates a new SCSI target device, which is not the same as the
exported initiator device. In particular, the new device could have more
than one nexus with remote initiators connected to it, while the initiator
device has no clue about them; it sees a single nexus with the target
system and only that one.

And so? All the event notifications which should be seen by all remote
initiators will be delivered to only one of them, or not generated at
all, since some events are generated only for I_T nexuses other than the
one on which the command causing the event was received. The most common
example of such events is Unit Attentions. For example, after a MODE
SELECT command, all remote initiators except the one who sent the command
shall receive the MODE PARAMETERS CHANGED Unit Attention. Otherwise, bad
and quiet data corruption could happen.

A more complicated example is SCSI reservations, whether persistent
or SPC-2 ones. Since the initiator device knows only about one nexus,
instead of the actual many of them, the reservation commands have to be
completely handled by the target software on the target system. Delivery
of Unit Attentions to all remote initiators is especially important for
reservations, since they could mean that a reservation was revoked by
another initiator via, e.g., some task management function.

Things get even worse if we realize that (1) the initiator device could
report capabilities (like ACA support) which aren't supported by the
target software, hence misinforming the remote initiators and again
possibly provoking quiet data corruption, and (2) accesses to the initiator
devices from local programs on the target system create another I_T
nexus, which needs to be handled as well.

(I suppose it is obvious that if the target system exports more than one
initiator device via a single target device, then, since the initiator
devices don't know about each other, the target software in any case needs
to implement its own LUN addressing as well as its own REPORT LUNS command
handler.)

Thus, such an in-kernel pass-through mode could be used only for a limited
set of SCSI commands and SCSI device types, with great caution and a
complete comprehension of what's going on and how it should work. The latter
isn't true for the absolute majority of uses and users, so such an approach
would give users a perfect weapon to shoot themselves with.

If you 

Re: [Stgt-devel] Question for pass-through target design

2007-05-07 Thread Vladislav Bolkhovitin

FUJITA Tomonori wrote:

From: Vladislav Bolkhovitin [EMAIL PROTECTED]
Subject: Re: [Stgt-devel] Question for pass-through target design
Date: Mon, 07 May 2007 18:24:44 +0400



FUJITA Tomonori wrote:


It looks like the pass-through target support is currently broken, at
least as I've checked for ibmvstgt, but I think it's a general problem.
I wanted to check my assumptions and get ideas.


Yeah, unfortunately, it works only with the iSCSI target driver (which
runs in user space).





The code isn't allocating any memory to pass along to the sg code to store
the result of a read or data for a write.  Currently, dxferp for sg_io_hdr
or dout_xferp/din_xferp for sg_io_v4 are assigned to the value of uaddr,
which is set to 0 in kern_queue_cmd.  With the pointer set to NULL,
the pass-through target isn't going to function.  Even if we had memory
allocated, there isn't a means of getting data to be written via sg down
this code path.

What ideas are there as to how the data will get to user-space so that
we can use sg?


For kernel-space drivers, we don't need to go to user-space. We can do
the pass-through in kernel space. I talked with James about this last
year and he said that if the code is implemented cleanly, he would
merge it into mainline.


We already have a pass-through in the kernel space for
kernel space drivers. It is the scsi_tgt* code.



Could you elaborate more?

What I meant is that the kernel tgt code (scsi_tgt*) receives
SCSI commands from one lld and sends them to another lld instead of
sending them to user space.


Although the approach of passing SCSI commands from a target LLD to an
initiator one without any significant intervention from the target
software looks nice and simple, you should realize how limited, unsafe
and illegal it is, since it badly violates the SCSI specs.



I think that 'implemented cleanly' means that one scsi_host is assigned
to only one initiator.


Sorry, I don't fully understand you. If you mean you are going to limit
it to only one remote initiator per target device, then, well, isn't it
even more limited (and limiting)?



___
Stgt-devel mailing list
[EMAIL PROTECTED]
https://lists.berlios.de/mailman/listinfo/stgt-devel



-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Stgt-devel] Question for pass-through target design

2007-05-07 Thread Vladislav Bolkhovitin

FUJITA Tomonori wrote:

From: Vladislav Bolkhovitin [EMAIL PROTECTED]
Subject: Re: [Stgt-devel] Question for pass-through target design
Date: Mon, 07 May 2007 19:27:23 +0400



FUJITA Tomonori wrote:


From: Vladislav Bolkhovitin [EMAIL PROTECTED]
Subject: Re: [Stgt-devel] Question for pass-through target design
Date: Mon, 07 May 2007 18:24:44 +0400




FUJITA Tomonori wrote:



It looks like the pass-through target support is currently broken, at
least as I've checked for ibmvstgt, but I think it's a general problem.
I wanted to check my assumptions and get ideas.


Yeah, unfortunately, it works only with the iSCSI target driver (which
runs in user space).






The code isn't allocating any memory to pass along to the sg code to store
the result of a read or data for a write.  Currently, dxferp for sg_io_hdr
or dout_xferp/din_xferp for sg_io_v4 are assigned to the value of uaddr,
which is set to 0 in kern_queue_cmd.  With the pointer set to NULL,
the pass-through target isn't going to function.  Even if we had memory
allocated, there isn't a means of getting data to be written via sg down
this code path.

What ideas are there as to how the data will get to user-space so that
we can use sg?


For kernel-space drivers, we don't need to go to user-space. We can do
the pass-through in kernel space. I talked with James about this last
year and he said that if the code is implemented cleanly, he would
merge it into mainline.


We already have a pass-through in the kernel space for
kernel space drivers. It is the scsi_tgt* code.



Could you elaborate more?

What I meant is that the kernel tgt code (scsi_tgt*) receives
SCSI commands from one lld and sends them to another lld instead of
sending them to user space.


Although the approach of passing SCSI commands from a target LLD to an
initiator one without any significant intervention from the target
software looks nice and simple, you should realize how limited, unsafe
and illegal it is, since it badly violates the SCSI specs.



I think that 'implemented cleanly' means that one scsi_host is assigned
to only one initiator.


Sorry, I don't fully understand you. If you mean you are going to limit
it to only one remote initiator per target device, then, well, isn't it
even more limited (and limiting)?



The target software assigns one scsi_host to only one remote
initiator. For FC, NPIV works nicely.


OK, if such a limitation is OK for your users, then I'm happy for you.


___
Stgt-devel mailing list
[EMAIL PROTECTED]
https://lists.berlios.de/mailman/listinfo/stgt-devel



-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Stgt-devel] Question for pass-through target design

2007-05-07 Thread Vladislav Bolkhovitin

Vladislav Bolkhovitin wrote:

FUJITA Tomonori wrote:


From: Vladislav Bolkhovitin [EMAIL PROTECTED]
Subject: Re: [Stgt-devel] Question for pass-through target design
Date: Mon, 07 May 2007 19:27:23 +0400



FUJITA Tomonori wrote:


From: Vladislav Bolkhovitin [EMAIL PROTECTED]
Subject: Re: [Stgt-devel] Question for pass-through target design
Date: Mon, 07 May 2007 18:24:44 +0400




FUJITA Tomonori wrote:


It looks like the pass-through target support is currently 
broken, at
least as I've checked for ibmvstgt, but I think it's a general 
problem.

I wanted to check my assumptions and get ideas.



Yeah, unfortunately, it works only with the iSCSI target driver 
(which

runs in user space).





The code isn't allocating any memory to pass along to the sg 
code to store
the result of a read or data for a write.  Currently, dxferp 
for sg_io_hdr
or dout_xferp/din_xferp for sg_io_v4 are assigned to the value 
of uaddr,
which is set to 0 in kern_queue_cmd.  With the pointer set to 
NULL,
the pass-through target isn't going to function.  Even if we 
had memory
allocated, there isn't a means of getting data to be written 
via sg down

this code path.

What ideas are there as to how the data will get to user-space 
so that

we can use sg?



For kernel-space drivers, we don't need to go to user-space. We 
can do
the pass-through in kernel space. I talked with James about this 
last

year and he said that if the code is implemented cleanly, he would
merge it into mainline.



We already have a pass-through in the kernel space for
kernel space drivers. It is the scsi_tgt* code.




Could you elaborate more?

What I meant is that the kernel tgt code (scsi_tgt*) receives
SCSI commands from one lld and sends them to another lld instead of
sending them to user space.



Although the approach of passing SCSI commands from a target LLD to an
initiator one without any significant intervention from the target
software looks nice and simple, you should realize how limited, unsafe
and illegal it is, since it badly violates the SCSI specs.




I think that 'implemented cleanly' means that one scsi_host is assigned
to only one initiator.



Sorry, I don't fully understand you. If you mean you are going to limit
it to only one remote initiator per target device, then, well, isn't it
even more limited (and limiting)?




The target software assigns one scsi_host to only one remote
initiator. For FC, NPIV works nicely.



OK, if such a limitation is OK for your users, then I'm happy for you.


And don't forget to tell them that they must not touch the exported 
devices locally ;)



___
Stgt-devel mailing list
[EMAIL PROTECTED]
https://lists.berlios.de/mailman/listinfo/stgt-devel






-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] sd: implement START/STOP management

2007-03-22 Thread Vladislav Bolkhovitin

Tejun Heo wrote:

Hello, Douglas.

Douglas Gilbert wrote:


Tejun,
I note at this point that the IMMED bit in the
START STOP UNIT cdb is clear. [The code might
note that as well.] All SCSI disks that I have
seen, implement the IMMED bit and according to
the SAT standard, so should SAT layers like the
one in libata.

With the IMMED bit clear:
 - on spin up, it will wait until disk is ready.
   Okay unless there are a lot of disks, in
   which case we could ask Matthew Wilcox for help
 - on spin down, will wait until media is
   stopped. That could be 20 seconds, and if there
   were multiple disks 

I guess the question is do we need to wait until a
disk is spun down before dropping power to it
and suspending.



I think we do.  As we're issuing SYNCHRONIZE CACHE prior to spinning
down disks, it's probably okay to drop power early data-integrity-wise
but still...

We can definitely use IMMED=1 during resume (needs to be throttled
somehow tho).  This helps even when there is only one disk.  We can let
the disk spin up in the background and proceed with the rest of resuming
process.  Unfortunately, libata SAT layer doesn't do IMMED and even if
it does (I've tried and have a patch available) it doesn't really work
because during host resume each port enters EH and resets and
revalidates each device.  Many if not most ATA harddisks don't respond
to reset or IDENTIFY till it's fully spun up meaning libata EH has to
wait for all drives to spin up.  libata EH runs inside SCSI EH thread
meaning SCSI command issue blocks till libata EH finishes resetting the
port.  So, IMMED or not, sd gotta wait for libata disks.

If we want to do parallel spin down, PM core needs to be updated such
that there are two events - issue and done - somewhat similar to what
SCSI is doing to probe devices parallelly.  If we're gonna do that, we
maybe can apply the same mechanism to resume path so that we can do
things parallelly IMMED or not.


It seems there is another way of doing a bank spin up / spin down: doing
it in two passes. On the first pass, START_STOP is issued with IMMED=1 on
all devices; then, on the second pass, START_STOP is issued with IMMED=0.
So the devices will spin up / spin down in parallel, but synchronously,
hence the needed result will be achieved with minimal code changes,
although it will indeed need upper-layer changes in the callers of struct
device_driver's suspend(), resume(), etc.
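
To make the two-pass idea concrete, here is a user-space style sketch
using SG_IO (purely illustrative; the real change would of course live in
sd/libata and the PM core, not in an ioctl loop):

#include <string.h>
#include <sys/ioctl.h>
#include <scsi/sg.h>

/* START STOP UNIT: opcode 0x1B, IMMED is bit 0 of byte 1,
 * START is bit 0 of byte 4. */
static int start_stop(int fd, int start, int immed)
{
        unsigned char cdb[6] = { 0x1B, immed ? 1 : 0, 0, 0,
                                 start ? 1 : 0, 0 };
        unsigned char sense[32];
        struct sg_io_hdr io;

        memset(&io, 0, sizeof(io));
        io.interface_id = 'S';
        io.cmd_len = sizeof(cdb);
        io.cmdp = cdb;
        io.dxfer_direction = SG_DXFER_NONE;
        io.sbp = sense;
        io.mx_sb_len = sizeof(sense);
        io.timeout = 60000;             /* ms */

        return ioctl(fd, SG_IO, &io);
}

/* Two-pass spin down of a set of already opened disks:
 *   pass 1: start_stop(fd[i], 0, 1) for all i -- returns immediately,
 *   pass 2: start_stop(fd[i], 0, 0) for all i -- returns when stopped,
 * so the drives stop in parallel, but we still wait for all of them. */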


Vlad
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] sd: implement START/STOP management

2007-03-22 Thread Vladislav Bolkhovitin

Henrique de Moraes Holschuh wrote:

On Thu, 22 Mar 2007, Vladislav Bolkhovitin wrote:

It seems there is another way of doing a bank spin up / spin down: doing
it in two passes. On the first pass, START_STOP is issued with IMMED=1 on
all devices; then, on the second pass, START_STOP is issued with IMMED=0.
So the devices will spin up / spin down in parallel, but synchronously,
hence the needed result will be achieved



And maybe trip the PSU's overcurrent defenses?  There is a reason to default
to sequential spin-up for disks... 


But on spin down there is no such problem.


Of course, it can be user-selectable. But should it be the default?



-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] SCSI target for IBM Power5 LPAR

2005-09-07 Thread Vladislav Bolkhovitin

Dave C Boutcher wrote:

On Wed, Sep 07, 2005 at 12:49:32PM +0200, Christoph Hellwig wrote:


On Tue, Sep 06, 2005 at 04:28:01PM -0500, Dave C Boutcher wrote:


This device driver provides the SCSI target side of the virtual
SCSI on IBM Power5 systems.  The initiator side has been in mainline
for a while now (drivers/scsi/ibmvscsi/ibmvscsi.c.)  Targets already
exist for AIX and OS/400.


Please try to integrate that with the generic scsi target framework at
http://developer.berlios.de/projects/stgt/.



There hasn't been a lot of forward progress on stgt in over a year, and
there were some issues (lack of scatterlist support, synchronous and
serial command execution) the last time I looked.

Vlad, can you comment on the state of stgt and whether you see it
being ready for mainline any time soon?


Sorry, on the stgt page I can see only the mailing list archive, and not
from the start (only from Aug 22). Mike, can I see the stgt code and some
design description, please? You can send it directly to my e-mail
address, if necessary.


Vlad


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] SCSI target for IBM Power5 LPAR/SCST 0.9.3-pre1 published

2005-09-07 Thread Vladislav Bolkhovitin

Mike Christie wrote:

Vladislav Bolkhovitin wrote:
Sorry, on the stgt page I can see only the mailing list archive, and not
from the start (only from Aug 22). Mike, can I see the stgt code and some
design description, please? You can send it directly to my e-mail
address, if necessary.


goto the svn page for the code
http://developer.berlios.de/svn/?group_id=4492

As for design desc, I do not have anything. It is the evolving source :) 
We are slowly merging lessons we learned from open-iscsi, your SCST 
code, the available software and HW targets, and the SCSI ULD's 
scatterlist code which needs redoing so it is a bit of a mess.


OK, thanks, will try tomorrow.

I put SCST 0.9.3-pre1 on its page
(http://sourceforge.net/projects/scst/). This is not the latest, but it
is the one that is working. At the end of this week I'll try to put the
latest one there as well. Hope you will learn some more lessons from
it :).


Any comments are welcome.

Vlad

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] iSCSI enterprise target software

2005-03-02 Thread Vladislav Bolkhovitin

Bryan Henderson wrote:

You want to *use* the kernel pagecache as much as you can.

No, I really don't.  Not always.  I can think of only 2 reasons to 
maximize my use of the kernel pagecache: 1) saves me duplicating code; 2) 
allows me to share resources (memory and disk bandwidth come to mind) with 
others in the same Linux system fairly.  There are many cases where those 
two benefits are outweighed by the benefits of using some other cache.  If 
you're thinking of other benefits of using the pagecache, let's hear them.

You forgot the third reason (benefit), though it isn't directly related
to the page cache: read-ahead. It greatly influences performance, and a
direct I/O application has to reimplement this logic, which generally
isn't a straightforward task.
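
To give an idea of what that reimplementation means even in the simplest
case, a direct I/O application ends up carrying something like the
following (a hand-written sketch and nothing more: no O_DIRECT alignment
handling, no asynchronous submission, no read-ahead window scaling, all
of which a real implementation needs):

#include <sys/types.h>
#include <unistd.h>

#define RA_CHUNK (256 * 1024)

struct ra_state {
        off_t next_expected;    /* where a sequential stream would continue */
        int   streak;           /* consecutive sequential reads seen */
        void  *buf;             /* RA_CHUNK prefetch buffer */
        off_t buf_off;          /* file offset cached in buf, or -1 */
};

/* Call after every application read to detect sequential access and
 * prefetch the next chunk. */
static void ra_after_read(struct ra_state *ra, int fd, off_t off, size_t len)
{
        if (off == ra->next_expected)
                ra->streak++;
        else
                ra->streak = 0;
        ra->next_expected = off + (off_t)len;

        if (ra->streak >= 2 && ra->buf_off != ra->next_expected) {
                /* synchronous pread() for brevity; real code would submit
                 * this asynchronously so it overlaps with processing */
                if (pread(fd, ra->buf, RA_CHUNK, ra->next_expected) > 0)
                        ra->buf_off = ra->next_expected;
        }
}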

Vlad
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html