Marcy, in answer to your question on error messages from VM:

It depends on whether the Linux guest is APVIRT or APDED.

With APDED guests, VM plays a minimal role - basically a configuration role that assigns a subset of its crypto resources to the guest. Thereafter the guest has direct access to those assigned h/w resources. The APDED guest's AP numbers and Domain numbers are precisely the same as those assigned to the zVM LPAR, except of course the guest sees (and is authorized only to see) a subset. Error reporting will be largely in the hands of Linux.
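
Since error reporting for APDED guests sits mostly with Linux, here is a minimal sketch of how a guest could poll the state of its assigned cards. It assumes the AP bus exposes each card as /sys/bus/ap/devices/cardXX with an `online` attribute containing 0 or 1; that layout and the function name `ap_card_status` are assumptions for illustration, and the `lszcrypt` command from s390-tools is the usual way to look at the same information.

```python
from pathlib import Path


def ap_card_status(ap_root="/sys/bus/ap/devices"):
    """Return {card_name: True/False} read from each card's `online` attribute.

    The sysfs layout (cardXX directories holding an `online` file) is an
    assumption; adjust the path if your kernel exposes it differently.
    """
    status = {}
    root = Path(ap_root)
    if not root.is_dir():
        # No AP bus visible, e.g. not running on s390 or no crypto assigned.
        return status
    for card in sorted(root.glob("card*")):
        online_file = card / "online"
        if online_file.is_file():
            status[card.name] = online_file.read_text().strip() == "1"
    return status


if __name__ == "__main__":
    for name, online in ap_card_status().items():
        print(f"{name}: {'online' if online else 'OFFLINE'}")
```

Run from cron or a monitoring agent, something like this would give an alert path that does not rely on VM operator messages.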


With APVIRT guests we consign a group of VM's crypto resources to a shared pool. VM manages that pool in the following ways, assuming the Dynamic Crypto APAR (VM66266) is installed:

1) It directs APVIRT guest crypto requests to a member of the pool. Each guest thinks it has AP 01, Domain 01. This is in fact a simulated (virtualized) crypto resource.

2) It directs the response from a member of the pool to the originating guest. By the way, there's no chance of cross-contamination of one APVIRT guest's crypto responses with another's: each request is uniquely tagged to the originating guest and the tagging is carried forward by the h/w into the associated response.

3) It redirects requests sent to failed crypto resources to working resources without intervention by the guest.

4) It monitors for troublesome messages that seem to cause repeated errors on being continually redirected, and fails the request if the message is redirected more than 10 times.

5) If all resources in the shared pool are temporarily unavailable (busy state on the query command) then VM will warn the operator. However, VM will forward the request automatically as soon as a resource in the shared pool becomes available.

6) If all resources in the shared pool become permanently unavailable (checkstop, configured off, unassigned) then we warn the operator and kill off the messages with simulated h/w failure errors.
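
To make items 3, 4 and 6 concrete, here is a toy model of the redirect logic. It is only a sketch of the behaviour described in this note, not CP's implementation: the names `dispatch`, `send`, `RequestFailed` and `PoolUnavailable` are hypothetical, and the only number taken from the text is the limit of 10 redirects.

```python
MAX_REDIRECTS = 10  # from item 4: fail once redirected more than 10 times


class RequestFailed(Exception):
    """The request kept failing and exceeded the redirect limit."""


class PoolUnavailable(Exception):
    """No resource is left in the shared pool (item 6)."""


def dispatch(request, pool, send, max_redirects=MAX_REDIRECTS):
    """Try pool members in turn until one accepts the request.

    `pool` is a list of resource names; `send(resource, request)` is a
    hypothetical callback returning True on success, False on failure.
    Returns (resource_used, number_of_redirects).
    """
    if not pool:
        raise PoolUnavailable("no crypto resources in the shared pool")
    redirects = 0
    index = 0
    while True:
        resource = pool[index % len(pool)]
        if send(resource, request):
            return resource, redirects
        # The guest never sees this failure; the request is simply moved on.
        redirects += 1
        if redirects > max_redirects:
            raise RequestFailed(f"redirected {redirects} times, giving up")
        index += 1
```

With a pool of ["A", "B"] where only "B" works, `dispatch` returns ("B", 1): one silent redirect and the guest is none the wiser. A request that fails everywhere is abandoned on its 11th redirect.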


In cases 4-6, there will be messages issued by VM's control program to the operator. We maintain counts of similar errors and report those counts in the messages. But, so as not to flood the console, we suppress messages triggered by the same resource, guest, or request to one every two minutes. A number of new messages were created with VM66266 to address the APVIRT RAS enhancements.
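
The suppression scheme can be pictured with a small sketch like the one below. It is an illustration under assumptions, not the CP code: the class `OperatorLog`, its `report` method and the message format are made up, and only the two-minute interval and the idea of reporting counts come from the description above.

```python
import time

SUPPRESS_SECONDS = 120  # one message per key every two minutes


class OperatorLog:
    """Count every error, but emit at most one message per key per interval."""

    def __init__(self, clock=time.monotonic, interval=SUPPRESS_SECONDS):
        self._clock = clock
        self._interval = interval
        self._last_emit = {}   # key -> time of last emitted message
        self._counts = {}      # key -> total errors seen so far
        self.emitted = []      # messages that actually reached the console

    def report(self, key, text):
        """Record an error for `key` (a resource, guest or request id).

        Returns True if a message was written, False if it was suppressed.
        """
        now = self._clock()
        self._counts[key] = self._counts.get(key, 0) + 1
        last = self._last_emit.get(key)
        if last is not None and now - last < self._interval:
            return False  # still inside the quiet period; count only
        self._last_emit[key] = now
        self.emitted.append(f"{text} (count={self._counts[key]})")
        return True
```

The point of keeping the count running while messages are suppressed is that the next message to appear still tells the operator how many errors happened in between.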



The bottom line is you'll be more dependent on Linux for crypto errors with APDED guests and more dependent on VM with APVIRT guests.

- Richard (zVM crypto/CP Dev)


On 22/01/2020 00:32, Marcy Cortes wrote:
This brings up another set of questions from me :)

Under the assumption that hardware eventually fails and I could lose a card...

If there's two on a guest I assume things seamlessly continue on if one card 
fails?  Do I get messages on Linux, VM, or the HW if that should happen?

If there's only one and that card fails, does the file system get unmounted 
and/or throw errors?  Or does it continue on and just have issues at next 
reboot?

Is there any way to test card failure?

Yes, we have plenty of HA in many forms (tsamp, db2 hadr, external load 
balancers, multiple cecs, multiple servers, multiple data centers, gpfs, etc) 
and they are complex with different recovery times and data loss as you mention.

I'm still in exploration phase so I can't answer the how many are needed.  I'm trying to 
tell mgmt. what we can do with what we have, what it will mean to grow it, and what value 
it provides.   I'm afraid that there is some belief that we can "just do all of 
it".   And what real value is there when the only group this buys protection from is 
our z storage admins (we already have hw level to protect devices that leave the 
datacenter).    Slick marketing presentations abound  :)

From page 6 of this redpiece here
http://www.redbooks.ibm.com/redpapers/pdfs/redp5464.pdf
"IBM Z makes it possible, for the first time, for organizations to pervasively 
encrypt data associated with an entire application, cloud service, or database in flight 
or at rest with one click."

Still looking for that one click button!
Marcy

-----Original Message-----
From: Linux on 390 Port <[email protected]> On Behalf Of Reinhard Buendgen
Sent: Tuesday, January 21, 2020 12:55 AM
To: [email protected]
Subject: Re: [LINUX-390] Pervasive disk encryption questions

Tim,

I fully agree. Yet the Z platform is designed for RAS, where
the "R"eliability translates to redundancy of the available resources,
either within the system for built-in resources or as a configuration
option for external resources. The number 680 just reflects the
recommendation to achieve crypto redundancy per configuration (once
configured properly the Linux kernel will do the rest).

Whether that form of redundancy is the best form in a specific customer
environment is up to the customer.

As for the level of redundancy (device redundancy, HA cluster, or DR
cluster), it is the customer's choice to decide the kind of penalty (ms,
secs, mins) he or she is willing to accept in case of the failure of
a single resource. Also note that for certain workloads (workloads
managing a shared state, e.g. R/W databases), HA clusters may be
pretty complex and impact performance.

-Reinhard

On 21.01.20 08:59, Timothy Sipples wrote:
I'd like to comment on the 680 number for a moment. I don't think 680 is
the correct number of Linux guests that can use protected key
dm-crypt/LUKS2 encrypted volumes. I'd like to argue the case for why the
current maximum number is 1,360 guests per machine that can use this
particular feature. (It's a security feature that doesn't exist on any
other platform, we should note, so it's either 680 or 1,360 more Linux
guests than any other machine.)

The number 680 is derived by taking the current maximum number of physical
Crypto Express features per machine (16), configuring them all in CCA mode,
multiplying by the current maximum number of domains per feature (85)(*),
then dividing in half, with the idea being that each Linux guest would
benefit from the services of two CCA domains spread across two physical
Crypto Express features.
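
Spelled out as arithmetic, the derivation above looks like the sketch below. The function name `max_guests` is just for illustration; the inputs (16 features, 85 domains per feature, one or two domains per guest) are the figures quoted in this note.

```python
MAX_FEATURES = 16          # physical Crypto Express features per machine
DOMAINS_PER_FEATURE = 85   # 40 on the smaller models mentioned in (*)


def max_guests(features=MAX_FEATURES,
               domains_per_feature=DOMAINS_PER_FEATURE,
               domains_per_guest=1):
    """Upper bound on guests when each guest consumes `domains_per_guest`."""
    return (features * domains_per_feature) // domains_per_guest


if __name__ == "__main__":
    print(max_guests(domains_per_guest=2))  # two domains per guest
    print(max_guests(domains_per_guest=1))  # one domain per guest
```

With two domains per guest this gives 680, and with one domain per guest 1,360. Substituting 40 domains per feature gives 320 and 640 respectively.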

I think this last assumption is fairly arbitrary. A single Linux guest is
one kernel running within only one instance of the hypervisor (which may or
may not be nested). It's a singleton, inherently. In a production
environment you'd presumably have something more than singleton Linux
guests running particular workloads, at least if they're important
workloads. You pick up redundancy there. If a particular Linux guest is
offline for whatever reason, there's another handling the workload (or
ready to handle it), with its own Crypto Express domain.

You certainly could decide to add Crypto Express redundancy on a per guest
basis in addition to whole Linux guest redundancy, but if you're going to
measure the outer bound maximum number I don't think you ought to assume
"redundancy squared." It seems rather arbitrary to me that that's where you
draw that particular line.

There is no intrinsic limit to the number of Linux guests using
dm-crypt/LUKS2 encrypted volumes with clear keys.

You can also decide on a guest-by-guest basis whether to double up on
Crypto Express CCA domains or not, which would mean a current upper bound
limit somewhere between 680 and 1,360 Linux guests using CCA domains.
And/or you can decide how many Crypto Express features you want to
configure in another mode, notably EP11. If for example you configure two
Crypto Express features in EP11 mode, then there are up to 14 available for
CCA mode, supporting up to 1,190 Linux guests using protected key
dm-crypt/LUKS2 (up to 595 if you decide to double them all up, or somewhere
in between if you double up some of them).
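
The "somewhere in between" cases can be worked out the same way. The sketch below assumes each doubled-up guest simply consumes one extra CCA domain; the function name `guest_limit` and its parameters are hypothetical, and the constants are the figures used above.

```python
TOTAL_FEATURES = 16
DOMAINS_PER_FEATURE = 85


def guest_limit(ep11_features=0, doubled_guests=0):
    """Maximum number of guests using CCA domains.

    `ep11_features` features are set aside for EP11 mode; `doubled_guests`
    guests get two CCA domains each, every other guest gets one.
    """
    cca_features = TOTAL_FEATURES - ep11_features
    if cca_features < 0:
        raise ValueError("more EP11 features than the machine holds")
    domains = cca_features * DOMAINS_PER_FEATURE
    if doubled_guests < 0 or 2 * doubled_guests > domains:
        raise ValueError("not enough CCA domains to double up that many guests")
    # guests + doubled_guests <= domains, so:
    return domains - doubled_guests


if __name__ == "__main__":
    print(guest_limit(ep11_features=2))                      # none doubled
    print(guest_limit(ep11_features=2, doubled_guests=595))  # all doubled
    print(guest_limit(ep11_features=2, doubled_guests=100))  # a mix
```

With two features in EP11 mode this yields 1,190 guests with no doubling, 595 with every guest doubled, and 1,090 if 100 guests are doubled up.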

Anyway, this is an interesting discussion! If you're pushing these limits
or at least forecast you will, let IBM know, officially.

(*) This particular number is 40 on IBM z14 ZR1, LinuxONE Rockhopper II,
and their predecessor models. Adjust the rest of the math accordingly for
these machine models.

--------------------------------------------------------------------------------------------------------
Timothy Sipples
IT Architect Executive, Digital Asset & Other Industry Solutions, IBM Z &
LinuxONE
--------------------------------------------------------------------------------------------------------

E-Mail: [email protected]

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO LINUX-390 or visit
https://urldefense.proofpoint.com/v2/url?u=http-3A__www2.marist.edu_htbin_wlvindex-3FLINUX-2D390&d=DwIGaQ&c=jf_iaSHvJObTbx-siA1ZOg&r=gWfH_UdD2c8k0h4gnfTSvBvnpNbusYa8zjPXy5D4rRk&m=XPJkwuK5GHoNNkpv30UY2Yd0I_4dHJtMN7x7wsTD4rc&s=KqsgWBv0cXJZZlSPDV0LDdbdnajhKVM12nr-LjyNEjM&e=