Marcy, in answer to your question on error messages from VM:
it depends on whether the Linux guest is APVIRT or APDED.
With APDED guests, VM plays a minimal role - basically a configuration
role that assigns a subset of its crypto resources to the guest.
Thereafter the guest has direct access to those assigned h/w resources.
The APDED guest's AP numbers and Domain numbers are precisely the same
as those assigned to the zVM LPAR, except of course the guest sees (and
is authorized only to see) a subset. Error reporting will be largely in
the hands of Linux.
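
For the APDED case, here is a minimal sketch of how the guest side could check which AP queues it sees. It assumes the Linux AP bus exposes queues under /sys/bus/ap/devices as directories named "xx.yyyy" (hex AP number, hex domain), each with an "online" attribute; the function name and return shape are made up for illustration, so check them against your kernel's zcrypt documentation.

```python
import os
import re

# Assumed naming: AP queue directories look like "01.0005" (AP.domain, hex).
QUEUE_NAME = re.compile(r"^([0-9a-f]{2})\.([0-9a-f]{4})$", re.IGNORECASE)


def list_ap_queues(devices_dir="/sys/bus/ap/devices"):
    """Return (ap, domain, online) tuples for each queue found.

    online is True/False, or None if the attribute could not be read.
    """
    queues = []
    for entry in sorted(os.listdir(devices_dir)):
        match = QUEUE_NAME.match(entry)
        if not match:
            continue  # skip card entries and anything else
        online_path = os.path.join(devices_dir, entry, "online")
        try:
            with open(online_path) as f:
                online = f.read().strip() == "1"
        except OSError:
            online = None
        queues.append((int(match.group(1), 16), int(match.group(2), 16), online))
    return queues
```

Polling something like this from a monitoring script is one way to notice a queue going offline from inside the guest.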
With APVIRT guests we consign a group of VM's crypto resources to a
shared pool. VM manages that pool in the following ways, assuming the
Dynamic Crypto APAR (VM66266) is installed:
1) It directs APVIRT guest crypto requests to a member of the pool. Each
guest thinks it has AP 01, Domain 01. This is in fact a simulated
(virtualized) crypto resource.
2) It directs the response from a member of the pool to the originating
guest. By the way, there's no chance of cross-contamination of one APVIRT
guest's crypto responses with another's: each request is uniquely tagged
to the originating guest and the tagging is carried forward by the h/w
into the associated response.
3) It redirects requests sent to failed crypto resources to working
resources without intervention by the guest.
4) It monitors for troublesome messages that seem to cause repeated
errors on being continually redirected, and fails the request if the
message is redirected more than 10 times.
5) If all resources in the shared pool are temporarily unavailable (busy
state on the query command) then VM will warn the operator. However, VM
will forward the request automatically as soon as a resource in the shared
pool becomes available.
6) If all resources in the shared pool become permanently unavailable
(checkstop, configured off, unassigned) then we warn the operator and
kill off the messages with simulated h/w failure errors.
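
To make items 3, 4 and 6 concrete, here is a toy model of the redirect behaviour. It is only a sketch of the rules as described above, not CP's actual logic: the pool is a plain list of names, `try_request` is a hypothetical callback standing in for the hardware, and the status strings are invented.

```python
MAX_REDIRECTS = 10  # from the text: fail once redirected more than 10 times


def dispatch(pool, try_request, max_redirects=MAX_REDIRECTS):
    """Try a request on pool members in turn, redirecting on failure.

    pool: list of resource names still in the shared pool (toy model).
    try_request: callable(name) -> bool, True if that resource answered.
    Returns (status, redirects).
    """
    if not pool:
        # Item 6: nothing usable is left, so simulate a h/w failure.
        return ("hw-failure", 0)
    redirects = 0
    index = 0
    while True:
        name = pool[index % len(pool)]
        if try_request(name):
            return ("ok", redirects)
        # Item 3: move on to another resource without involving the guest.
        redirects += 1
        if redirects > max_redirects:
            # Item 4: the message itself looks troublesome, so give up.
            return ("failed", redirects)
        index += 1
```

For example, `dispatch(["a", "b", "c"], lambda name: name == "c")` succeeds after two redirects, while a request that fails everywhere is abandoned once the redirect count passes 10.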
In cases 4-6, there will be messages issued by VM's control program to
the operator. We maintain counts of similar errors and report those
counts in the messages. But, so as not to flood the console, we limit
messages triggered by the same resource, guest, or request to one
every two minutes. There were a number of new messages created with
VM66266 to address the APVIRT RAS enhancements.
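
The count-and-suppress idea described above can be sketched like this. It is an illustration only: the class name, the key strings and the message text are invented, and the real CP messages will look different. The two-minute window is the one figure taken from the text.

```python
SUPPRESS_SECONDS = 120  # from the text: at most one message every two minutes


class MessageThrottle:
    """Count every error, but emit at most one message per key per window."""

    def __init__(self, interval=SUPPRESS_SECONDS):
        self.interval = interval
        self.last_emit = {}  # key -> time of the last emitted message
        self.counts = {}     # key -> total errors seen so far

    def report(self, key, now):
        """Record an error for key at time now (seconds).

        Returns the message text to show, or None if it is suppressed.
        """
        self.counts[key] = self.counts.get(key, 0) + 1
        last = self.last_emit.get(key)
        if last is not None and now - last < self.interval:
            return None  # still inside the window: count it, stay quiet
        self.last_emit[key] = now
        return f"{key}: {self.counts[key]} error(s) so far"
```

Errors that arrive inside the window are still counted, so the next message that does get through carries the running total.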
The bottom line is you'll be more dependent on Linux for crypto errors
with APDED guests and more dependent on VM with APVIRT guests.
- Richard (zVM crypto/CP Dev)
On 22/01/2020 00:32, Marcy Cortes wrote:
This brings up another set of questions from me :)
Under the assumption that hardware eventually fails and I could lose a card...
If there's two on a guest I assume things seamlessly continue on if one card
fails? Do I get messages on Linux, VM, or the HW if that should happen?
If there's only one and that card fails, does the file system get unmounted
and/or throw errors? Or does it continue on and just have issues at next
reboot?
Is there any way to test card failure?
Yes, we have plenty of HA in many forms (tsamp, db2 hadr, external load
balancers, multiple cecs, multiple servers, multiple data centers, gpfs, etc)
and they are complex with different recovery times and data loss as you mention.
I'm still in exploration phase so I can't answer the how many are needed. I'm trying to
tell mgmt. what we can do with what we have, what it will mean to grow it, and what value
it provides. I'm afraid that there is some belief that we can "just do all of
it". And what real value is there when the only group this buys protection from is
our z storage admins (we already have hw level to protect devices that leave the
datacenter)? Slick marketing presentations abound :)
From page 6 of this redpiece here
http://www.redbooks.ibm.com/redpapers/pdfs/redp5464.pdf
"IBM Z makes it possible, for the first time, for organizations to pervasively
encrypt data associated with an entire application, cloud service, or database in flight
or at rest with one click."
Still looking for that one click button!
Marcy
-----Original Message-----
From: Linux on 390 Port <[email protected]> On Behalf Of Reinhard Buendgen
Sent: Tuesday, January 21, 2020 12:55 AM
To: [email protected]
Subject: Re: [LINUX-390] Pervasive disk encryption questions
Tim,
I fully agree. Yet the Z platform is designed for RAS, where the
"R"eliability translates to redundancy of the available resources,
either within the system for built-in resources or as a configuration
option for external resources. The number 680 just reflects the
recommendation to achieve crypto redundancy per configuration (once
configured properly the Linux kernel will do the rest).
Whether that form of redundancy is the best form in a specific customer
environment is up to the customer.
As for the level of redundancy (device redundancy, HA cluster, or DR
cluster), it is the customer's choice to decide the kind of penalty (ms,
secs, mins) he or she is willing to accept in case of the failure of
a single resource. Also note that for certain workloads (workloads
managing a shared state, e.g. R/W databases), HA clusters may be
pretty complex and impact performance.
-Reinhard
On 21.01.20 08:59, Timothy Sipples wrote:
I'd like to comment on the 680 number for a moment. I don't think 680 is
the correct number of Linux guests that can use protected key
dm-crypt/LUKS2 encrypted volumes. I'd like to argue the case for why the
current maximum number is 1,360 guests per machine that can use this
particular feature. (It's a security feature that doesn't exist on any
other platform, we should note, so it's either 680 or 1,360 more Linux
guests than any other machine.)
The number 680 is derived by taking the current maximum number of physical
Crypto Express features per machine (16), configuring them all in CCA mode,
multiplying by the current maximum number of domains per feature (85)(*),
then dividing in half, with the idea being that each Linux guest would
benefit from the services of two CCA domains spread across two physical
Crypto Express features.
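
The derivation above is short enough to write out. This is just the arithmetic from the paragraph; the constant and function names are made up for the sketch.

```python
FEATURES_PER_MACHINE = 16  # current maximum Crypto Express features (from the text)
DOMAINS_PER_FEATURE = 85   # current maximum domains per feature (from the text)


def total_cca_domains(features=FEATURES_PER_MACHINE, domains=DOMAINS_PER_FEATURE):
    """All features in CCA mode: one usable domain per (feature, domain) pair."""
    return features * domains


def guests_with_two_domains(features=FEATURES_PER_MACHINE, domains=DOMAINS_PER_FEATURE):
    """Each guest takes two domains, spread across two physical features."""
    return total_cca_domains(features, domains) // 2


print(total_cca_domains())        # 16 * 85 = 1360
print(guests_with_two_domains())  # 1360 / 2 = 680
```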
I think this last assumption is fairly arbitrary. A single Linux guest is
one kernel running within only one instance of the hypervisor (which may or
may not be nested). It's a singleton, inherently. In a production
environment you'd presumably have something more than singleton Linux
guests running particular workloads, at least if they're important
workloads. You pick up redundancy there. If a particular Linux guest is
offline for whatever reason, there's another handling the workload (or
ready to handle it), with its own Crypto Express domain.
You certainly could decide to add Crypto Express redundancy on a per guest
basis in addition to whole Linux guest redundancy, but if you're going to
measure the outer bound maximum number I don't think you ought to assume
"redundancy squared." It seems rather arbitrary to me that that's where you
draw that particular line.
There is no intrinsic limit to the number of Linux guests using
dm-crypt/LUKS2 encrypted volumes with clear keys.
You can also decide on a guest-by-guest basis whether to double up on
Crypto Express CCA domains or not, which would mean a current upper bound
limit somewhere between 680 and 1,360 Linux guests using CCA domains.
And/or you can decide how many Crypto Express features you want to
configure in another mode, notably EP11. If for example you configure two
Crypto Express features in EP11 mode, then there are up to 14 available for
CCA mode, supporting up to 1,190 Linux guests using protected key
dm-crypt/LUKS2 (up to 595 if you decide to double them all up, or somewhere
in between if you double up some of them).
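
The "somewhere in between" cases can be computed the same way. A small sketch, with hypothetical parameter names, that takes the number of features set aside for EP11 and the number of guests given two CCA domains instead of one:

```python
def max_guests(features=16, ep11_features=0, domains_per_feature=85,
               doubled_guests=0):
    """Upper bound on guests using CCA domains (figures from the text).

    doubled_guests is how many guests get two domains; every remaining
    domain serves one guest. All names here are illustrative.
    """
    cca_domains = (features - ep11_features) * domains_per_feature
    if 2 * doubled_guests > cca_domains:
        raise ValueError("not enough CCA domains to double up that many guests")
    single_guests = cca_domains - 2 * doubled_guests
    return doubled_guests + single_guests


print(max_guests())                                     # 1360
print(max_guests(doubled_guests=680))                   # 680
print(max_guests(ep11_features=2))                      # 1190
print(max_guests(ep11_features=2, doubled_guests=595))  # 595
```

Passing a different `domains_per_feature` covers machine models with a lower per-feature domain limit; the feature count for those models is something to check separately.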
Anyway, this is an interesting discussion! If you're pushing these limits
or at least forecast you will, let IBM know, officially.
(*) This particular number is 40 on IBM z14 ZR1, LinuxONE Rockhopper II,
and their predecessor models. Adjust the rest of the math accordingly for
these machine models.
--------------------------------------------------------------------------------------------------------
Timothy Sipples
IT Architect Executive, Digital Asset & Other Industry Solutions, IBM Z &
LinuxONE
--------------------------------------------------------------------------------------------------------
E-Mail: [email protected]
----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to [email protected] with the message: INFO LINUX-390 or visit
http://www2.marist.edu/htbin/wlvindex?LINUX-390