This is an automated email from the ASF dual-hosted git repository.
mck pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra-website.git
The following commit(s) were added to refs/heads/trunk by this push:
new 327f02f6 BLOG - Apache Cassandra 5.0 Features: Dynamic Data Masking
327f02f6 is described below
commit 327f02f610b9e7b664a316ac7c29754550b17095
Author: Diogenese Topper <[email protected]>
AuthorDate: Wed Oct 11 17:45:56 2023 -0700
BLOG - Apache Cassandra 5.0 Features: Dynamic Data Masking
patch by Diogenese Topper; reviewed by Mick Semb Wever for CASSANDRA-18923
---
site-content/source/modules/ROOT/pages/blog.adoc | 23 +++
...assandra-5.0-Features-Dynamic-Data-Masking.adoc | 230 +++++++++++++++++++++
2 files changed, 253 insertions(+)
diff --git a/site-content/source/modules/ROOT/pages/blog.adoc b/site-content/source/modules/ROOT/pages/blog.adoc
index 8e826ac7..585b99e2 100644
--- a/site-content/source/modules/ROOT/pages/blog.adoc
+++ b/site-content/source/modules/ROOT/pages/blog.adoc
@@ -8,6 +8,29 @@ NOTES FOR CONTENT CREATORS
- Replace post tile, date, description and link to you post.
////
+//start card
+[openblock,card shadow relative test]
+----
+[openblock,card-header]
+------
+[discrete]
+=== Apache Cassandra 5.0 Features: Dynamic Data Masking
+[discrete]
+==== October 11, 2023
+------
+[openblock,card-content]
+------
+Apache Cassandra 5.0 adds Dynamic Data Masking for secure data retrieval via masking functions and permissions.
+[openblock,card-btn card-btn--blog]
+--------
+[.btn.btn--alt]
+xref:blog/Apache-Cassandra-5.0-Features-Dynamic-Data-Masking.adoc[Read More]
+--------
+
+------
+----
+//end card
+
//start card
[openblock,card shadow relative test]
----
diff --git a/site-content/source/modules/ROOT/pages/blog/Apache-Cassandra-5.0-Features-Dynamic-Data-Masking.adoc b/site-content/source/modules/ROOT/pages/blog/Apache-Cassandra-5.0-Features-Dynamic-Data-Masking.adoc
new file mode 100644
index 00000000..039bd60d
--- /dev/null
+++ b/site-content/source/modules/ROOT/pages/blog/Apache-Cassandra-5.0-Features-Dynamic-Data-Masking.adoc
@@ -0,0 +1,230 @@
+= Apache Cassandra 5.0 Features: Dynamic Data Masking
+:page-layout: single-post
+:page-role: blog-post
+:page-post-date: October 11, 2023
+:page-post-author: Andrés de la Peña
+:description: Dynamic data masking is a new feature in the upcoming Apache Cassandra 5.0.
+:keywords:
+
+__Apache Cassandra 5.0 is the project’s major release for 2023, and it promises some of the biggest changes for Cassandra to date. After more than a decade of engineering work dedicated to stabilizing and building Cassandra as a distributed database, we now look forward to introducing a host of exciting features and enhancements that empower users to take their data-driven applications to the next level, including machine learning and artificial intelligence.__
+
+__This blog series aims to give a deeper dive into some of the key features of Cassandra 5.0.__
+
+Cassandra 5.0 introduces dynamic data masking (DDM), which allows you to obscure sensitive information using a concept called masked columns. DDM doesn't change the stored data; instead, it presents the data in its redacted form during SELECT queries.
+
+DDM aims to provide some degree of protection against accidental data exposure. For example, column masks can prevent undesired exposure when a user shares the results of a query with someone else, and they can limit what users with read access can actually see. However, anyone with direct access to the SSTable files can still read the clear data, because masking is only applied when data is returned by a query.
+
+The data can be partially redacted, for example by showing only the last four digits of a credit card number. This partial view can be enough for users to perform some checks on the data without seeing all of it. The data can also be fully redacted, which makes it clear to users that the information is sensitive.
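
To make the difference between partial and full redaction concrete, here is a minimal Python sketch. The helpers `redact_partially` and `redact_fully` are hypothetical names used only for illustration, not Cassandra APIs. Assuming the semantics of the built-in functions described below, the partial form should roughly correspond to `mask_inner(number, 0, 4)` in CQL and the full form to `mask_default` on a text column.

```python
# Hypothetical helpers contrasting partial and full redaction (not Cassandra code).


def redact_partially(value: str, visible_suffix: int = 4, padding: str = "*") -> str:
    """Pad every character except the last `visible_suffix` ones."""
    visible = max(0, min(visible_suffix, len(value)))
    hidden = len(value) - visible
    return padding * hidden + value[hidden:]


def redact_fully(value: str) -> str:
    """Return a fixed marker that hides both the content and the length of `value`."""
    return "****"


card = "4111111111111111"
print(redact_partially(card))
print(redact_fully(card))
```

Partial redaction still leaks some information (here, the length and the last digits), which is the price paid for letting users verify the value.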
+
+== Masking functions
+
+DDM is based on a set of https://cassandra.apache.org/doc/5.0/cassandra/developing/cql/functions.html[CQL built-in (native) functions^] that obscure sensitive information. The available functions are:
+
+* **mask_null**: Replaces its argument with null. The returned value is always a non-existent column, not a non-null column that represents a null value.
+* **mask_default**: Replaces its argument with an arbitrary, fixed default value of the same type. This is `+****+` for text values, zero for numeric values, false for booleans, etc.
+* **mask_replace**: Replaces the first argument with the replacement value given as the second argument. The replacement value must have the same type as the replaced value.
+* **mask_inner**: Returns a copy of the first text, varchar or ascii argument, replacing every character with a padding character except for the exposed prefix and suffix, whose lengths are given by the second and third arguments.
+* **mask_outer**: Returns a copy of the first text, varchar or ascii argument, replacing the prefix and suffix whose lengths are given by the second and third arguments with a padding character.
+* **mask_hash**: Returns a blob containing the hash of the first argument.
+
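The following Python sketch emulates how `mask_inner`, `mask_outer` and `mask_hash` transform a text value, as a way to check the argument semantics. It is not Cassandra's implementation, which is written in Java. It assumes that a null prefix or suffix length means zero (consistent with the `mask_inner(name, 1, null)` output in the example that follows), that `*` is the padding character, and that text is hashed as its UTF-8 bytes with SHA-256; treat those details as assumptions.

```python
import hashlib
from typing import Optional, Tuple


def _lengths(value: str, begin: Optional[int], end: Optional[int]) -> Tuple[int, int]:
    """Treat null lengths as 0 (assumption) and keep prefix + suffix within the value."""
    prefix = max(0, min(begin or 0, len(value)))
    suffix = max(0, min(end or 0, len(value) - prefix))
    return prefix, suffix


def mask_inner(value: Optional[str], begin: Optional[int], end: Optional[int],
               padding: str = "*") -> Optional[str]:
    """Expose `begin` leading and `end` trailing characters, pad the rest."""
    if value is None:
        return None
    prefix, suffix = _lengths(value, begin, end)
    middle = len(value) - prefix - suffix
    return value[:prefix] + padding * middle + value[len(value) - suffix:]


def mask_outer(value: Optional[str], begin: Optional[int], end: Optional[int],
               padding: str = "*") -> Optional[str]:
    """Pad `begin` leading and `end` trailing characters, expose the rest."""
    if value is None:
        return None
    prefix, suffix = _lengths(value, begin, end)
    return padding * prefix + value[prefix:len(value) - suffix] + padding * suffix


def mask_hash(value: Optional[str], algorithm: str = "sha256") -> Optional[bytes]:
    """Return the digest of the value's UTF-8 bytes (assumed serialization)."""
    if value is None:
        return None
    return hashlib.new(algorithm, value.encode("utf-8")).digest()


print(mask_inner("alice", 1, None))
print(mask_outer("alice", 1, 2))
print(mask_hash("alice").hex())
```

Because hashing is deterministic, equal inputs produce equal hashes, so a hashed value can still be compared for equality without revealing the clear value.
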
+These functions can be used in any SELECT query to get an obscured view of the
data. For example:
+
+----
+CREATE TABLE patients (
+ id timeuuid PRIMARY KEY,
+ name text,
+ birth date
+);
+
+
+INSERT INTO patients(id, name, birth)
+ VALUES (now(), 'alice', '1982-01-02');
+INSERT INTO patients(id, name, birth)
+ VALUES (now(), 'bob', '1982-01-02');
+
+
+SELECT mask_inner(name, 1, null), mask_default(birth) FROM patients;
+
+
+ system.mask_inner(name, 1, NULL) | system.mask_default(birth)
+-----------------------------------+----------------------------
+ b** | 1970-01-01
+ a**** | 1970-01-01
+----
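
Conceptually, the query above applies each masking function to the selected values of every row as it is read, without touching the stored data. A minimal Python sketch of that idea follows; the row dictionaries are illustrative, and the `1970-01-01` default for `date` is taken from the output above.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple

# Illustrative stand-ins for the rows stored in the patients table.
rows: List[Dict[str, object]] = [
    {"name": "alice", "birth": date(1982, 1, 2)},
    {"name": "bob", "birth": date(1982, 1, 2)},
]


def mask_inner_1(value: Optional[str]) -> Optional[str]:
    """Emulates mask_inner(value, 1, null): keep the first character only."""
    if value is None:
        return None
    return value[:1] + "*" * (len(value) - 1)


def mask_default_date(_value: Optional[date]) -> date:
    """Emulates mask_default on a date column (fixed value seen in the output)."""
    return date(1970, 1, 1)


# SELECT mask_inner(name, 1, null), mask_default(birth) FROM patients;
result: List[Tuple[Optional[str], date]] = [
    (mask_inner_1(row["name"]), mask_default_date(row["birth"])) for row in rows
]

for name, birth in result:
    print(f"{name:>6} | {birth.isoformat()}")
```

Cassandra returns rows in partition token order, so the order in the real output can differ from the insertion order this sketch keeps.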
+
+== Attaching masking functions to table columns
+
+Masking functions can be permanently attached to any column of a table. If a column has a masking function, SELECT queries return its values in masked form (except for users with the UNMASK permission, covered in the Permissions section). The masking is transparent to users running SELECT queries, so the only way for them to know that a column is masked is to consult the table definition.
+
+This optional feature is disabled by default. To enable it, set the `dynamic_data_masking_enabled` property to `true` in the `cassandra.yaml` configuration file.
+
+Column masks can be defined in CREATE TABLE statements:
+
+----
+CREATE TABLE patients (
+ id timeuuid PRIMARY KEY,
+ name text MASKED WITH mask_inner(1, null),
+ birth date MASKED WITH mask_default()
+);
+----
+
+Data can be inserted into the masked table as usual. For example:
+
+----
+INSERT INTO patients(id, name, birth)
+ VALUES (now(), 'alice', '1984-01-02');
+INSERT INTO patients(id, name, birth)
+ VALUES (now(), 'bob', '1982-02-03');
+----
+
+The attached column masks make SELECT queries return masked data automatically, without having to include the masking function in the query:
+
+----
+SELECT name, birth FROM patients;
+
+
+ name | birth
+-------+------------
+ a**** | 1970-01-01
+ b** | 1970-01-01
+----
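
The difference from the previous example is that the mask now lives in the table schema rather than in the query. The Python sketch below models that under simplifying assumptions: a hypothetical `MaskedTable` class stores clear values on insert and applies each column's attached mask on every read, so callers never need to name the masking function. Integer ids replace the `timeuuid` keys for brevity.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Callable, Dict, List, Optional, Tuple

Mask = Callable[[Any], Any]


def mask_inner_1(value: Optional[str]) -> Optional[str]:
    """Stands in for MASKED WITH mask_inner(1, null) on a text column."""
    return None if value is None else value[:1] + "*" * (len(value) - 1)


def mask_default_date(_value: Any) -> date:
    """Stands in for MASKED WITH mask_default() on a date column."""
    return date(1970, 1, 1)


@dataclass
class MaskedTable:
    masks: Dict[str, Mask]  # column name -> attached mask; absent means unmasked
    rows: List[Dict[str, Any]] = field(default_factory=list)

    def insert(self, **values: Any) -> None:
        # Writes store clear values; masking never rewrites stored data.
        self.rows.append(dict(values))

    def select(self, *columns: str) -> List[Tuple[Any, ...]]:
        # Reads transparently apply the mask attached to each column.
        return [
            tuple(self._read(column, row.get(column)) for column in columns)
            for row in self.rows
        ]

    def _read(self, column: str, value: Any) -> Any:
        mask = self.masks.get(column)
        return mask(value) if mask is not None else value


patients = MaskedTable(masks={"name": mask_inner_1, "birth": mask_default_date})
patients.insert(id=1, name="alice", birth=date(1984, 1, 2))
patients.insert(id=2, name="bob", birth=date(1982, 2, 3))
print(patients.select("name", "birth"))
```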
+
+The masking function attached to a column can be changed with an ALTER TABLE query:
+
+----
+ALTER TABLE patients ALTER name MASKED WITH mask_default();
+----
+
+In a similar way, a masking function can be detached from a column with an ALTER TABLE query:
+
+----
+ALTER TABLE patients ALTER name DROP MASKED;
+----
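
Since masks belong to the schema, altering or dropping one only changes what reads return; the stored values are the same before and after. Here is a small Python sketch of that, where a plain dictionary stands in for the table's mask metadata (an assumption for illustration):

```python
from typing import Callable, Dict, List

stored_names: List[str] = ["alice", "bob"]  # clear values as stored
column_masks: Dict[str, Callable[[str], str]] = {}


def mask_inner_1(value: str) -> str:
    """Stands in for mask_inner(1, null)."""
    return value[:1] + "*" * (len(value) - 1)


def mask_default_text(_value: str) -> str:
    """Stands in for mask_default() on text, which returns ****."""
    return "****"


def select_names() -> List[str]:
    mask = column_masks.get("name")
    return [mask(v) if mask else v for v in stored_names]


column_masks["name"] = mask_inner_1       # name text MASKED WITH mask_inner(1, null)
before = select_names()

column_masks["name"] = mask_default_text  # ALTER TABLE patients ALTER name MASKED WITH mask_default();
altered = select_names()

del column_masks["name"]                  # ALTER TABLE patients ALTER name DROP MASKED;
dropped = select_names()

print(before, altered, dropped)
```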
+
+== Permissions
+
+The new UNMASK https://cassandra.apache.org/doc/5.0/cassandra/developing/cql/security.html#cql-permissions[permission^] allows users to retrieve the unmasked values of masked columns. Ordinary users are created without the UNMASK permission and see masked values. Superusers are created with the UNMASK permission and see the unmasked values in SELECT query results. As an example, suppose that we have a table with masked columns:
+
+----
+CREATE TABLE patients (
+ id timeuuid PRIMARY KEY,
+ name text MASKED WITH mask_inner(1, null),
+ birth date MASKED WITH mask_default()
+);
+
+
+INSERT INTO patients(id, name, birth)
+ VALUES (now(), 'alice', '1984-01-02');
+INSERT INTO patients(id, name, birth)
+ VALUES (now(), 'bob', '1982-02-03');
+----
+
+Then we create two users with SELECT permission on the table, but grant the UNMASK permission to only one of them:
+
+----
+CREATE USER privileged WITH PASSWORD 'xyz';
+GRANT SELECT ON TABLE patients TO privileged;
+GRANT UNMASK ON TABLE patients TO privileged;
+
+
+CREATE USER unprivileged WITH PASSWORD 'xyz';
+GRANT SELECT ON TABLE patients TO unprivileged;
+----
+
+The user with the UNMASK permission now sees the clear, unmasked data, whereas the user without the UNMASK permission only sees the masked data:
+
+----
+LOGIN privileged
+SELECT name, birth FROM patients;
+
+
+ name | birth
+-------+------------
+ alice | 1984-01-02
+ bob | 1982-02-03
+
+
+LOGIN unprivileged
+SELECT name, birth FROM patients;
+
+
+ name | birth
+-------+------------
+ a**** | 1970-01-01
+ b** | 1970-01-01
+----
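
The permission check boils down to a single decision per query: return the clear values, or apply the column masks. A minimal Python sketch of that decision follows; the `User` class and `read_names` function are hypothetical, and treating superusers as always holding UNMASK reflects the behavior described above rather than Cassandra's internal code.

```python
from dataclasses import dataclass, field
from typing import List, Set

stored_names: List[str] = ["alice", "bob"]


@dataclass
class User:
    name: str
    permissions: Set[str] = field(default_factory=set)
    superuser: bool = False


def mask_inner_1(value: str) -> str:
    """Stands in for the mask_inner(1, null) attached to the name column."""
    return value[:1] + "*" * (len(value) - 1)


def read_names(user: User) -> List[str]:
    # UNMASK (held by superusers) reveals clear values; everyone else gets masks.
    if user.superuser or "UNMASK" in user.permissions:
        return list(stored_names)
    return [mask_inner_1(n) for n in stored_names]


privileged = User("privileged", {"SELECT", "UNMASK"})
unprivileged = User("unprivileged", {"SELECT"})
print(read_names(privileged))
print(read_names(unprivileged))
```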
+
+Users without the UNMASK permission are not allowed to use masked columns in the WHERE clause of a SELECT query. This prevents malicious users from figuring out the clear data by running exhaustive queries. For example:
+
+----
+CREATE USER untrusted_user WITH PASSWORD 'xyz';
+GRANT SELECT ON TABLE patients TO untrusted_user;
+LOGIN untrusted_user
+
+
+SELECT name, birth FROM patients WHERE name = 'alice' ALLOW FILTERING;
+
+
+// Unauthorized: Error from server: code=2100 [Unauthorized] message="User untrusted_user has no UNMASK nor SELECT_UNMASK permission on table k.patients"
+----
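
To see why this restriction matters, consider what an untrusted user could do if filtering on masked columns were allowed. Even though every returned value is masked, the mere presence of a matching row answers a yes/no question about the clear value, and a mask such as `b**` already reveals the first letter and the length. The hypothetical Python sketch below exploits both to recover a name by exhaustive guessing; `row_matches` stands in for a `SELECT ... WHERE name = ? ALLOW FILTERING` query, and the lowercase alphabet is an assumption.

```python
import itertools
import string
from typing import List, Optional

stored_names: List[str] = ["alice", "bob"]  # clear values the attacker cannot read


def row_matches(guess: str) -> bool:
    """Stands in for: SELECT name FROM patients WHERE name = ? ALLOW FILTERING.
    Only whether a (masked) row comes back matters, not its content."""
    return guess in stored_names


def recover(masked: str, padding: str = "*",
            alphabet: str = string.ascii_lowercase) -> Optional[str]:
    """Brute-force the padded part of a mask such as 'b**'."""
    prefix = masked.rstrip(padding)
    unknown = len(masked) - len(prefix)
    for chars in itertools.product(alphabet, repeat=unknown):
        guess = prefix + "".join(chars)
        if row_matches(guess):
            return guess
    return None


print(recover("b**"))
```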
+
+However, in some use cases trusted database users just need a convenient way to produce masked data that will be served to untrusted external users. For example, a trusted app can connect to the database and extract masked data to serve to its end users. In that case, the trusted user (the app) can be given the SELECT_MASKED permission. That permission allows it to use masked columns in the WHERE clause of a SELECT query, while still seeing the masked data in the query [...]
+
+----
+CREATE USER trusted_user WITH PASSWORD 'xyz';
+GRANT SELECT, SELECT_MASKED ON TABLE patients TO trusted_user;
+LOGIN trusted_user
+
+
+SELECT name, birth FROM patients WHERE name = 'alice' ALLOW FILTERING;
+
+
+ name | birth
+-------+------------
+ a**** | 1970-01-01
+----
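
The rules from this section can be summarized in a small decision function. The Python sketch below is a hypothetical model (the names and the error text are illustrative, not Cassandra's): filtering on a masked column requires UNMASK or SELECT_MASKED, the filter itself is evaluated against the clear values, and only UNMASK returns those clear values to the caller. It ignores the plain SELECT permission check for brevity.

```python
from typing import List, Set

stored_names: List[str] = ["alice", "bob"]


def mask_inner_1(value: str) -> str:
    """Stands in for the mask_inner(1, null) attached to the name column."""
    return value[:1] + "*" * (len(value) - 1)


def select_where_name(permissions: Set[str], name: str) -> List[str]:
    """Models: SELECT name FROM patients WHERE name = ? ALLOW FILTERING."""
    can_unmask = "UNMASK" in permissions
    if not (can_unmask or "SELECT_MASKED" in permissions):
        # Filtering on a masked column requires UNMASK or SELECT_MASKED.
        raise PermissionError("no UNMASK nor SELECT_MASKED permission on patients")
    # The filter always compares against the clear stored values...
    matches = [n for n in stored_names if n == name]
    # ...but only UNMASK holders get the clear values back.
    return matches if can_unmask else [mask_inner_1(n) for n in matches]


print(select_where_name({"SELECT", "SELECT_MASKED"}, "alice"))
print(select_where_name({"SELECT", "UNMASK"}, "alice"))
```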
+
+== Custom functions
+
+https://cassandra.apache.org/doc/5.0/cassandra/developing/cql/functions.html#user-defined-scalar-functions[Cassandra’s user-defined functions (UDFs)^] can be attached to a table column as masks, which extends DDM beyond the built-in native functions. Any UDF can be used as the mask of a column, provided that its first argument and its return type have the same type as the column to be masked. For instance:
+
+----
+CREATE FUNCTION redact(input text)
+ CALLED ON NULL INPUT
+ RETURNS text
+ LANGUAGE java
+ AS 'return "redacted"';
+
+
+CREATE TABLE patients (
+ id timeuuid PRIMARY KEY,
+ name text MASKED WITH redact()
+);
+----
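
The type rule for custom masks follows from how a mask is attached: as in `MASKED WITH mask_inner(1, null)`, the column value is supplied as the function's first argument, and the function's result replaces that value in the query results. The hypothetical Python sketch below expresses the check on function signatures written as CQL type names; it illustrates the stated rule and is not Cassandra's validation code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class FunctionSignature:
    name: str
    arg_types: Tuple[str, ...]  # CQL type names, e.g. ("text",)
    return_type: str


def can_mask(column_type: str, fn: FunctionSignature) -> bool:
    """A function can mask a column if its first argument and its return
    type both have the column's type."""
    return (
        len(fn.arg_types) > 0
        and fn.arg_types[0] == column_type
        and fn.return_type == column_type
    )


redact = FunctionSignature("redact", ("text",), "text")
print(can_mask("text", redact))  # name text MASKED WITH redact()
print(can_mask("date", redact))  # a date column would need a date -> date function
```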
+
+== Learn More About Apache Cassandra
+
+As we get closer to the General Availability of Cassandra 5.0, there are a
host of ways to get more involved in the community and follow project
developments:
+
+https://events.linuxfoundation.org/cassandra-summit/[Cassandra Summit + Code AI^] is taking place Dec. 12-13 in San Jose, CA. Cassandra Summit is THE gathering place for Apache Cassandra data practitioners, developers, engineers and enthusiasts, and it’s where we’ll be diving deeper into Cassandra 5.0 features. https://events.linuxfoundation.org/cassandra-summit/program/cfp/#overview[Submit a talk^] for the NEW AI Track at Cassandra Summit; CFP closes Monday, October 23 at 9:00 AM PDT (UTC-7).
+
+For more information about Apache Cassandra or to join the community discussion, find us on these channels:
+
+* https://cassandra.apache.org/_/index.html[Apache Cassandra Website]
+* https://the-asf.slack.com/ssb/redirect[ASF Slack^]
+* https://www.youtube.com/@PlanetCassandra[Planet Cassandra YouTube^]
+* https://www.meetup.com/cassandra-global/[Planet Cassandra Global Meetup Group^]