Re: [DISCUSS] CEP-20: Dynamic Data Masking

Jeremiah D Jordan Wed, 07 Sep 2022 08:02:26 -0700
A

> On Sep 7, 2022, at 8:58 AM, Benedict <bened...@apache.org> wrote:
> 
> Well, I am not convinced these changes will materially impact the outcome, 
> but at least we’ll have some extra fun collating the votes.
> 
> 
>> On 7 Sep 2022, at 14:05, Andrés de la Peña <adelap...@apache.org> wrote:
>> 
>> 
>> The poll makes sense to me. I would slightly change it to:
>> 
>> A) We shouldn't prefer either approach, and I agree to the implementor 
>> selecting the table schema approach for this CEP
>> B) We should prefer the view approach, but I am not opposed to the 
>> implementor selecting the table schema approach for this CEP
>> C) We should NOT implement the table schema approach, and should implement 
>> the view approach
>> D) We should NOT implement the view approach, and should implement the 
>> table schema approach
>> E) We should NOT implement the table schema approach, and should implement 
>> some other scheme (or not implement this feature)
>> 
>> Where my vote is for A.
>> 
>> 
>> On Wed, 7 Sept 2022 at 13:12, Benedict <bened...@apache.org> wrote:
>> I’m not convinced there’s been adequate resolution over which approach is 
>> adopted. I know you have expressed a preference for the table schema 
>> approach, but the weight of other opinion so far appears to be against this 
>> approach - even if it is broadly adopted by other databases. I will note 
>> that Postgres does not adopt this approach; it has a more sophisticated 
>> security label approach that has not been proposed by anybody so far.
>> 
>> I think extra weight should be given to the implementer’s preference, so 
>> while I personally do not like the table schema approach, I am happy to 
>> accept this is an industry norm, and leave the decision to you.
>> 
>> However, we should ensure the community as a whole endorses this. I think an 
>> indicative poll should be undertaken first, e.g.:
>> 
>> A) We should implement the table schema approach, as proposed
>> B) We should prefer the view approach, but I am not opposed to the 
>> implementor selecting the table schema approach for this CEP
>> C) We should NOT implement the table schema approach, and should implement 
>> the view approach
>> D) We should NOT implement the table schema approach, and should implement 
>> some other scheme (or not implement this feature)
>> 
>> Where my vote is B
>> 
>>> On 7 Sep 2022, at 12:50, Andrés de la Peña <adelap...@apache.org> wrote:
>>> 
>>> 
>>> If nobody has more concerns regarding the CEP I will start the vote 
>>> tomorrow.
>>> 
>>> On Wed, 31 Aug 2022 at 13:18, Andrés de la Peña <adelap...@apache.org> 
>>> wrote:
>>>> Is there enough support here for VIEWS to be the implementation strategy 
>>>> for displaying masking functions?
>>> 
>>> I'm not sure that views should be "the" strategy for masking functions. We 
>>> have multiple approaches here:
>>> 
>>> 1) CQL functions only. Users can decide to use the masking functions of 
>>> their own accord. I think most dbs allow this pattern of usage, which is 
>>> quite straightforward. Obviously, it doesn't allow admins to enforce that 
>>> users see only masked data. Nevertheless, it's still useful for trusted 
>>> database users generating masked data that will be consumed by the end 
>>> users of the application.
>>> 
>>> 2) Masking functions attached to specific columns. This way the same 
>>> queries will see different data (masked or not) depending on the 
>>> permissions of the user running the query. It has the advantage of not 
>>> requiring changes to the queries that users with different permissions run. 
>>> The downside is that users would need to query the schema if they need to 
>>> know whether a column is masked, unless we change the names of the returned 
>>> columns. This is the approach offered by Azure/SQL Server, PostgreSQL, IBM 
>>> Db2, Oracle, MariaDB/MaxScale and Snowflake. All these databases support 
>>> applying the masking function to columns on the base table, and some of 
>>> them also allow applying masking to views.
>>> 
>>> 3) Masking functions as part of projected views. This way users might need 
>>> to query the view appropriate for their permissions instead of the base 
>>> table. This might mean changing the queries if the masking policy is 
>>> changed by the admin. MySQL recommends this approach in a blog entry, 
>>> although it's not part of its main documentation for data masking, and the 
>>> implementation has security issues. Some of the other databases offering 
>>> approach 2) as their main option also support masking on view columns.
>>> 
>>> Each approach has its own advantages and limitations, and I don't think we 
>>> necessarily have to choose. The CEP proposes implementing 1) and 2), but 
>>> nothing prevents us from also having 3) if we get to have projected views. 
>>> However, I think that projected views are a new general-purpose feature 
>>> with their own complexities, so they would deserve their own CEP, if 
>>> someone is willing to work on the implementation.
>>> 
>>> 
>>> 
>>> On Wed, 31 Aug 2022 at 12:03, Claude Warren via dev 
>>> <dev@cassandra.apache.org> wrote:
>>> Is there enough support here for VIEWS to be the implementation strategy 
>>> for displaying masking functions?
>>> 
>>> It seems to me the view would have to store the query and apply a where 
>>> clause to it, so the same PK would be in play.
>>> 
>>> It has data leaking properties.
>>> 
>>> It has more use cases, as it can be used to:
>>> 
>>> * construct views that filter out sensitive columns
>>> * apply transforms to convert units of measure
>>> 
>>> Are there more thoughts along this line?
>>>