potiuk commented on code in PR #35867:
URL: https://github.com/apache/airflow/pull/35867#discussion_r1408896953
##########
airflow/serialization/serde.py:
##########
@@ -83,6 +91,81 @@ def decode(d: dict[str, Any]) -> tuple[str, int, Any]:
return classname, version, data
+def encrypt(value: P) -> str:
Review Comment:
What I REALLY wanted to know about is wjy we are doing it this way. Why
encrypting selected fields during serialization and not thw whole serialized
form. It seems reasonable to encrypt the resulting serialized value (whole of
it) - why do we do it selectively in the first place? What's the use case? How
about security properties of it ?
Encruption in general is very easy to get wrong and unsecure and this one
seems to introduce our own encryption method which is iikely prone to an easy
way of loosing encryption properties.
For example when we try to encrypt indiviidual small values, it's very eeasy
to loose encryption propertes. Fernet is a symmetric encryption protocol and we
ar not using random seeds to do encryption. This basically means tthat by
encrypting 65535 int, you can easily create a dictionary and build a
reverse-encryption that will allow you to decrypt all 655356 first int values.
That's why almost always when you encrypt something, you encryptt more than jus
small pieces of data.
That's why It seem reasonable (also for perofmrnace reasons, not only
seceurity properties) to encrypt whole serialized "bllob", rather than
individual values. Maybe I simply miss the use case and maybe there is a reason
why want to build encryption into internal serialization protocol, risking that
we are introducing insecure way of doing things, but welll.... I miss the use
case. I do not understand why we want to encrypt individual values here. this
is what I am after, If I were to review it.
This is what I am trying to understand - not because I am somehow mean, but
because this is is really security. IIf have tomorrow someone who raises a
securtiy issue to us, that when we **think** something is secure, that
something is not and can be easily broken.
My security guu Bruce Schneirer's always says if you think you want to do
your own security/encryption mechanism because you think you can do better, you
are probably wrong. And for me this smells like something we do without fully
understanding of all the security properties of.
So why we are doing it. Can we get to the basica and explain why are we
trying to encrypt individual fields / primitives and not the whole serialized
blob? Have we thoght about security consequences of it ?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]