potiuk commented on code in PR #35867:
URL: https://github.com/apache/airflow/pull/35867#discussion_r1408896953


##########
airflow/serialization/serde.py:
##########
@@ -83,6 +91,81 @@ def decode(d: dict[str, Any]) -> tuple[str, int, Any]:
     return classname, version, data
 
 
+def encrypt(value: P) -> str:

Review Comment:
   What I REALLY wanted to know about is wjy we are doing it this way. Why 
encrypting selected fields during serialization and not thw whole serialized 
form. It seems reasonable to encrypt the resulting serialized value (whole of 
it) - why do we do it selectively in the first place? What's the use case? How 
about security properties of it ? 
   
   Encruption in general is very easy to get wrong and unsecure and this one 
seems to introduce our own encryption method which is iikely prone to an easy 
way  of loosing encryption properties. 
   
   For example when we try to encrypt indiviidual small values, it's very eeasy 
to loose encryption propertes. Fernet is a symmetric encryption protocol and we 
ar not using random  seeds to do encryption.  This  basically means tthat by 
encrypting 65535 int, you can easily create a dictionary and build a 
reverse-encryption that will allow you to decrypt all  655356 first int values. 
That's why almost always when you encrypt something, you encryptt more than jus 
small pieces of data. 
   
   That's why It seem reasonable (also for perofmrnace reasons, not only 
seceurity properties) to encrypt whole serialized "bllob", rather than 
individual values. Maybe I simply miss the use case and maybe there is a reason 
why want to build encryption into internal serialization protocol, risking that 
we are introducing insecure way of doing things, but welll.... I miss the use 
case. I do not understand why we want to encrypt individual values here. this 
is what I am after, If I were to review it.
   
   This is what I am trying to understand - not because I am somehow mean, but 
because this is is really security. IIf have  tomorrow someone who raises a 
securtiy issue to us, that when we **think** something is secure, that 
something is not and can be easily broken. 
   
   My security guu Bruce Schneirer's always says if you think you want to do 
your own security/encryption mechanism because you think you can do better, you 
are probably wrong. And for me this smells like something we do without fully 
understanding of all the security properties of. 
   
   So why we are doing it. Can we get to the basica and explain why are we 
trying to encrypt individual fields / primitives and not the whole serialized 
blob? Have we thoght about security consequences of it ?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to