potiuk commented on a change in pull request #20473:
URL: https://github.com/apache/airflow/pull/20473#discussion_r776038635



##########
File path: airflow/providers/ssh/operators/ssh.py
##########
@@ -218,10 +215,7 @@ def execute(self, context=None) -> Union[bytes, str]:
                 result = self.run_ssh_client_command(ssh_client, self.command)
         except Exception as e:
             raise AirflowException(f"SSH operator error: {str(e)}")
-        enable_pickling = conf.getboolean('core', 'enable_xcom_pickling')
-        if not enable_pickling:
-            result = b64encode(result).decode('utf-8')
-        return result
+        return result.decode('utf-8')

Review comment:
       How do we know we can decode using utf-8 here? I think the main problem 
here is that the ssh command might simply return (depend which command you run) 
- very raw bytes, not even encoded-string. For example when you run  `ssh 
root@host 'cat /bin/bash'` without allocating terminal, you will get content of 
the 'bash' binary returned. This will work in the old SSH in case of  both 
flags are set:
   
   - with pickling - content of the file will be saver as byte array in xcom 
(via pickling)
   - without pickling - the file content will be base64encoded and then 
converted to the utf8-string
   
   After the change, this will fail and there is no way to send binary, 
non-encoded data over SSH.
   
   I can't exactly remember, I tried to find it but I was unable to. However  I 
vaguely recall a discussion that someone had quite a legitimate reason for this 
- it could be kerberos  keytab sent over ssh connection automatically when 
authorized or something equally "convoluted". There are some "security-ish" 
approaches where you want to utilize ssh authentication, connect to a server 
and you receieve binary data. 
   
   And the current implementation actually handles this case very well.
   
   Whether it's a legitimate case, that is a different story, but I think we 
cannot just "scrap" the behaviour. And for sure assuming that the data returned 
by SSH connection will be utf-8 encoded is wrong.
   
   If we really want to store strings, I'd say we should simply add a 
"result_encoding" parameter and if it is set, we cold do what you did. But this 
will add even more complexity to what could be expected from ssh operator. 
   
   So I am not really sure if we want to do anyting here :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to