poorbarcode commented on code in PR #20923:
URL: https://github.com/apache/pulsar/pull/20923#discussion_r1300518754


##########
pip/pip-290.md:
##########
@@ -0,0 +1,207 @@
+# Background knowledge
+
+### 1. Web Socket Proxy Server
+[Web Socket Proxy Server](https://pulsar.apache.org/docs/3.0.x/client-libraries-websocket/#run-the-websocket-service) provides a simple way to interact with Pulsar over the `WSS` protocol.
+- When a [wss-producer](https://pulsar.apache.org/docs/3.0.x/client-libraries-websocket/#nodejs-producer) is registered, the Web Socket Proxy Server creates a corresponding one-to-one producer that actually sends messages to the Broker.
+- When a [wss-consumer](https://pulsar.apache.org/docs/3.0.x/client-libraries-websocket/#nodejs-consumer) is registered, the Web Socket Proxy Server creates a corresponding one-to-one consumer that actually receives messages from the Broker and forwards them to the WSS consumer.
+
+### 2. When a user wants to encrypt the message payload, there are two solutions:
+- **Solution 1**: encrypt the message payload before the WSS producer sends messages, and decrypt it after the WSS consumer receives messages. If the user wants to use different encryption keys for different messages, they can set a [property](https://github.com/apache/pulsar/blob/master/pulsar-websocket/src/main/java/org/apache/pulsar/websocket/data/ProducerMessage.java#L38) on each message to indicate which key encrypted it. But this solution has a shortcoming: if the user also has consumers using Java clients, these consumers cannot automatically decrypt the messages (normally, Java clients can [decrypt messages automatically](https://pulsar.apache.org/docs/3.0.x/security-encryption/#how-it-works-in-pulsar)). The benefit of this solution is that the user does not need to expose the private key to the Web Socket Proxy Server.
+- **Solution 2**: release `2.11` includes a [feature](https://github.com/apache/pulsar/pull/16234) that lets users set encryption keys for the internal producers and consumers of the Web Socket Proxy Server, but it requires the user to upload both the public key and the private key to the Web Socket Proxy Server (in other words, the user must expose the keys to the Web Socket Proxy Server). There is a workaround for this shortcoming, but it is not recommended<sup>[1]</sup>. The benefit is that the WSS producer and WSS consumer do not need to care about encryption and decryption.
+
+### 3. How the message payload is processed during message sending
+- If batching is enabled, the Producer combines several message payloads into a batched message payload;
+- If compression is enabled, the Producer compresses the batched message payload into a compressed payload;
+- After the previous two steps, the Producer encrypts the compressed payload into an encrypted payload.
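The three producer-side steps above can be sketched in Java as follows, for a WSS client that does client-side encryption itself. This is a minimal sketch under stated assumptions, not Pulsar's implementation: `ClientSidePayloadPipeline` is a hypothetical class, the 4-byte length-prefix batch framing is an illustrative stand-in for Pulsar's real batch format, `java.util.zip.Deflater` stands in for a ZLIB-style codec, and AES-GCM with a random 12-byte IV is assumed as the payload cipher.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.List;
import java.util.zip.Deflater;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper: batch, then compress, then encrypt (the producer's order).
final class ClientSidePayloadPipeline {

    // Values a client would need to report back, e.g. in an Encrypt Context.
    static final class Encrypted {
        final byte[] ciphertext;
        final byte[] iv;             // would be Base64-encoded into "param"
        final int batchSize;
        final int uncompressedSize;

        Encrypted(byte[] ciphertext, byte[] iv, int batchSize, int uncompressedSize) {
            this.ciphertext = ciphertext;
            this.iv = iv;
            this.batchSize = batchSize;
            this.uncompressedSize = uncompressedSize;
        }
    }

    // Step 1: assumed framing [4-byte length][bytes]... (NOT Pulsar's batch format).
    static byte[] batch(List<byte[]> messages) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        for (byte[] m : messages) {
            out.writeInt(m.length);
            out.write(m);
        }
        out.flush();
        return buf.toByteArray();
    }

    // Step 2: ZLIB-style compression with java.util.zip.Deflater.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(chunk);
            out.write(chunk, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Step 3: AES-GCM encryption of the compressed payload with a fresh 12-byte IV.
    static Encrypted process(List<byte[]> messages, SecretKey dataKey)
            throws IOException, GeneralSecurityException {
        byte[] batched = batch(messages);
        byte[] compressed = compress(batched);
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, dataKey, new GCMParameterSpec(128, iv));
        return new Encrypted(cipher.doFinal(compressed), iv, messages.size(), batched.length);
    }
}
```

Keeping the batch size and the uncompressed size next to the IV mirrors the kind of information a consumer needs in order to undo the three steps in reverse order.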
+
+
+### 4. Encrypt context
+
+The structure of the Encrypt Context:
+```json
+{
+  "batchSize": 2, // How many single messages are in the batch. If null, it means it is not a batched message.
+  "compressionType": "NONE", // the compression type.
+  "uncompressedMessageSize": 0, // the size of the uncompressed payload.
+  "keys": {
+    "client-rsa.pem": {  // key name.
+      "keyValue": "asdvfdw==", // key value.
+      "metadata": {} // extra props of the key.
+    }
+  },
+  "param": "Tfu1PxVm6S9D3+Hk" // the IV used to encrypt this message.
+}
+```
+All the fields of the Encrypt Context are used to parse the encrypted message payload.
+- `keys` and `param` are used to decrypt the encrypted message payload.
+- `compressionType` and `uncompressedMessageSize` are used to decompress the compressed message payload.
+- `batchSize` is used to split the batched message payload into individual messages.
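The reverse, consumer-side use of these fields can be sketched as below. Assumptions, all hypothetical: the `EncryptContextDecoder` class; the data key from `keys` has already been unwrapped with the consumer's private key (not shown); the payload cipher is AES-GCM with the IV carried Base64-encoded in `param`; only `NONE` and `ZLIB` compression are handled, via `java.util.zip.Inflater`; and batches use a simple 4-byte length-prefix framing rather than Pulsar's real batch format.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper: decrypt, then decompress, then split, driven by Encrypt Context fields.
final class EncryptContextDecoder {

    // keys + param: decrypt with the (already unwrapped) data key and the IV from "param".
    static byte[] decrypt(byte[] payload, SecretKey dataKey, String param)
            throws GeneralSecurityException {
        byte[] iv = Base64.getDecoder().decode(param);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, dataKey, new GCMParameterSpec(128, iv));
        return cipher.doFinal(payload);
    }

    // compressionType + uncompressedMessageSize: inflate into a buffer of the known size.
    static byte[] decompress(byte[] compressed, String compressionType, int uncompressedMessageSize)
            throws DataFormatException {
        if ("NONE".equals(compressionType)) {
            return compressed;
        }
        if (!"ZLIB".equals(compressionType)) {
            throw new IllegalArgumentException("not handled in this sketch: " + compressionType);
        }
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] out = new byte[uncompressedMessageSize];
        int n = 0;
        while (n < out.length && !inflater.finished()) {
            int r = inflater.inflate(out, n, out.length - n);
            if (r == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                break; // truncated or dictionary-based input: stop and report below
            }
            n += r;
        }
        inflater.end();
        if (n != uncompressedMessageSize) {
            throw new DataFormatException("expected " + uncompressedMessageSize + " bytes, got " + n);
        }
        return out;
    }

    // batchSize: split using the assumed [4-byte length][bytes]... framing.
    static List<byte[]> split(byte[] batched, Integer batchSize) throws IOException {
        List<byte[]> messages = new ArrayList<>();
        if (batchSize == null) { // null batchSize: not a batched message
            messages.add(batched);
            return messages;
        }
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(batched));
        for (int i = 0; i < batchSize; i++) {
            byte[] m = new byte[in.readInt()];
            in.readFully(m);
            messages.add(m);
        }
        return messages;
    }
}
```

The three methods are meant to be called in the order decrypt, decompress, split, which is the reverse of the producer-side order.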
+
+There is another attribute, `encryptionAlgo`, that identifies which encryption algorithm is used; since it is optional, it is not included in the Encrypt Context.
+
+When the internal consumer of the Web Socket Proxy Server receives a message whose metadata indicates that it is encrypted, the consumer adds the Encrypt Context to the response sent to the WSS consumer.
+
+### 5. Quick explanation of the components used in the Design sections:
+- `CryptoKeyReader`: an interface that users implement to read the public key and private key.
+- `MessageCrypto`: a utility interface that encrypts and decrypts the message payload, and adds encryption information to (and extracts it from) the message metadata.
+
+# Motivation
+
+Therefore, there is currently no way to enable encryption over the WSS protocol while meeting both of the following conditions:
+- The WSS producer and WSS consumer encrypt and decrypt messages themselves and do not share private keys with the Web Socket Proxy Server.
+- Other clients (such as Java and C++) can automatically decrypt the messages sent by the WSS producer.
+
+# Goals
+Provide a way for the Web Socket Proxy Server to simply pass encryption information through to the client, so that the WSS producer and WSS consumer encrypt and decrypt messages themselves.
+
+Since the producer processes message payloads in the order `compression --> encryption`, users need to handle compression themselves if needed.
+
+Since the consumer processes message payloads in the order `decryption --> decompression --> extract the batched messages`, users need to handle decompression and batch extraction themselves if needed.
+
+Note: I want to cherry-pick this feature into `branch-2.11`.
+
+
+## Out of Scope
+This proposal does not intend to support the following three features:
+- Support publishing "Null value messages" for WSS producers.
+- Support publishing "Chunked messages" for WSS producers.
+- Support publishing "Batched messages" for WSS producers.
+
+
+# High-Level Design
+**For WSS producers**: if a producer is registered with a non-empty `encryptionKeyValues`, the Web Socket Proxy Server marks it as a Client-Side Encryption Producer and disables server-side batching, server-side compression, and server-side encryption for it.
+
+**For WSS consumers**: users can set the parameter `cryptoFailureAction` to `CONSUME` to receive the undecrypted message payload directly (this is already supported).
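For illustration, a WSS consumer opting into this behavior only needs the existing `cryptoFailureAction` query param on the consumer endpoint. The helper below is a hypothetical sketch; the host, port, tenant, namespace, topic, and subscription values are placeholders, and names are assumed not to need URL escaping.

```java
import java.net.URI;

// Hypothetical helper that builds a WSS consumer URL opting into CONSUME mode.
final class WssConsumerUrl {
    static URI consumerUri(String proxyHostPort, String tenant, String namespace,
                           String topic, String subscription) {
        String path = String.format("/ws/v2/consumer/persistent/%s/%s/%s/%s",
                tenant, namespace, topic, subscription);
        // CONSUME: deliver the still-encrypted payload to the client instead of failing.
        return URI.create("wss://" + proxyHostPort + path + "?cryptoFailureAction=CONSUME");
    }
}
```

For example, `consumerUri("proxy.example.com:8443", "public", "default", "my-topic", "my-sub")` yields `wss://proxy.example.com:8443/ws/v2/consumer/persistent/public/default/my-topic/my-sub?cryptoFailureAction=CONSUME`.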
+
+# Detailed Design
+**For producers marked as Client-Side Encryption Producers**:
+
+- forcefully set the component `CryptoKeyReader` to `DummyCryptoKeyReaderImpl`.
+  - `DummyCryptoKeyReaderImpl`: doesn't provide any public key or private key, and just returns `null`.
+- forcefully set the component `MessageCrypto` to `WSSDummyMessageCryptoImpl` to skip server-side message encryption.
+  - `WSSDummyMessageCryptoImpl`: only sets the encryption info in the message metadata and skips payload encryption.
+- forcefully set `enableBatching` to `false` to skip building server-side batch messages, and print a warning log if users set `enableBatching`, `batchingMaxMessages`, `maxPendingMessages`, or `batchingMaxPublishDelay`.
+- forcefully set the `CompressionType` to `NONE` to skip server-side compression, and print a warning log if users set `compressionType`.
+- forcefully set the param `enableChunking` to `false` (the default value is `false`) to prevent unexpected problems if the default setting is changed in the future.
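A minimal Java sketch of these forced overrides, using hypothetical stand-in types (`ClientSideEncryptionOverrides`, `KeyReader`, `ProducerSettings`) instead of Pulsar's real `ProducerBuilder` and `CryptoKeyReader` so that it is self-contained. The `MessageCrypto` swap is not modeled, and returning the warnings as strings instead of logging them is an assumption made for testability.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; the real proxy configures Pulsar's producer builder instead.
final class ClientSideEncryptionOverrides {

    // Assumed shape mirroring the role of CryptoKeyReader: look up keys by name.
    interface KeyReader {
        byte[] getPublicKey(String keyName, Map<String, String> metadata);
        byte[] getPrivateKey(String keyName, Map<String, String> metadata);
    }

    // Like DummyCryptoKeyReaderImpl: never provides any key.
    static final KeyReader DUMMY_KEY_READER = new KeyReader() {
        @Override
        public byte[] getPublicKey(String keyName, Map<String, String> metadata) { return null; }
        @Override
        public byte[] getPrivateKey(String keyName, Map<String, String> metadata) { return null; }
    };

    // Mutable settings bag with fields named after the producer params.
    static final class ProducerSettings {
        boolean enableBatching = true;
        String compressionType;
        boolean enableChunking = false;
        KeyReader keyReader;
    }

    static boolean isClientSideEncryption(Map<String, String> query) {
        String v = query.get("encryptionKeyValues");
        return v != null && !v.isEmpty();
    }

    // Applies the forced settings and returns the warnings the proxy would log.
    static List<String> apply(ProducerSettings s, Map<String, String> query) {
        List<String> warnings = new ArrayList<>();
        for (String p : List.of("enableBatching", "batchingMaxMessages",
                "maxPendingMessages", "batchingMaxPublishDelay", "compressionType")) {
            if (query.containsKey(p)) {
                warnings.add("ignoring '" + p + "' for a client-side encryption producer");
            }
        }
        s.keyReader = DUMMY_KEY_READER; // no server-side keys
        s.enableBatching = false;       // no server-side batching
        s.compressionType = "NONE";     // no server-side compression
        s.enableChunking = false;       // pinned even though false is the default
        return warnings;
    }
}
```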
+
+**For client-side encryption consumers**:
+
+- To avoid excessive warning logs: when the consumer's `cryptoFailureAction` is set to `CONSUME` and it receives an encrypted message that it cannot decrypt, print an `INFO` level log instead (the original log level is `WARN`).
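The log-level rule can be sketched as a small pure function. The `CryptoFailureAction` enum here is a local mirror with the values `FAIL`, `DISCARD`, and `CONSUME`, and `java.util.logging.Level.WARNING` stands in for the `WARN` level of the proxy's actual logger; both are assumptions for a self-contained example.

```java
import java.util.logging.Level;

// Hypothetical sketch of the log-level rule for messages the proxy cannot decrypt.
final class UndecryptableMessageLogLevel {
    enum CryptoFailureAction { FAIL, DISCARD, CONSUME }

    static Level levelFor(CryptoFailureAction action) {
        // CONSUME means the client asked for the raw encrypted payload,
        // so failing to decrypt on the proxy is expected and logged at INFO.
        return action == CryptoFailureAction.CONSUME ? Level.INFO : Level.WARNING;
    }
}
```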
+
+
+### Public API
+
+#### [Endpoint: producer connect](https://pulsar.apache.org/docs/3.1.x/client-libraries-websocket/#producer-endpoint)
+Add the following query params:
+| param name | description|
+| --- | --- |
+| `encryptionKeyValues` | Base64 encoded and URL encoded secret key |
+| `encryptionKeyMetadata` | Base64 encoded and URL encoded, JSON-formatted key-value metadata of the encryption key |
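A hypothetical sketch of how a client might build the producer URL with these params: the raw value is Base64-encoded first and the result is then URL-encoded, because Base64 output can contain `+`, `/`, and `=`. The `EncryptedProducerUrl` class, its naive JSON writer (no escaping), and the flat string-map metadata shape are assumptions.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper: encode the producer query params as the table describes.
final class EncryptedProducerUrl {
    static String encode(byte[] raw) {
        // Base64 first, then URL-encode because Base64 may contain '+', '/' and '='.
        String b64 = Base64.getEncoder().encodeToString(raw);
        return URLEncoder.encode(b64, StandardCharsets.UTF_8);
    }

    // Minimal JSON object writer for flat string maps (no escaping; sketch only).
    static String toJson(Map<String, String> metadata) {
        return new TreeMap<>(metadata).entrySet().stream()
                .map(e -> "\"" + e.getKey() + "\":\"" + e.getValue() + "\"")
                .collect(Collectors.joining(",", "{", "}"));
    }

    static String producerUrl(String base, byte[] keyValue, Map<String, String> keyMetadata) {
        return base
                + "?encryptionKeyValues=" + encode(keyValue)
                + "&encryptionKeyMetadata="
                + encode(toJson(keyMetadata).getBytes(StandardCharsets.UTF_8));
    }
}
```

For example, the two raw bytes `0xFB 0xFF` Base64-encode to `+/8=`, which becomes `%2B%2F8%3D` after URL encoding.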

Review Comment:
   > Why not add the key metadata to the encryptionKeyValues JSON structure? So that it will align with the returned data structure to consumers.
   
   I added a new mode for the parameter `encryptionKeys`: if a producer is registered with a JSON parameter `encryptionKeys` and `encryptionKeys[{key_name}].keyValue` is not empty, the Web Socket Proxy Server marks this producer as a Client-Side Encryption Producer and then discards server-side batching, server-side compression, and server-side encryption.
   
   > And could you please also provide an example of what is the original data looks like? without base64 and URL encoding.
   
   Done.


