merlimat commented on a change in pull request #4079: PIP-34 Key_Shared 
subscription core implementation.
URL: https://github.com/apache/pulsar/pull/4079#discussion_r277952135
 
 

 ##########
 File path: 
pulsar-client-api/src/main/java/org/apache/pulsar/client/api/TypedMessageBuilder.java
 ##########
 @@ -103,6 +103,15 @@
      */
     TypedMessageBuilder<T> keyBytes(byte[] key);
 
+    /**
+     * Sets the ordering key of the message for message dispatch in {@link 
SubscriptionType#Key_Shared} mode.
+     * Partition key Will be used if ordering key not specified
+     *
+     * @param orderingKey the ordering key for the message
+     * @return the message builder instance
+     */
+    TypedMessageBuilder<T> orderingKey(byte[] orderingKey);
 
 Review comment:
   > @merlimat I have explained one of the use cases above - CDC for a 
distributed database which usually has 2 kinds of keys. there might be other 
use cases.
   
   I think that's a bit generic at this point. I don't think it's even possible 
to do CDC on Spanner. In any case, I don't see why that would be strictly 
required for this feature.
   
   As always, I think it's better to add things to the API when there is a 
concrete need, rather than speculate about possible use cases that might not apply.
   
   > First of all, from a use-case perspective, I think how applications define a 
partition key and an ordering key is really up to themselves. But having a way 
to specify an ordering key provides a flexible mechanism for people to adapt to 
their own use cases.
   
   The "partition-key" is what defines the ordering guarantee. If you define 
another key, it's not the "ordering-key"; at most it would be a sub-key, because the 
ordering in the log is defined by the partition key.
   
   
   > However, that means that when dispatching the messages, all the messages of the same 
key can only be sent to one consumer. This can limit the capability of the 
key_shared subscription. The ordering key allows applications to configure a 
finer-granularity "ordering_key" for scaling out the consumption beyond the 
original "key".
   
   > For example, in a social app, you have a conversation stream, where the 
stream is partitioned by from_user_id, so that all the conversations of the same 
"from_user" are in one partition, but you want the consumers to consume events 
by conversation_id (which is composed of from_user_id and to_user_id). In this 
example, you are using "from_user_id" as the key and "conversation" as the 
ordering key.
   
   In this case, since the application expects messages in order by 
`conversation_id`, using that as the partitioning key will achieve 
identical behavior.
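   
   As a rough illustration of that point, here is a minimal sketch in plain Java (it does not use the Pulsar client). It assumes events are routed to a partition by hashing the `conversation_id`; the class `KeyRoutingSketch`, the methods `slot` and `route`, and the `"conversationId:seq"` event format are all hypothetical, and the hash is only a stand-in for whatever the real router uses. Under those assumptions, every event of a conversation lands in one partition, in publish order, without a second key.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeyRoutingSketch {

    // Hypothetical stand-in for a key hash; the actual Pulsar hashing is not reproduced here.
    static int slot(String key, int buckets) {
        int h = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(h, buckets);
    }

    // Routes each "conversationId:seq" event to a partition chosen by its conversation id.
    // Events are appended in publish order, so order within a partition is preserved.
    static Map<Integer, List<String>> route(List<String> events, int partitions) {
        Map<Integer, List<String>> log = new LinkedHashMap<>();
        for (String event : events) {
            String conversationId = event.substring(0, event.lastIndexOf(':'));
            int partition = slot(conversationId, partitions);
            log.computeIfAbsent(partition, p -> new ArrayList<>()).add(event);
        }
        return log;
    }

    public static void main(String[] args) {
        List<String> events = List.of(
                "alice->bob:1", "alice->carol:1", "alice->bob:2", "alice->carol:2");
        // Prints the per-partition logs; each conversation appears in exactly one of them.
        System.out.println(route(events, 4));
    }
}
```

   Two conversations may still share a partition, but that does not affect the per-conversation ordering the application asked for.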
   
   > where the stream is partitioned by from_user_id, so that all the 
conversations of the same "from_user" are in one partition
   
   Why would you care about routing per `user_id` if you just care about ordering 
per `partition_id`?
   
   Finally, as mentioned above, I think that "ordering-key" is a very misleading 
name. It would really be a "sub-key", "delivery-key", "dispatch-key", or some other 
name.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
