merlimat commented on a change in pull request #4079: PIP-34 Key_Shared
subscription core implementation.
URL: https://github.com/apache/pulsar/pull/4079#discussion_r277952135
##########
File path:
pulsar-client-api/src/main/java/org/apache/pulsar/client/api/TypedMessageBuilder.java
##########
@@ -103,6 +103,15 @@
*/
TypedMessageBuilder<T> keyBytes(byte[] key);
+ /**
+ * Sets the ordering key of the message for message dispatch in {@link
SubscriptionType#Key_Shared} mode.
+ * Partition key Will be used if ordering key not specified
+ *
+ * @param orderingKey the ordering key for the message
+ * @return the message builder instance
+ */
+ TypedMessageBuilder<T> orderingKey(byte[] orderingKey);
Review comment:
> @merlimat I have explained one of the use cases above - CDC for a
distributed database which usually has 2 kinds of keys. there might be other
use cases.
I think that's a bit generic at this point. I don't think it's even possible
to do CDC on Spanner. In any case, I don't see why that would be strictly
required for this feature.
As always, I think it's better to add things to the API when there is a
concrete need, rather than speculate about possible use cases that might not
apply.
> first of all, from use case perspective, I think how applications define a
partition key and an ordering key is really up to themselves. but having a way
to specify an ordering key provides a flexible mechanism for people to adopt to
their own use cases.
The "partition-key" is what defines the ordering guarantee. If you define
another key, it's not the "ordering-key"; at most it would be a sub-key, but
the ordering in the log is defined by the partition key.
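To make the distinction concrete, here is a minimal sketch of the two levels of routing being discussed. It is not Pulsar's implementation: the class name, the `slot` helper, and the use of `Arrays.hashCode` as the hash function are all assumptions made for illustration. The only behavior taken from the patch is the fallback described in the Javadoc above (the partition key is used when no second key is set).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TwoLevelRoutingSketch {
    // Hypothetical helper: maps a key to a slot in [0, n) with a stable hash.
    static int slot(byte[] key, int n) {
        return Math.floorMod(Arrays.hashCode(key), n);
    }

    // Only the partition key decides which partition (log) stores the message,
    // so only the partition key defines the ordering in the log.
    static int partitionFor(byte[] partitionKey, int numPartitions) {
        return slot(partitionKey, numPartitions);
    }

    // The second key only influences which consumer receives the message.
    // When it is absent, the partition key is used instead.
    static int consumerFor(byte[] partitionKey, byte[] dispatchKey, int numConsumers) {
        byte[] effective = (dispatchKey != null) ? dispatchKey : partitionKey;
        return slot(effective, numConsumers);
    }

    public static void main(String[] args) {
        byte[] user = "user-1".getBytes(StandardCharsets.UTF_8);
        byte[] conv = "user-1:user-2".getBytes(StandardCharsets.UTF_8);
        System.out.println("partition = " + partitionFor(user, 4));
        System.out.println("consumer (sub-key set)   = " + consumerFor(user, conv, 3));
        System.out.println("consumer (sub-key unset) = " + consumerFor(user, null, 3));
    }
}
```

In this sketch the second key never reaches `partitionFor`, which is why "sub-key" or "dispatch-key" describes it better than "ordering-key".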
> However that means in dispatching the messages, all the messages of a same
key can only be sent to one consumer. This can limit the capability of
key_shared subscription. the ordering key allows applications to configure a
finer granularity "ordering_key" for scaling out the consumption beyond the
original "key".
> for example, in a social app, you have a conversation stream, where the
stream is partitioned by from_user_id, so that all the conversations of a same
"from_user" is in one partition, but you want the consumers to consume events
by conversation_id (which is comprised of from_user_id and to_user_id). in this
example, you are using "from_user_id" as the key, "conversation" as the
ordering key.
In this case, since the application expects messages in order by
`conversation_id`, using that as the partitioning key will achieve the same
behavior.
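As a rough illustration of that point, the sketch below simulates publishing with `conversation_id` as the only key. The class, the `publish` helper, the `String.hashCode` based routing, and the sample conversation ids are hypothetical and not taken from Pulsar; the sketch only shows that every event of one conversation lands in a single partition in publish order.

```java
import java.util.ArrayList;
import java.util.List;

public class ConversationKeySketch {
    // Hypothetical routing: a stable hash of the key picks the partition.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Each event is {conversationId, sequence}. It is appended to the
    // partition chosen by its conversation id, preserving publish order.
    static List<List<String>> publish(List<String[]> events, int numPartitions) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String[] e : events) {
            partitions.get(partitionFor(e[0], numPartitions)).add(e[0] + "#" + e[1]);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String[]> events = new ArrayList<>();
        events.add(new String[] {"alice:bob", "1"});
        events.add(new String[] {"alice:carol", "1"});
        events.add(new String[] {"alice:bob", "2"});
        List<List<String>> partitions = publish(events, 4);
        for (int i = 0; i < partitions.size(); i++) {
            System.out.println("partition " + i + ": " + partitions.get(i));
        }
    }
}
```

Conversations of the same `from_user_id` may end up in different partitions here, which is fine as long as ordering is only needed per conversation.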
> where the stream is partitioned by from_user_id, so that all the
conversations of a same "from_user" is in one partition
Why would you care about routing per `user_id` if you just care about ordering
per `partition_id`?
Finally, as mentioned above, I think that "ordering-key" is a very misleading
name. It really would be a "sub-key", "delivery-key", "dispatch-key", or some
other name.