sijie commented on a change in pull request #4079: PIP-34 Key_Shared
subscription core implementation.
URL: https://github.com/apache/pulsar/pull/4079#discussion_r277964196
##########
File path:
pulsar-client-api/src/main/java/org/apache/pulsar/client/api/TypedMessageBuilder.java
##########
@@ -103,6 +103,15 @@
*/
TypedMessageBuilder<T> keyBytes(byte[] key);
+ /**
+ * Sets the ordering key of the message for message dispatch in {@link SubscriptionType#Key_Shared} mode.
+ * Partition key Will be used if ordering key not specified
+ *
+ * @param orderingKey the ordering key for the message
+ * @return the message builder instance
+ */
+ TypedMessageBuilder<T> orderingKey(byte[] orderingKey);
Review comment:
> I think that's a bit generic at this point. I don't think it's even
> possible to do CDC on Spanner. In any case I don't see why would that be
> strictly required for this feature.
I only used Spanner as an example here; whether Spanner supports CDC is beside
the point. There are many open source Spanner-like NewSQL databases, e.g. TiDB
and YugaByte, as well as many in-house solutions. I know people are already
working on integrations between TiDB and Pulsar, where the ordering key
shines.
> As always, I think it's better to add things in the API when there is a
> concrete need, rather than speculate possible use cases that might not apply.
Why do you think there is no concrete need when people propose a new PIP?
> In this case, since the application expect messages in order by
> conversation_id, using that as the partitioning key will achieve the same
> identical behavior.
Pulsar is a multi-subscription system. On the same topic, one subscription can
use the Failover type while another uses Key_Shared. You can't force the
application to choose conversation_id as the partition key. As I said, how
applications use these two keys depends on their needs.
> Why would you care about routing per user_id if you just care of ordering
> per partition_id?
Because some subscriptions are required to consume all the events from a
particular user_id.
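To make the two-key scenario concrete, here is a minimal sketch of the idea. It
is not Pulsar's actual dispatcher: the class `KeyRoutingSketch`, the `Msg` type,
and the use of plain Java hash codes are all assumptions made for illustration.
The partition is picked from the partition key (user_id), so a Failover
subscription on that partition sees every event for the user, while a
Key_Shared-style dispatch picks the consumer from the ordering key
(conversation_id) and falls back to the partition key when no ordering key is
set.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class KeyRoutingSketch {

    /** Hypothetical message carrying both keys; orderingKey may be null. */
    static final class Msg {
        final String partitionKey;
        final byte[] orderingKey;

        Msg(String partitionKey, byte[] orderingKey) {
            this.partitionKey = partitionKey;
            this.orderingKey = orderingKey;
        }
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Producer side: the partition depends only on the partition key. */
    static int partitionFor(Msg m, int numPartitions) {
        return Math.floorMod(m.partitionKey.hashCode(), numPartitions);
    }

    /**
     * Key_Shared-style dispatch: the consumer depends on the ordering key,
     * or on the partition key if no ordering key was specified.
     * (Assumption: a plain hash stands in for the real key hashing.)
     */
    static int keySharedConsumerFor(Msg m, int numConsumers) {
        byte[] key = (m.orderingKey != null) ? m.orderingKey : bytes(m.partitionKey);
        return Math.floorMod(Arrays.hashCode(key), numConsumers);
    }

    public static void main(String[] args) {
        Msg a = new Msg("user-1", bytes("conv-A"));
        Msg b = new Msg("user-1", bytes("conv-B"));

        // Same user_id: both messages land in the same partition.
        System.out.println(partitionFor(a, 4) == partitionFor(b, 4));

        // Consumer index is derived per conversation_id, not per user_id.
        System.out.println("conv-A -> consumer " + keySharedConsumerFor(a, 3));
        System.out.println("conv-B -> consumer " + keySharedConsumerFor(b, 3));
    }
}
```

With this split, messages of one conversation always reach the same Key_Shared
consumer, while different conversations of the same user may be spread across
consumers.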
> Finally, as mentioned above I think that "ordering-key" is a very
> misleading name. It really would be a "sub-key", "delivery-key", "dispatch-key"
> or other name.
I agree that "ordering" can mean different things in different contexts:
publish order, log order, consumer order, dispatch order, or key order.
However, I don't think "sub-key", "delivery-key" or "dispatch-key" is a better
name than "ordering key". In some cases the ordering key is a "sub-key", but in
other cases it can be a completely different key. The same applies to
"delivery-key" and "dispatch-key".
IMO "ordering key" is not a bad name. People already have a general idea of
what it means, and they generally understand what a partition key and an
ordering key are. Applications can choose how to use them to fit their use
cases.
However, I don't feel strongly about the name itself. We could have called it
something else if a better name had come up in the PIP discussion email
thread.