Does the client buffer/reserve any messages? - Pulsar stores the messages that haven't been acknowledged <https://pulsar.apache.org/docs/en/cookbooks-retention-expiry/#retention-policies>. And you acknowledged the message by calling consumer.acknowledge(msg);
On Fri, Aug 13, 2021 at 6:58 PM Luke Cwik <[email protected]> wrote: > Does the client buffer/reserve any messages? > For example, you call consumer.receive() so it goes and fetches the next > 100 messages assuming that you'll want them as well so the next 99 > consumer.receive() calls will be fast. > > On Fri, Aug 13, 2021 at 4:16 PM Marco Robles Pulido < > [email protected]> wrote: > >> >> >>> I am taking in consideration to use this approach for the PulsarIO - SDF >>> implementation, in which I pull N messages or use the default size for >>> the queue message in apache pulsar >>> <https://pulsar.apache.org/docs/en/concepts-messaging/#consumers> (which >>> is 1000) and keep adding to "N" as many messages are still being received >>> in the queue with help of the client listener >>> <https://pulsar.apache.org/docs/en/concepts-messaging/#listeners>, so >>> every time the listener is invoked will do something like N + 1. >>> >> >> Does the client "pull" the messages when you create the client or does it >> "pull" the messages when you perform an explicit API call like >> "getMessages"? >> >> - The client "pull" the messages when you perform *consumer.receive()*, >> you can see here a basic example of a Simple Consumer client >> <https://github.com/streamnative/streamnative-academy/blob/master/src/main/java/sn/academy/consumers/SimpleConsumer.java>, >> so each time consumer.receive() is called, a message is dequeued from the >> buffer. As well there is a way to keep listening messages using a listener >> as is shown in this client example >> <https://pulsar.apache.org/docs/en/client-libraries-java/#consumer> (see >> MessageListener example, above *Configure consumer* section) >> >> On Fri, Aug 13, 2021 at 5:14 PM Luke Cwik <[email protected]> wrote: >> >>> >>> >>> On Thu, Aug 12, 2021 at 3:31 PM Marco Robles Pulido < >>> [email protected]> wrote: >>> >>>> Sorry It was sent without finishing the email. >>>> >>>> Hi, >>>> So regarding to Luke comments for the first approach >>>> 1) Occasionally poll to see how many messages are still in the queue >>>> ahead of you so you can report the remaining work as 1 / >>>> numberOfInitialSplits * numOutstandngMessages >>>> - What do you mean for numOutstandingMessages? The messages that >>>> are still pending in the queue, then using the above formula, it will give >>>> you the remaining work that can be used for the restriction in which the >>>> initial restriction for SDF will be like, >>>> [0, 1 / numberOfInitialSplits * numOutstandngMessages) >>>> >>> >>> Yes. >>> >>> >>>> >>>> I am taking in consideration to use this approach for the PulsarIO - >>>> SDF implementation, in which I pull N messages or use the default size >>>> for the queue message in apache pulsar >>>> <https://pulsar.apache.org/docs/en/concepts-messaging/#consumers> (which >>>> is 1000) and keep adding to "N" as many messages are still being received >>>> in the queue with help of the client listener >>>> <https://pulsar.apache.org/docs/en/concepts-messaging/#listeners>, so >>>> every time the listener is invoked will do something like N + 1. >>>> >>> >>> Does the client "pull" the messages when you create the client or does >>> it "pull" the messages when you perform an explicit API call like >>> "getMessages"? >>> >>> >>>> >>>> Thanks in advance >>>> >>>> On Thu, Aug 12, 2021 at 5:18 PM Marco Robles Pulido < >>>> [email protected]> wrote: >>>> >>>>> Hi, >>>>> >>>>> So regarding to Luke comments for the first approach >>>>> 1) Occasionally poll to see how many messages are still in the queue >>>>> ahead of you so you can report the remaining work as 1 / >>>>> numberOfInitialSplits * numOutstandngMessages >>>>> - What do you mean for numOutstandingMessages? The messages that >>>>> are still pending in the queue, then using the above formula, it will give >>>>> you the remaining work that can be used for the restriction in which the >>>>> initial restriction for SDF will be like, >>>>> >>>>> On Wed, Aug 4, 2021 at 11:02 AM Luke Cwik <[email protected]> wrote: >>>>> >>>>>> Your research into the SDF Kafka implementation seems spot on. >>>>>> >>>>>> I took a quick look at the links you had provided and for partitioned >>>>>> topics it looks like you don't have a choice where a Consumer is able to >>>>>> resume from as you have a typical get message and ack scheme client. In >>>>>> this kind of setup for an initial implementation it is best if you can: >>>>>> 1) Occasionally poll to see how many messages are still in the queue >>>>>> ahead of you so you can report the remaining work as 1 / >>>>>> numberOfInitialSplits * numOutstandngMessages >>>>>> 2) Use a pull model for messages (e.g. request N messages and output >>>>>> them all). This prevents an issue where the client library instances >>>>>> effectively are holding onto unprocessed messages while the bundle isn't >>>>>> being processed. >>>>>> 3) Only support checkpointing in the RestrictionTracker (adding >>>>>> support for dynamic splitting would be great but no runner would exercise >>>>>> it right now in a streaming pipeline) >>>>>> >>>>>> It looks like the above would work for both the multi-partition and >>>>>> single partition scenarios and still could parallelize to the capacity of >>>>>> what the brokers could handle. Note that in the future you could still >>>>>> have >>>>>> a single SDF implementation that handles two types of restrictions one >>>>>> being the Consumer based one and the other being the Reader based one >>>>>> (See >>>>>> Watch.java[1] for a growing and nongrowing restriction for what I mean by >>>>>> having different branching logic). In the future you would update the >>>>>> initial splitting logic to check whether the broker has a single >>>>>> partition >>>>>> and then you could create "Reader" restrictions but this would only be >>>>>> useful if you felt as though there was something to be gained from using >>>>>> it. For the Reader based interface: >>>>>> 4) Do you expect the user to supply the message id for the first >>>>>> message? (if so is there a way to partition the message id space? (e.g. >>>>>> in >>>>>> Kafka the id is a number that increments and you know where you are and >>>>>> can >>>>>> poll for the latest id so you can split the numerical range easily)) >>>>>> 5) What value do you see it providing? >>>>>> >>>>>> 1: >>>>>> https://github.com/apache/beam/blob/03a1cca42ceeec2e963ec14c9bc344956a8683b3/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java#L885 >>>>>> >>>>>> On Tue, Aug 3, 2021 at 1:17 PM Marco Robles Pulido < >>>>>> [email protected]> wrote: >>>>>> >>>>>>> Hi folks, >>>>>>> >>>>>>> I am working with the new PulsarIO connector with Beam, and most of >>>>>>> my work has been in researching how Pulsar works, as many of you know we >>>>>>> already have KafkaIO connector which is kind of similar to Pulsar but >>>>>>> there >>>>>>> is some difference that I have found during my research and I would >>>>>>> like to >>>>>>> know your input in how would you handle the implementation for SDF. Here >>>>>>> are my main concerns: >>>>>>> - As you may know kafka handles by default partitioned topics where >>>>>>> each message within the partition gets an incremental id, called offset. >>>>>>> Having this in mind SDF implementation for kafka works something like >>>>>>> this, >>>>>>> where the element to evaluate is the topic/partition and the >>>>>>> restrictions >>>>>>> are the start and end offsets. >>>>>>> - For Pulsar, partitioned topics are optional >>>>>>> <https://pulsar.apache.org/docs/en/concepts-messaging/#partitioned-topics> >>>>>>> or >>>>>>> well by default are handled by single broker, there is a possibility >>>>>>> where >>>>>>> you can use the partitioned topics, but you will limit the final user to >>>>>>> use only partitioned topics with pulsar, as well, there is a possibility >>>>>>> to manually handle cursors >>>>>>> <https://pulsar.apache.org/docs/en/2.5.1/concepts-clients/#reader-interface> >>>>>>> which will be the earliest and latest message available that may be >>>>>>> used as >>>>>>> restrictions (but implementing this will not allow to use partitioned >>>>>>> topics). So with this in mind I was thinking there should be two >>>>>>> implementations one that use partitioned topics with pulsar and the >>>>>>> other >>>>>>> one that manually handle cursors. >>>>>>> >>>>>>> So, let me know your ideas/input about it. And maybe If i am wrong >>>>>>> help to clarify the SDF restrictions for KafkaIO. >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> -- >>>>>>> >>>>>>> *Marco Robles* *|* WIZELINE >>>>>>> >>>>>>> Software Engineer >>>>>>> >>>>>>> [email protected] >>>>>>> >>>>>>> Amado Nervo 2200, Esfera P6, Col. Ciudad del Sol, 45050 Zapopan, Jal. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> *This email and its contents (including any attachments) are being >>>>>>> sent toyou on the condition of confidentiality and may be protected by >>>>>>> legalprivilege. Access to this email by anyone other than the intended >>>>>>> recipientis unauthorized. If you are not the intended recipient, please >>>>>>> immediatelynotify the sender by replying to this message and delete the >>>>>>> materialimmediately from your system. Any further use, dissemination, >>>>>>> distributionor reproduction of this email is strictly prohibited. >>>>>>> Further, >>>>>>> norepresentation is made with respect to any content contained in this >>>>>>> email.* >>>>>> >>>>>> >>>>> >>>>> -- >>>>> >>>>> *Marco Robles* *|* WIZELINE >>>>> >>>>> Software Engineer >>>>> >>>>> [email protected] >>>>> >>>>> Amado Nervo 2200, Esfera P6, Col. Ciudad del Sol, 45050 Zapopan, Jal. >>>>> >>>> >>>> >>>> -- >>>> >>>> *Marco Robles* *|* WIZELINE >>>> >>>> Software Engineer >>>> >>>> [email protected] >>>> >>>> Amado Nervo 2200, Esfera P6, Col. Ciudad del Sol, 45050 Zapopan, Jal. >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> *This email and its contents (including any attachments) are being sent >>>> toyou on the condition of confidentiality and may be protected by >>>> legalprivilege. Access to this email by anyone other than the intended >>>> recipientis unauthorized. If you are not the intended recipient, please >>>> immediatelynotify the sender by replying to this message and delete the >>>> materialimmediately from your system. Any further use, dissemination, >>>> distributionor reproduction of this email is strictly prohibited. Further, >>>> norepresentation is made with respect to any content contained in this >>>> email.* >>> >>> >> >> -- >> >> *Marco Robles* *|* WIZELINE >> >> Software Engineer >> >> [email protected] >> >> Amado Nervo 2200, Esfera P6, Col. Ciudad del Sol, 45050 Zapopan, Jal. >> >> >> >> >> >> >> >> >> *This email and its contents (including any attachments) are being sent >> toyou on the condition of confidentiality and may be protected by >> legalprivilege. Access to this email by anyone other than the intended >> recipientis unauthorized. If you are not the intended recipient, please >> immediatelynotify the sender by replying to this message and delete the >> materialimmediately from your system. Any further use, dissemination, >> distributionor reproduction of this email is strictly prohibited. Further, >> norepresentation is made with respect to any content contained in this >> email.* > > -- *Marco Robles* *|* WIZELINE Software Engineer [email protected] Amado Nervo 2200, Esfera P6, Col. Ciudad del Sol, 45050 Zapopan, Jal. -- *This email and its contents (including any attachments) are being sent to you on the condition of confidentiality and may be protected by legal privilege. Access to this email by anyone other than the intended recipient is unauthorized. If you are not the intended recipient, please immediately notify the sender by replying to this message and delete the material immediately from your system. Any further use, dissemination, distribution or reproduction of this email is strictly prohibited. Further, no representation is made with respect to any content contained in this email.*
