Hi,

Yes, it is correct.

Thanks

On Sun, Sep 19, 2021 at 11:34 PM Matteo Merli <[email protected]> wrote:

> Hi,
>
> I just wanted to clarify the behavior of readers on partitioned topics in
> Pulsar.
>
> You have two main ways of consuming messages from Pulsar topics:
>   1. Consumers -> the cursor is managed by the system, based on acks
>      (seek() operations are still allowed).
>   2. Readers -> the reading position is managed by the application
>      (e.g. by storing message ids in a state checkpoint).
>
> Consumers handle partitions automatically, while readers are meant to work
> at the individual-partition level.
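>
> For the consumer side, a minimal sketch (assuming an existing pulsarClient,
> the topic "my-topic", and a subscription name of your choosing) could look
> like this, with the broker advancing the cursor as messages are acked:
>
> Consumer<byte[]> consumer = pulsarClient.newConsumer()
>               .topic("my-topic")
>               .subscriptionName("my-subscription")
>               .subscribe();
>
> Message<byte[]> msg = consumer.receive();
> // ... process the message ...
> consumer.acknowledge(msg);  // cursor position is managed by the broker
>
> // consumer.seek(messageId) is still available if you need to rewind.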
>
> When using readers, it's definitely possible to use them on partitioned
> topics, just by creating one reader per partition. There is an easy way to
> discover the list of partitions:
>
> List<String> partitions =
>     pulsarClient.getPartitionsForTopic("my-topic").join();
> for (String p : partitions) {
>     Reader<byte[]> reader = pulsarClient.newReader()
>             .topic(p)
>             .startMessageId(....)  // e.g. MessageId.earliest, or a stored id
>             .create();
>
>     // ... read from this partition's reader ...
> }
>
> Matteo
>
> On 2021/09/17 20:14:07, Marco Robles <[email protected]> wrote:
> > Hi,
> >
> > I am dealing with some blockers during the PulsarIO SDF implementation,
> > so I am checking back on the comments you mentioned before. What do you
> > mean by the second idea of using a pull model for messages (request N
> > messages and output them all)? Would it be something like: I fetch N
> > messages, process them, and the next iteration or split handles the same
> > amount of N messages, so N is a fixed number (let's say 100) and each
> > split covers (0, 100], (100, 200], ... and so on until it is finished?
> > Or do I have it wrong?
> >
> > Thanks in advance.
> >
> > On Wed, Aug 4, 2021 at 11:02 AM Luke Cwik <[email protected]> wrote:
> >
> > > Your research into the SDF Kafka implementation seems spot on.
> > >
> > > I took a quick look at the links you provided, and for partitioned
> > > topics it looks like you don't have a choice of where a Consumer resumes
> > > from, since you have a typical get-message-and-ack client. In this kind
> > > of setup, for an initial implementation it is best if you can:
> > > 1) Occasionally poll to see how many messages are still in the queue
> > > ahead of you, so you can report the remaining work as
> > > 1 / numberOfInitialSplits * numOutstandingMessages
> > > *2) Use a pull model for messages (e.g. request N messages and output
> > > them all; see the sketch after this list). This prevents an issue where
> > > the client library instances effectively hold onto unprocessed messages
> > > while the bundle isn't being processed.*
> > > 3) Only support checkpointing in the RestrictionTracker (adding support
> > > for dynamic splitting would be great, but no runner would exercise it
> > > right now in a streaming pipeline)
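> > >
> > > A rough, self-contained sketch of the pull model in point 2, using the
> > > Pulsar Java client (the broker URL, topic, subscription name, and N = 100
> > > are placeholders; the actual PulsarIO SDF would run this loop inside
> > > @ProcessElement against its RestrictionTracker rather than in main):
> > >
> > > import java.util.concurrent.TimeUnit;
> > > import org.apache.pulsar.client.api.BatchReceivePolicy;
> > > import org.apache.pulsar.client.api.Consumer;
> > > import org.apache.pulsar.client.api.Message;
> > > import org.apache.pulsar.client.api.Messages;
> > > import org.apache.pulsar.client.api.PulsarClient;
> > >
> > > public class PullModelSketch {
> > >   public static void main(String[] args) throws Exception {
> > >     PulsarClient client = PulsarClient.builder()
> > >         .serviceUrl("pulsar://localhost:6650")
> > >         .build();
> > >
> > >     // Ask the client to hand us batches of at most N messages on demand,
> > >     // instead of having it push and buffer messages behind our back.
> > >     Consumer<byte[]> consumer = client.newConsumer()
> > >         .topic("my-topic")
> > >         .subscriptionName("beam-sdf-sub")
> > >         .batchReceivePolicy(BatchReceivePolicy.builder()
> > >             .maxNumMessages(100)              // "N" from the discussion
> > >             .timeout(1, TimeUnit.SECONDS)
> > >             .build())
> > >         .subscribe();
> > >
> > >     // One "bundle": pull N messages, output (here: print) them all, ack,
> > >     // and stop -- no unprocessed messages are left sitting in the client.
> > >     Messages<byte[]> batch = consumer.batchReceive();
> > >     for (Message<byte[]> msg : batch) {
> > >       System.out.println(msg.getMessageId());
> > >     }
> > >     consumer.acknowledge(batch);
> > >
> > >     consumer.close();
> > >     client.close();
> > >   }
> > > }
> > >
> > > In the SDF version, each message in the batch would be claimed through
> > > the RestrictionTracker before being output, which is what makes the
> > > checkpointing in point 3 work.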
> > >
> > > It looks like the above would work for both the multi-partition and
> > > single-partition scenarios, and could still parallelize up to the
> > > capacity the brokers can handle. Note that in the future you could still
> > > have a single SDF implementation that handles two types of restrictions,
> > > one Consumer based and the other Reader based (see Watch.java [1] for a
> > > growing and a non-growing restriction, which is what I mean by having
> > > different branching logic; a rough sketch of such a restriction follows
> > > below). In the future you would update the initial splitting logic to
> > > check whether the broker has a single partition and then create "Reader"
> > > restrictions, but this would only be useful if you felt there was
> > > something to be gained from it. For the Reader based interface:
> > > 4) Do you expect the user to supply the message id for the first message?
> > > (If so, is there a way to partition the message id space? E.g. in Kafka
> > > the id is a number that increments, so you know where you are, can poll
> > > for the latest id, and can split the numerical range easily.)
> > > 5) What value do you see it providing?
> > >
> > > 1: https://github.com/apache/beam/blob/03a1cca42ceeec2e963ec14c9bc344956a8683b3/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Watch.java#L885
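> > >
> > > As a very rough illustration of the "one SDF, two kinds of restrictions"
> > > idea (all names here are made up, nothing is taken from an existing
> > > PulsarIO), the restriction could carry a mode flag that the initial
> > > splitting and the processing logic branch on, similar to the growing /
> > > non-growing split in Watch.java:
> > >
> > > import java.io.Serializable;
> > > import org.apache.pulsar.client.api.MessageId;
> > >
> > > public class PulsarRestriction implements Serializable {
> > >   /** Which client API should be used to process this restriction. */
> > >   public enum Mode { CONSUMER, READER }
> > >
> > >   private final Mode mode;
> > >   // Only meaningful in READER mode: the position range within a single
> > >   // partition, e.g. MessageId.earliest to MessageId.latest.
> > >   private final MessageId startId;
> > >   private final MessageId endId;
> > >
> > >   public PulsarRestriction(Mode mode, MessageId startId, MessageId endId) {
> > >     this.mode = mode;
> > >     this.startId = startId;
> > >     this.endId = endId;
> > >   }
> > >
> > >   public Mode getMode() { return mode; }
> > >   public MessageId getStartId() { return startId; }
> > >   public MessageId getEndId() { return endId; }
> > > }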
> > >
> > > On Tue, Aug 3, 2021 at 1:17 PM Marco Robles Pulido <[email protected]> wrote:
> > >
> > >> Hi folks,
> > >>
> > >> I am working on the new PulsarIO connector for Beam, and most of my
> > >> work so far has been researching how Pulsar works. As many of you know,
> > >> we already have the KafkaIO connector, which is similar in spirit, but
> > >> there are some differences I found during my research, and I would like
> > >> your input on how you would handle the SDF implementation. Here are my
> > >> main concerns:
> > >> - As you may know, Kafka topics are partitioned by default, and each
> > >> message within a partition gets an incremental id called an offset. With
> > >> this in mind, the SDF implementation for Kafka works roughly like this:
> > >> the element to evaluate is the topic/partition, and the restriction is
> > >> the start and end offsets.
> > >> - For Pulsar, partitioned topics are optional
> > >> <https://pulsar.apache.org/docs/en/concepts-messaging/#partitioned-topics>;
> > >> by default a topic is handled by a single broker. It is possible to rely
> > >> on partitioned topics, but that would limit the final user to using only
> > >> partitioned topics with Pulsar. Alternatively, it is possible to manually
> > >> handle cursors via the reader interface
> > >> <https://pulsar.apache.org/docs/en/2.5.1/concepts-clients/#reader-interface>,
> > >> where the earliest and latest available messages could be used as the
> > >> restriction (sketched below), but implementing this would not allow
> > >> partitioned topics to be used. So with this in mind, I was thinking there
> > >> should be two implementations: one that uses partitioned topics with
> > >> Pulsar, and another that manually handles cursors.
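> > >>
> > >> A small sketch of that reader-interface option, assuming an existing
> > >> PulsarClient named client and a non-partitioned topic "my-topic" (the
> > >> earliest/latest positions would play the role of the restriction, and
> > >> the recorded MessageId the role of the claimed offset):
> > >>
> > >> Reader<byte[]> reader = client.newReader()
> > >>         .topic("my-topic")
> > >>         .startMessageId(MessageId.earliest)  // or a previously stored id
> > >>         .create();
> > >>
> > >> while (reader.hasMessageAvailable()) {  // i.e. up to the latest message
> > >>   Message<byte[]> msg = reader.readNext();
> > >>   // ... process msg and durably record msg.getMessageId() ...
> > >> }
> > >> reader.close();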
> > >>
> > >> So, let me know your ideas/input about it. And if I have it wrong, it
> > >> would help to clarify the SDF restrictions for KafkaIO.
> > >>
> > >> Thanks,
> > >>
> > >
> > >
> >
> >
>


-- 

*Marco Robles* *|* WIZELINE

Software Engineer

[email protected]

Amado Nervo 2200, Esfera P6, Col. Ciudad del Sol, 45050 Zapopan, Jal.
