Re: Custom structured streaming source for a system with non-repeatable reads?

Mich Talebzadeh Tue, 01 Feb 2022 09:33:52 -0800

Hi,

It is not very clear to me what you mean by, and I quote, "possible to
implement a structured streaming source for Pub/Sub".


Pub/Sub is a messaging queue more akin to RabbitMQ than to Kafka. What is
this proposal of yours going to achieve? On the face of it, you are trying
to make Pub/Sub behave like SSS (Spark Structured Streaming). If that is
the case, will Pub/Sub still be required to pass the topics?


HTH






*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Tue, 1 Feb 2022 at 15:54, Daniel Collins <dpcoll...@google.com.invalid>
wrote:

> Hello spark developers,
>
> I'm trying to figure out if it would be possible to implement a structured
> streaming source for Google Pub/Sub <https://cloud.google.com/pubsub>,
> however, the continuous reader programming model has an impedance mismatch
> with this system. Google Pub/Sub is not a partitioned system by nature and
> does not have an offset concept, instead using per-message acknowledgement
> IDs (ackIDs) to track subscriber progress. However, I think it should be
> possible to work around this mismatch and implement a structured streaming
> source as follows:
>
> - Create some large fixed number (possibly configurable) of input
> partitions with no semantic meaning (Pub/Sub is not a partitioned system)
> - Create an Offset which encapsulates the set of read ackIDs to propagate
> from the ContinuousInputPartitionReader back to the ContinuousReader
> - In the commit() method, acknowledge all the ids that are in the set with
> that offset
> - Return true from `needsReconfiguration` every minute or so to prevent
> ContinuousInputPartitionReaders from accumulating too many ackIDs in their
> local state
>
> Does this sound like a feasible way to implement a source for this system?
> Can anyone anticipate any issues with this approach? Are there any other
> approaches which might solve this problem better?
>
> Looking forward to your responses!
>
> Thank you,
>
> Daniel
>
> P.S. If you've received this already, apologies for the spam, I can't see
> it in the archives and only recently successfully subscribed to the mailing
> list.
>
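
The approach proposed in the quoted message (an offset that carries a set of
ackIDs rather than a position, acknowledged at commit time) can be sketched
roughly as below. This is only a minimal in-memory model under stated
assumptions: it uses neither Spark nor the real Pub/Sub client, and every
name in it (AckIdOffset, FakeSubscription, PartitionReader, merge_offsets,
commit) is hypothetical and chosen for illustration. Ack deadlines and
redelivery, which a real subscription has, are not modelled.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class AckIdOffset:
    """Hypothetical offset: the set of ackIDs read so far, not a position."""
    ack_ids: FrozenSet[str] = frozenset()

    def merge(self, other: "AckIdOffset") -> "AckIdOffset":
        # Offsets from different partitions combine by set union.
        return AckIdOffset(self.ack_ids | other.ack_ids)


class FakeSubscription:
    """In-memory stand-in for a subscription (NOT the real Pub/Sub client)."""

    def __init__(self, messages: Dict[str, str]) -> None:
        self._pending: Dict[str, str] = dict(messages)  # ackID -> payload
        self._outstanding: Dict[str, str] = {}          # delivered, not acked
        self.acked: Set[str] = set()

    def pull(self, max_messages: int) -> List[Tuple[str, str]]:
        # Deliver up to max_messages; each message is handed out only once,
        # which models the non-repeatable read.
        batch = list(self._pending.items())[:max_messages]
        for ack_id, payload in batch:
            del self._pending[ack_id]
            self._outstanding[ack_id] = payload
        return batch

    def acknowledge(self, ack_ids: Iterable[str]) -> None:
        # Only messages that were actually delivered can be acknowledged.
        for ack_id in ack_ids:
            if self._outstanding.pop(ack_id, None) is not None:
                self.acked.add(ack_id)


class PartitionReader:
    """One of the N input partitions; the partition has no semantic meaning."""

    def __init__(self, subscription: FakeSubscription) -> None:
        self._subscription = subscription
        self._seen: Set[str] = set()  # ackIDs accumulated in local state

    def read(self, max_messages: int) -> List[str]:
        batch = self._subscription.pull(max_messages)
        self._seen.update(ack_id for ack_id, _ in batch)
        return [payload for _, payload in batch]

    def offset(self) -> AckIdOffset:
        return AckIdOffset(frozenset(self._seen))


def merge_offsets(offsets: Iterable[AckIdOffset]) -> AckIdOffset:
    """Driver side: fold the per-partition offsets into one global offset."""
    merged = AckIdOffset()
    for partition_offset in offsets:
        merged = merged.merge(partition_offset)
    return merged


def commit(subscription: FakeSubscription, offset: AckIdOffset) -> None:
    """Acknowledge every ackID carried by the committed offset."""
    subscription.acknowledge(offset.ack_ids)
```

In this model the driver would call merge_offsets over the offsets reported
by each reader and then commit the result. Because each reader keeps every
unacknowledged ackID in memory, the offset grows with the number of messages
read, which is the concern the periodic `needsReconfiguration` step in the
proposal is meant to bound.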
