Slack digest for #dev - 2019-07-27

Apache Pulsar Slack Sat, 27 Jul 2019 02:11:17 -0700
2019-07-26 16:28:59 UTC - alphazero: Hi team. I have a general question 
regarding quality of service of `sink` `connectors`. Specifically in context of 
(remote) host failures, back pressure, retries, etc. Hugely appreciate insights 
and any relevant links.
----
2019-07-26 16:33:39 UTC - David Kjerrumgaard: @alphazero It depends a bit on 
how much data buffering is done inside the sink itself, e.g. do you batch up 
messages to do a bulk insert, etc. However, in general, if the sink fails the 
messages, they will be retained in the source topic, so no data will be lost. 
Once the sink's subscription has built up a sufficient backlog, the sink will 
stop consuming messages until the underlying issue is resolved.
----
2019-07-26 16:36:50 UTC - alphazero: thanks @David Kjerrumgaard. background: 
We're in the initial exploratory phase, so open to `best practices` suggestions. 
Our potential end-points are both connection-less `http`-based ones and standard 
`protocol` ones, e.g. `amqp`. We were planning on using the built-in `sinks` for 
the standard end-points.
----
2019-07-26 16:38:30 UTC - alphazero: The actual remote system sinks are 3rd 
party systems not managed by us at all. We control the `source` (which happens 
to be `RabbitMQ`)
----
2019-07-26 16:39:23 UTC - alphazero: So our setup (pending full migration to 
Pulsar if it shines as we expect it to) is `amqp` -> `pulsar-connectors` 
-> `3rd parties`
----
2019-07-26 16:40:15 UTC - alphazero: At some future date we plan on removing 
the rabbits and using Pulsar only.
----
2019-07-26 16:50:29 UTC - David Kjerrumgaard: @alphazero The standard built-in 
sinks should perform as outlined above. When interacting with 3rd party 
systems, system availability is out of your control, so the best you can do is 
to identify and react to that scenario in a reasonable fashion that ensures you 
don't lose data. Fortunately, you are using the correct framework for that, as 
those capabilities are already built into Pulsar.
----
2019-07-26 16:52:04 UTC - David Kjerrumgaard: @alphazero From a best practices 
perspective, if you are writing your own sink in the future and wish to batch 
up messages before sending them to the external system, then you will want to 
ensure that you only ack the messages AFTER you have successfully published 
them, as indicated by a success response from the downstream system.
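----
A minimal sketch of the ack-after-success batching described above, not the 
implementation of any built-in sink. `SinkRecord` and `DownstreamClient` are 
hypothetical stand-ins: the first mirrors only the `ack()`/`fail()` callbacks a 
Pulsar record offers, the second represents whatever client talks to the 
external system. The batch size is likewise an assumed parameter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a Pulsar record: only the callbacks this sketch needs.
interface SinkRecord {
    String value();
    void ack();   // tell Pulsar the message was delivered
    void fail();  // tell Pulsar the message must be redelivered
}

// Hypothetical client for the external system; send() throws if the bulk write fails.
interface DownstreamClient {
    void send(List<String> payloads) throws Exception;
}

class BatchingSinkSketch {
    private final DownstreamClient client;
    private final int batchSize;
    private final List<SinkRecord> pending = new ArrayList<>();

    BatchingSinkSketch(DownstreamClient client, int batchSize) {
        this.client = client;
        this.batchSize = batchSize;
    }

    // Buffer the record; it is NOT acked here, only once its batch is delivered.
    void write(SinkRecord record) {
        pending.add(record);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Send the buffered batch, then ack every record on success or fail every record on error.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<SinkRecord> batch = new ArrayList<>(pending);
        pending.clear();
        List<String> payloads = new ArrayList<>();
        for (SinkRecord r : batch) {
            payloads.add(r.value());
        }
        try {
            client.send(payloads);
            for (SinkRecord r : batch) {
                r.ack();
            }
        } catch (Exception e) {
            for (SinkRecord r : batch) {
                r.fail();
            }
        }
    }
}
```

A real sink would probably also flush on a timer and on close so that a 
partially filled batch does not sit unacked indefinitely.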
----
2019-07-26 16:54:57 UTC - alphazero: thank you @David Kjerrumgaard. so correct 
to assume that intermittent failures (connection drops) etc. are transparently 
handled by built-in connectors and our main responsibility is flow monitoring 
in case of backlogs?
----
2019-07-26 16:56:40 UTC - David Kjerrumgaard: @alphazero Correct, but if you 
encounter different behavior please file a JIRA, etc.  :smiley:
----
2019-07-26 16:56:52 UTC - alphazero: LOL :slightly_smiling_face: will do.
----
2019-07-26 16:58:17 UTC - alphazero: one last q @David Kjerrumgaard. Is it a 
bad practice to retain `state` in these connectors?
----
2019-07-26 17:07:00 UTC - David Kjerrumgaard: @alphazero First off, you should 
definitely NOT retain state inside the connector itself, i.e. as a local or 
static variable. Connectors are ephemeral for one thing, and there could be 
multiple instances for another. It would be better to use the state 
capabilities provided by the Pulsar Functions State API.
+1 : alphazero
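----
A rough sketch of the difference between instance-local state and an external 
state store. `StateStore` is a hypothetical interface whose method names are 
modelled on the counter calls of the Pulsar Functions State API; whether a sink 
can reach that API depends on the Pulsar version, so treat the wiring as an 
assumption. `InMemoryStateStore` only stands in for the real store so the 
sketch runs on its own.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical view of an external state store (names modelled on the Functions State API).
interface StateStore {
    void incrCounter(String key, long amount);
    long getCounter(String key);
}

// In-memory stand-in so the sketch is runnable; a real store lives outside the connector.
class InMemoryStateStore implements StateStore {
    private final Map<String, Long> counters = new ConcurrentHashMap<>();

    public void incrCounter(String key, long amount) {
        counters.merge(key, amount, Long::sum);
    }

    public long getCounter(String key) {
        return counters.getOrDefault(key, 0L);
    }
}

class CountingSinkSketch {
    // No local or static counter field: the count lives in the store, not in this instance.
    private final StateStore state;

    CountingSinkSketch(StateStore state) {
        this.state = state;
    }

    void write(String message) {
        // Delivery to the external system is omitted from this sketch.
        state.incrCounter("delivered", 1);
    }
}
```

Because the count is kept in the store, two instances writing in parallel, or 
a replacement instance started after a restart, all see the same total.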
----
2019-07-26 17:08:13 UTC - alphazero: Understood. And connection life-cycle. 
It's a bit of a mystery to me how the SDK detects a dropped connection and 
re-instantiates the connector. Should I assume this is transparently handled by 
Pulsar?
----
2019-07-26 17:16:34 UTC - David Kjerrumgaard: @alphazero A dropped connection 
to the external system should be handled in a try/catch block inside the 
connector itself. In the event of an exception, you can react accordingly, 
e.g. attempt to re-establish a connection, etc. However, the most important 
thing to do is to ensure that the message(s) are `failed` in such a scenario. 
This ensures they will be retained and replayed if/when the external system 
comes back online.
+1 : alphazero
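----
The try/catch handling described above could look roughly like this. It is a 
sketch under assumptions: `SinkRecord`, `ExternalConnection` and 
`ConnectionFactory` are hypothetical stand-ins for the Pulsar record callbacks 
and for whatever client library the external system uses.

```java
import java.io.IOException;

// Hypothetical stand-in for a Pulsar record: only the callbacks this sketch needs.
interface SinkRecord {
    String value();
    void ack();
    void fail();
}

// Hypothetical connection to the external system.
interface ExternalConnection {
    void send(String payload) throws IOException;
}

// Hypothetical factory that opens a new connection, or throws if the host is unreachable.
interface ConnectionFactory {
    ExternalConnection connect() throws IOException;
}

class ReconnectingSinkSketch {
    private final ConnectionFactory factory;
    private ExternalConnection connection; // null while disconnected

    ReconnectingSinkSketch(ConnectionFactory factory) {
        this.factory = factory;
    }

    void write(SinkRecord record) {
        try {
            if (connection == null) {
                connection = factory.connect(); // (re-)establish lazily
            }
            connection.send(record.value());
            record.ack(); // only after the external system accepted the message
        } catch (IOException e) {
            connection = null; // discard the broken connection; the next write reconnects
            record.fail();     // the message stays in the topic and is replayed later
        }
    }
}
```

The sink keeps running through the outage: each write fails its record and 
the next write tries to reconnect. A production sink would likely add a 
back-off between attempts.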
----
2019-07-26 17:18:45 UTC - alphazero: Thank you @David Kjerrumgaard for all your 
input. Very helpful. /out
----
2019-07-26 17:19:44 UTC - David Kjerrumgaard: The above scenario will not cause 
the connector to be stopped/restarted, etc. It is performing properly and just 
failing incoming messages and will continue to do so until the problem is 
fixed. At some point backlog quotas and message TTL come into play, so you 
will need to adjust those on the source topic accordingly.
+1 : alphazero
----
2019-07-26 17:21:30 UTC - David Kjerrumgaard: For production environments, I 
also suggest using the long weekend rule, i.e. prepare to handle a scenario 
where the situation persists for the entire duration of a holiday weekend when 
your team is away and no one can address the issue until they return from the 
long weekend. :smiley:
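----
To put a number on the long weekend rule, a back-of-the-envelope sizing helper 
might look like the sketch below. The inputs (message rate, average message 
size, outage length) are hypothetical and need to be replaced with measured 
values; the result only indicates how large a backlog quota would have to be, 
and the TTL check only says whether messages would expire before the outage 
ends.

```java
import java.time.Duration;

class BacklogSizingSketch {
    // Bytes that pile up in the backlog if the sink delivers nothing for the whole outage.
    static long requiredBacklogBytes(long messagesPerSecond, long avgMessageBytes, Duration outage) {
        long bytesPerSecond = Math.multiplyExact(messagesPerSecond, avgMessageBytes);
        return Math.multiplyExact(bytesPerSecond, outage.getSeconds());
    }

    // True if a message published at the start of the outage is still unexpired when it ends.
    static boolean ttlCoversOutage(Duration messageTtl, Duration outage) {
        return messageTtl.compareTo(outage) >= 0;
    }
}
```

For example, with an assumed 500 messages per second at 1024 bytes each, 
`requiredBacklogBytes(500, 1024, Duration.ofDays(4))` gives 176,947,200,000 
bytes, roughly 177 GB, for a four-day weekend.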
----
2019-07-26 17:27:19 UTC - alphazero: yep, thanks. The picture is much clearer 
now. The retention of this backlog is a domain issue that is frankly a can of 
worms on its own.
+1 : David Kjerrumgaard
----