GitHub user markap14 commented on the issue:
https://github.com/apache/nifi/pull/2493
Hey @david-streamlio this is very cool! I've been thinking about writing
processors for interacting with Pulsar myself but haven't had a chance yet.
Just a few things that we should think through a bit:
- Re: Connection Pool in a Controller Service vs. doing it in the processor: what makes sense here depends, I think, on how you expect it to be used. If you expect users to create several Pulsar processors with the same connection info, then a Controller Service makes sense. If you think the more common case will be a single instance of the Processor, then configuring it in the Processor is probably easier for the user. I think both have their merits, though, so I'm fine with either approach personally.
- One concern that I have is that with the Kafka processors, we end up having to create a new copy of the processors with pretty much each release of Kafka so that we can take advantage of the new features. Have you considered how you see this evolving as more versions of Pulsar are released? There are two approaches that we often see with NiFi. One is to create a new processor for each new version, as we did with Kafka. The other is to have a "Client Service" Controller Service, which would expose methods like publish(FlowFile), consume(), or something along those lines. Then there is only a single ConsumePulsar processor and a single PublishPulsar processor, and each is simply configured with the Controller Service that handles interacting with Pulsar directly. Either approach is okay, I think, but we should at least think about naming - does it make sense to name these ConsumePulsar_1_20 or ConsumePulsar_1_0 or something of that nature? I think it's best to figure this part out before the initial release, because it can become confusing if we end up with processors like ConsumePulsar and ConsumePulsar_1_35, for instance.
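To make the "Client Service" option concrete, here is a rough sketch of the shape it could take. All names here are illustrative assumptions, not APIs from this PR or from NiFi itself: the idea is just that version-specific client code sits behind a stable interface, so the ConsumePulsar/PublishPulsar processors never need to change. An in-memory stand-in plays the role of a real Pulsar-backed implementation.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical "Client Service" interface. In a real NiFi bundle this
// would extend ControllerService; processors would only depend on it.
interface PulsarClientService {
    // Publish a message payload to the given topic.
    void publish(String topic, byte[] payload);

    // Consume the next pending message from the topic, or null if none.
    byte[] consume(String topic);
}

// Minimal in-memory stand-in, purely to illustrate the delegation:
// a version-specific implementation (e.g. one per Pulsar client release)
// would hold the actual Pulsar producers/consumers instead of a Map.
class InMemoryPulsarClientService implements PulsarClientService {
    private final Map<String, Queue<byte[]>> topics = new HashMap<>();

    @Override
    public void publish(String topic, byte[] payload) {
        topics.computeIfAbsent(topic, t -> new ArrayDeque<>()).add(payload);
    }

    @Override
    public byte[] consume(String topic) {
        Queue<byte[]> q = topics.get(topic);
        return (q == null) ? null : q.poll();
    }
}
```

With this split, supporting a new Pulsar release would mean shipping a new service implementation rather than a new pair of processors, which sidesteps the ConsumePulsar_x_y naming question entirely.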
Thoughts?
---