[GitHub] nifi issue #2493: Added Pulsar processors and Controller Service

markap14 Wed, 28 Feb 2018 11:08:48 -0800

Github user markap14 commented on the issue:

    https://github.com/apache/nifi/pull/2493
  
    @david-streamlio the idea behind the record-oriented processors is that 
they make building your flow in NiFi much easier. Without that, you end up 
having to split your data up into tons of FlowFiles and then push each one 
individually to Pulsar. If the data is already 'batched together,' you would 
have to use a SplitText, SplitJson, SplitAvro, or similar processor. But with 
a PublishPulsarRecord processor, you could skip splitting the data entirely. 
It turns out that splitting the data up becomes quite expensive because instead 
of a single FlowFile containing 10,000 records you now have 10,000 FlowFiles, 
each with their own attributes, their own Provenance events, etc. So the 
record-oriented processors allow the flow to be much more efficient, and they 
also allow easy conversion, validation, etc. so the flows are also easier to 
build and maintain. So even without a schema registry integrated into Pulsar, 
the record-oriented approach is very helpful for the users building the flow.
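Here is a rough Java sketch that makes the cost difference concrete. It does not use the NiFi or Pulsar APIs. `SimpleFlowFile`, `Overhead`, `publishSplit`, and `publishRecords` are hypothetical names. The sketch assumes one attribute and one provenance event per FlowFile, which understates real NiFi bookkeeping. Both paths send the same 10,000 messages to Pulsar. Only the number of FlowFiles the framework has to track differs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SplitVsRecordSketch {

    // Hypothetical, simplified stand-in for a NiFi FlowFile: a batch of
    // records plus its own attribute map. This is NOT the NiFi API.
    static final class SimpleFlowFile {
        final List<String> records;
        final Map<String, String> attributes = new HashMap<>();

        SimpleFlowFile(List<String> records) {
            this.records = records;
            attributes.put("record.count", String.valueOf(records.size()));
        }
    }

    // Tallies per-FlowFile bookkeeping versus the messages actually sent.
    static final class Overhead {
        int flowFiles;
        int attributeEntries;
        int provenanceEvents;
        int messagesSent;

        void track(SimpleFlowFile ff) {
            flowFiles++;
            attributeEntries += ff.attributes.size();
            provenanceEvents++;               // assumption: one event per FlowFile
            messagesSent += ff.records.size(); // one Pulsar message per record
        }
    }

    // Split approach: one FlowFile per record, each published on its own.
    static Overhead publishSplit(List<String> records) {
        Overhead o = new Overhead();
        for (String r : records) {
            o.track(new SimpleFlowFile(List.of(r)));
        }
        return o;
    }

    // Record-oriented approach: the batch stays in one FlowFile and the
    // processor iterates over the records inside it.
    static Overhead publishRecords(List<String> records) {
        Overhead o = new Overhead();
        o.track(new SimpleFlowFile(records));
        return o;
    }

    public static void main(String[] args) {
        List<String> batch = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            batch.add("{\"id\":" + i + "}");
        }
        Overhead split = publishSplit(batch);
        Overhead record = publishRecords(batch);
        System.out.println("split:  " + split.flowFiles + " FlowFiles, "
                + split.provenanceEvents + " provenance events, "
                + split.messagesSent + " messages");
        System.out.println("record: " + record.flowFiles + " FlowFiles, "
                + record.provenanceEvents + " provenance events, "
                + record.messagesSent + " messages");
    }
}
```

Under these assumptions, the split path tracks 10,000 FlowFiles and 10,000 provenance events. The record path tracks one of each. Both paths send 10,000 messages.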
