What you describe is a well-supported and strong use case for Apache NiFi.
1) You'll want to look into and experiment with the record-oriented
processors. Specifically, there is a LookupRecord processor that would
probably work well for you. With it you can plug in a lookup service,
and if you like you can script or code one to behave exactly as you
wish with regard to caching, etc.
2) With the record-oriented processors and the record readers/writers,
data is naturally grouped and kept together in microbatches
(one or more records per flowfile). How records are grouped depends on
the behavior of the protocol used to source the data and on what happens
in the flow logic. For instance, a single poll against Kafka returns
numerous messages. We'd keep them together in one flowfile, but framed
as records.
3) You could leverage this with the scripting processors, custom
processors you might write, or the Spring context processors.
However, it is possible you won't need or want to as you learn more about
NiFi. Not that those things aren't good in their own right, but rather
you might find NiFi handles the cases you might have used those for
well enough on its own.
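
To make the caching idea from 1) a bit more concrete, here is a minimal plain-Java sketch of a periodically refreshed in-memory lookup. It is only an illustration under assumptions: it does not use NiFi's lookup service API, and the class name RefreshingLookupCache, its methods, and the loader (a stand-in for whatever query fills the map from your table) are all hypothetical.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical helper, not part of NiFi: keeps an in-memory snapshot of a
// lookup table and replaces it atomically on each refresh.
class RefreshingLookupCache {
    // Assumed to return the full ID -> value mapping, e.g. from a database query.
    private final Supplier<Map<String, String>> loader;
    private final AtomicReference<Map<String, String>> snapshot =
            new AtomicReference<>(Collections.emptyMap());

    RefreshingLookupCache(Supplier<Map<String, String>> loader) {
        this.loader = loader;
        refresh(); // load once so lookups work immediately
    }

    // Copy the loaded data and publish it in one step, so readers never
    // observe a partially built map.
    void refresh() {
        snapshot.set(Collections.unmodifiableMap(new HashMap<>(loader.get())));
    }

    // Refresh on a fixed schedule; if a reload fails, keep the previous snapshot.
    void scheduleRefresh(ScheduledExecutorService executor, long period, TimeUnit unit) {
        executor.scheduleAtFixedRate(() -> {
            try {
                refresh();
            } catch (RuntimeException e) {
                // Swallowed so the schedule keeps running; real code should log this.
            }
        }, period, period, unit);
    }

    // Return the value of the longest cached ID that the message body starts with.
    Optional<String> lookupByPrefix(String messageBody) {
        return snapshot.get().entrySet().stream()
                .filter(e -> messageBody.startsWith(e.getKey()))
                .max(Comparator.comparingInt(e -> e.getKey().length()))
                .map(Map.Entry::getValue);
    }
}
```

The trade-off is the one raised in the original question: lookups never touch the database, but the whole table has to fit in memory, and values can be stale for up to one refresh period.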
On Tue, Mar 6, 2018 at 11:44 PM, Bobby <bobbyhars...@gmail.com> wrote:
> Good day,
> I would like to know the possibilities of NiFi in real-time processing.
> I'm thinking of a simple messaging application:
> 1. A processor will accept messages and do a simple mapping of their
> attributes to a database, i.e. if message_body.startsWith("123") then get data
> from the table with ID = 123 and add the value as a new attribute
> 2. A processor will act as a router
> 3. Two processors will act as dispatchers
> In simple illustration,
> |RECEIVER| -- |ROUTER| -- |DISPATCHER A|
>                        -- |DISPATCHER B|
> Now my questions are:
> 1. What is the best practice / proper way to do database mapping with
> complex criteria? In my experience, the best way to do this in Java is to
> fetch the data from a specific table (let's say Table A) into a collection (Map)
> on a regular schedule (every 5 minutes or so). The reason I'm doing this is to
> reduce the connection overhead when querying. Imagine if for each message we
> had to open a connection, query, and close the connection. I've tested this approach
> and it was much faster, even with connection pooling. The downside is that I need
> memory space, which grows along with my database size. I use MySQL and
> haven't tested an in-memory database; I just want to know the common and fast
> approach to do this. If there is no other way, I might create a standalone
> module using Spring Boot to handle this and communicate with the NiFi
> processor through a REST API
> 2. Is it OK to use one flow file for one message?
> 3. Is it possible and recommended to use dependency injection library like
> Thank you,
> Sent from: http://apache-nifi-developer-list.39713.n7.nabble.com/