James,

Sorry, I let this slip through the cracks. Just took a look at the template
that you attached.

What you're doing here should certainly work okay with NiFi. However, the
pattern that you are using is to take a large XML file, split it into small
chunks, process each chunk individually, and then merge the small chunks back
into a large chunk. This tends to be quite expensive and may not yield the
best performance.

NiFi would actually perform *far* better if you do your processing without
splitting that data apart and then re-joining it. To that end, in NiFi 1.2.0
and 1.3.0 we have introduced several new processors for handling
record-oriented data [1] [2]. Unfortunately, we do not yet have a Record
Reader for XML data. However, there is a ScriptedRecordReader that could be
used for scripting one out, and I've seen one floating around somewhere.
Perhaps someone else on the list is able to provide something there?

This approach, though, should make the flow much simpler to use and maintain.
You'd be able to unzip the data, then use QueryRecord to filter out the
records you don't want, and then write the results in JSON. So your flow
would probably be as simple as:

GetFile -> UnpackContent -> QueryRecord -> PutFile
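
To illustrate the record-oriented idea outside of NiFi, here is a minimal
sketch in plain Python. It is not NiFi code and does not use any NiFi API. It
only shows reading an XML document once as a stream of records, filtering
them, and writing JSON, with no split and merge steps. The sample data, the
"book", "title" and "price" names, and both function names are hypothetical
and chosen purely for the example.

```python
import io
import json
import xml.etree.ElementTree as ET


def filter_records(xml_stream, record_tag, keep):
    """Yield each <record_tag> element as a dict if keep(record) is true."""
    # iterparse reads the stream incrementally instead of loading it whole.
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag != record_tag:
            continue
        # Treat the element's direct children as the fields of one record.
        record = {child.tag: (child.text or "").strip() for child in elem}
        if keep(record):
            yield record
        # Drop the element's children once the record has been handled.
        elem.clear()


def xml_to_json(xml_stream, out_stream, record_tag, keep):
    """Write the kept records to out_stream as one JSON array, in one pass."""
    out_stream.write("[")
    count = 0
    for record in filter_records(xml_stream, record_tag, keep):
        if count:
            out_stream.write(",")
        out_stream.write(json.dumps(record))
        count += 1
    out_stream.write("]")
    return count


# Hypothetical sample input; a real test file would be far larger.
SAMPLE = b"""<catalog>
  <book><title>A</title><price>5.95</price></book>
  <book><title>B</title><price>44.95</price></book>
  <book><title>C</title><price>36.95</price></book>
</catalog>"""

if __name__ == "__main__":
    out = io.StringIO()
    kept = xml_to_json(io.BytesIO(SAMPLE), out, "book",
                       lambda r: float(r["price"]) > 10)
    print(kept, out.getvalue())
```

The filter here plays roughly the role that the SQL in QueryRecord would play
in the flow above: records that do not match are simply never written.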

Thanks
-Mark

[1] https://blogs.apache.org/nifi/entry/record-oriented-data-with-nifi
[2] https://blogs.apache.org/nifi/entry/real-time-sql-on-event




On Jun 28, 2017, at 10:12 AM, Stephen Rathnam <climbinggiant...@gmail.com> wrote:

Hi everyone,

I am also starting to do some performance testing of a similar nature. Did
anyone find a good load profile/xml flow to use for this?

Regards,
Stephen

-----------------------------------------------------------------
From: James Farrington <nifini...@gmail.com>
Date: Mon, Jun 19, 2017 at 11:26 AM
Subject: Example Load Profile
To: dev@nifi.apache.org


Hello All,

I am trying to get a performance benchmark for what size load my server can
handle. Currently we are running 3 nodes each with the following
configuration: 4 CPUs, 30.5GB RAM, and 80GB SSD.

Additionally, we are currently using an example XML file from online
(attached).

First off, would anyone have a good idea of a sample load profile that
would give a good baseline of how much my setup can handle? Secondly, would
anyone suggest a different XML file to use for a benchmarking test (this
one was just found online)?

In my search thus far I have not found any documentation on an example load
profile for a given configuration, so it would perhaps be helpful to others
in the future to have this formally documented, along with a good sample
XML file for getting some benchmark data.

Any advice would be great!

Thanks,
James
