James,

Sorry, I let this slip through the cracks. I just took a look at the template that you attached.

What you're doing here should certainly work okay with NiFi. However, the pattern that you are using is to take a large XML file, split it into small chunks, process each chunk individually, and then merge the small chunks back into a large chunk. This tends to be quite expensive and may not yield the best performance. NiFi would actually perform *far* better if you did your processing without splitting the data apart and then re-joining it.

To that end, in NiFi 1.2.0 and 1.3.0 we have introduced several new processors for handling record-oriented data [1] [2]. Unfortunately, we do not yet have a Record Reader for XML data. However, there is a ScriptedRecordReader that could be used for scripting one out, and I've seen one floating around somewhere. Perhaps someone else on the list is able to provide something there?

This approach, though, should make the flow much simpler to use and maintain. You'd be able to unzip the data, then use QueryRecord to filter out the records you don't want, and then write the results in JSON. So your flow would probably be as simple as:

GetFile -> UnpackContent -> QueryRecord -> PutFile

Thanks
-Mark

[1] https://blogs.apache.org/nifi/entry/record-oriented-data-with-nifi
[2] https://blogs.apache.org/nifi/entry/real-time-sql-on-event

On Jun 28, 2017, at 10:12 AM, Stephen Rathnam <climbinggiant...@gmail.com> wrote:

Hi everyone,

I am also starting to do some performance testing of a similar nature. Did anyone find a good load profile/XML flow to use for this?

Regards,
Stephen

-----------------------------------------------------------------
From: James Farrington <nifini...@gmail.com>
Date: Mon, Jun 19, 2017 at 11:26 AM
Subject: Example Load Profile
To: dev@nifi.apache.org

Hello All,

I am trying to get a performance benchmark for what size load my server can handle.
Currently we are running 3 nodes, each with the following configuration: 4 CPUs, 30.5GB RAM, and 80GB SSD. Additionally, we are currently using an example XML file from online (attached).

First off, would anyone have a good idea of a sample load profile that would give a good baseline of how much my setup can handle? Secondly, would anyone suggest a different XML file to use for a benchmarking test (this one was just found online)?

In my search thus far I have not found any documentation on an example load profile for a given configuration, so it would perhaps be helpful to others in the future to have this formally documented, along with a good sample XML file for getting some benchmark data.

Any advice would be great!

Thanks,
James
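
For readers of this thread, the single-pass idea from the reply at the top (filter the records inside one large file instead of splitting it into chunks and merging them back) can be sketched outside NiFi in plain Python. This is only an illustration and not NiFi code or its API: the `record` element name, the `id` and `price` fields, the threshold, and the `filter_records` helper are all hypothetical, and the sketch assumes each record is a flat set of child elements.

```python
import io
import json
import xml.etree.ElementTree as ET


def filter_records(xml_stream, out_stream, record_tag="record", keep=lambda rec: True):
    """Stream records out of one large XML document and write kept ones as JSON lines.

    The input is never split into per-record files and re-merged; each record
    is parsed, tested, written, and released in a single pass.
    """
    kept = 0
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if elem.tag != record_tag:
            continue
        # Flatten the record's child elements into a dict (assumes a flat schema).
        rec = {child.tag: child.text for child in elem}
        if keep(rec):
            out_stream.write(json.dumps(rec) + "\n")
            kept += 1
        elem.clear()  # release the children of the record we just handled
    return kept


# Hypothetical sample input standing in for the large XML file.
SAMPLE = b"""<catalog>
  <record><id>1</id><price>5.95</price></record>
  <record><id>2</id><price>44.95</price></record>
  <record><id>3</id><price>36.95</price></record>
</catalog>"""

out = io.StringIO()
count = filter_records(io.BytesIO(SAMPLE), out, keep=lambda r: float(r["price"]) > 10)
```

In the NiFi flow described in the reply, the filtering step would instead be expressed as a SQL query in QueryRecord, with a Record Reader and a JSON Record Writer doing the parsing and serialization.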