if they are physically separate the difference should be quite noticeable.

On Thu, Sep 13, 2018, 7:36 PM Phil H <gippyp...@gmail.com> wrote:
> Potentially. We're looking to see how the multiple disks help before
> committing to spending money on new hardware :)
>
> On Fri, 14 Sep 2018 at 10:48, Joe Witt <joe.w...@gmail.com> wrote:
>
> > phil,
> >
> > as you add dirs it will just start using them. if you want to no longer
> > use the current dir it might be more involved.
> >
> > does that help?
> >
> > thanks
> >
> > On Thu, Sep 13, 2018, 4:36 PM Phil H <gippyp...@gmail.com> wrote:
> >
> > > Follow up question - how do I transition to this new structure? Should I
> > > shut down NiFi and move the contents of the legacy single directories into
> > > one of the new ones? For example:
> > >
> > > mv /usr/nifi/content_repository /nifi/repos/content-1
> > >
> > > TIA
> > > Phil
> > >
> > >
> > > On Wed, 12 Sep 2018 at 06:15, Mark Payne <marka...@hotmail.com> wrote:
> > >
> > > > Phil,
> > > >
> > > > For the content repository, you can configure the directory by changing
> > > > the value of the "nifi.content.repository.directory.default" property in
> > > > nifi.properties. The suffix here, "default" is the name of this "container".
> > > > You can have multiple containers by adding extra properties.
> > > > So, for example, you could set:
> > > >
> > > > nifi.content.repository.directory.content1=/nifi/repos/content-1
> > > > nifi.content.repository.directory.content2=/nifi/repos/content-2
> > > > nifi.content.repository.directory.content3=/nifi/repos/content-3
> > > > nifi.content.repository.directory.content4=/nifi/repos/content-4
> > > >
> > > > Similarly, the Provenance Repo property is named
> > > > "nifi.provenance.repository.directory.default"
> > > > and can have any number of "containers":
> > > >
> > > > nifi.provenance.repository.directory.prov1=/nifi/repos/prov-1
> > > > nifi.provenance.repository.directory.prov2=/nifi/repos/prov-2
> > > > nifi.provenance.repository.directory.prov3=/nifi/repos/prov-3
> > > > nifi.provenance.repository.directory.prov4=/nifi/repos/prov-4
> > > >
> > > > When NiFi writes to these, it does a Round Robin so that if you're writing
> > > > to 4 FlowFiles' content simultaneously with different threads, you're able
> > > > to get the full throughput of each disk. (So if you have 4 disks for your
> > > > content repo, each capable of writing 100 MB/sec, then your effective write
> > > > rate to the content repo is 400 MB/sec). Similar with Provenance Repository.
> > > >
> > > > Doing this also will allow you to hold a larger 'archive' of content and
> > > > provenance data, because it will span the archive across all of the listed
> > > > directories, as well.
> > > >
> > > > Thanks
> > > > -Mark
> > > >
> > > >
> > > >
> > > > > On Sep 11, 2018, at 3:35 PM, Phil H <gippyp...@gmail.com> wrote:
> > > > >
> > > > > Thanks Mark, this is great advice.
> > > > >
> > > > > Disk access is certainly an issue with the current set up. I will certainly
> > > > > shoot for NVMe disks in the build. How does NiFi get configured to span
> > > > > its repositories across multiple physical disks?
> > > > >
> > > > > Thanks,
> > > > > Phil
> > > > >
> > > > > On Wed, 12 Sep 2018 at 01:32, Mark Payne <marka...@hotmail.com> wrote:
> > > > >
> > > > >> Phil,
> > > > >>
> > > > >> As Sivaprasanna mentioned, your bottleneck will certainly depend on your
> > > > >> flow. There's nothing inherent about NiFi or the JVM, AFAIK, that would
> > > > >> limit you. I've seen NiFi run on VM's containing 4-8 cores, and I've seen
> > > > >> it run on bare metal on servers containing 96+ cores. Most often, I see
> > > > >> people with a lot of CPU cores but insufficient disk, so if you're running
> > > > >> several cores ensure that you're using SSD's / NVMe's or enough spinning
> > > > >> disks to accommodate the flow. NiFi does a good job of spanning the content
> > > > >> and FlowFile repositories across multiple disks to take full advantage of
> > > > >> the hardware, and scales the CPU vertically by way of multiple Processors
> > > > >> and multiple concurrent tasks (threads) on a given Processor.
> > > > >>
> > > > >> It really comes down to what you're doing in your flow, though. If you've
> > > > >> got 96 cores and you're trying to perform 5 dozen transformations against
> > > > >> a large number of FlowFiles but have only a single spinning disk, then
> > > > >> those 96 cores will likely go to waste, because your disk will bottleneck
> > > > >> you.
> > > > >>
> > > > >> Likewise, if you have 10 SSD's and only 8 cores you're likely going to
> > > > >> waste a lot of disk because you won't have the CPU needed to reach the
> > > > >> disks' full potential.
> > > > >> So you'll need to strike the correct balance for your use case. Since you
> > > > >> have the flow running right now, I would recommend looking at things like
> > > > >> `top` and `iostat` in order to understand if you're reaching your limit on
> > > > >> CPU, disk, etc.
> > > > >>
> > > > >> As far as RAM is concerned, NiFi typically only needs 4-8 GB of RAM for
> > > > >> the heap. However, more RAM means that your operating system can make
> > > > >> better use of disk caching, which can certainly speed things up,
> > > > >> especially if you're reading the content several times for each FlowFile.
> > > > >>
> > > > >> Does this help at all?
> > > > >>
> > > > >> Thanks
> > > > >> -Mark
> > > > >>
> > > > >>
> > > > >>> On Sep 10, 2018, at 6:05 AM, Phil H <gippyp...@gmail.com> wrote:
> > > > >>>
> > > > >>> Thanks for that. Sorry I should have been more specific - we have a flow
> > > > >>> running already on non-dedicated hardware. Looking to identify any
> > > > >>> limitations in NiFi/JVM that would limit how much parallelism it can take
> > > > >>> advantage of
> > > > >>>
> > > > >>> On Mon, 10 Sep 2018 at 14:32, Sivaprasanna <sivaprasanna...@gmail.com>
> > > > >>> wrote:
> > > > >>>
> > > > >>>> Phil,
> > > > >>>>
> > > > >>>> The hardware requirements are driven by the nature of the dataflow you
> > > > >>>> are developing. If you're looking to play around with NiFi and gain some
> > > > >>>> hands-on experience, go for 4 cores and 8GB RAM, i.e. any modern
> > > > >>>> laptop/computer would do the job. In my case, where I'm having 100s of
> > > > >>>> dataflows, I have it clustered with 3 nodes. Each having 16GB RAM and
> > > > >>>> 4(8) cores.
> > > > >>>> I went with SSDs of smaller size because my flows are involved in
> > > > >>>> writing to object stores like Google Cloud Storage, Azure Blob and
> > > > >>>> Amazon S3 and NoSQL DBs. Hope this helps.
> > > > >>>>
> > > > >>>> -
> > > > >>>> Sivaprasanna
> > > > >>>>
> > > > >>>> On Mon, Sep 10, 2018 at 4:09 AM Phil H <gippyp...@gmail.com> wrote:
> > > > >>>>
> > > > >>>>> Hi all,
> > > > >>>>>
> > > > >>>>> I've been asked to spec some hardware for a NiFi installation. Does
> > > > >>>>> anyone have any advice? My gut feel is lots of processor cores and RAM,
> > > > >>>>> with less emphasis on storage (small fast disks). Are there any
> > > > >>>>> limitations on how many cores the JRE/NiFi can actually make use of, or
> > > > >>>>> any other considerations like that I should be aware of?
> > > > >>>>>
> > > > >>>>> Most likely will be pairs of servers in a cluster, but again any advice
> > > > >>>>> to the contrary would be appreciated.
> > > > >>>>>
> > > > >>>>> Cheers,
> > > > >>>>> Phil
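
The round-robin arithmetic Mark describes in the quoted thread can be illustrated with a toy Python model. This is only a sketch and not NiFi's actual code: the function names are made up for illustration, the container names and paths are the example values from his message, and the 100 MB/sec per-disk figure is his hypothetical number. It also assumes the best case he describes, where every disk is kept busy by a separate thread.

```python
from itertools import cycle

# Example container names and paths taken from the nifi.properties
# entries quoted in the thread (illustrative values, not defaults).
CONTAINERS = {
    "content1": "/nifi/repos/content-1",
    "content2": "/nifi/repos/content-2",
    "content3": "/nifi/repos/content-3",
    "content4": "/nifi/repos/content-4",
}

# Hypothetical per-disk write rate used in the thread's example.
PER_DISK_MB_PER_SEC = 100


def assign_round_robin(num_writes, containers):
    """Assign each write to the next container in turn (toy model)."""
    names = cycle(containers)  # iterates container names in insertion order
    return [next(names) for _ in range(num_writes)]


def aggregate_write_rate(containers, per_disk_mb_per_sec):
    """Best-case combined rate, assuming every disk is written concurrently."""
    return len(containers) * per_disk_mb_per_sec


if __name__ == "__main__":
    # Eight writes are spread evenly: each container receives two.
    for i, name in enumerate(assign_round_robin(8, CONTAINERS)):
        print(f"write {i} -> {name} ({CONTAINERS[name]})")

    # 4 disks x 100 MB/sec each gives the 400 MB/sec from the thread.
    print(aggregate_write_rate(CONTAINERS, PER_DISK_MB_PER_SEC), "MB/sec")
```

The 400 MB/sec figure is an upper bound under those assumptions. If several of the directories sit on the same physical disk, or only one thread is writing, the real rate will be lower, which is why checking `iostat` on the running flow is still worthwhile.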