The scenario Joe G describes is almost exactly what we are doing. We bring
in large files and unpack them into many smaller ones. In the most recent
occurrence of this problem, I saw that we had many small files queued up
when the trouble was happening. We will try your suggestion and see whether
the situation improves.

Thanks,
Alan

On Sat, Dec 10, 2016 at 6:57 AM, Joe Gresock <jgres...@gmail.com> wrote:

> Not sure if your scenario is related, but one of the NiFi devs recently
> explained to me that the files in the content repository are actually
> appended together with other flow file content (please correct me if I'm
> explaining it wrong).  That means if you have many small flow files in your
> current backlog, and several large flow files have recently left the flow,
> the large ones could still be hanging around in the content repository as
> long as the small ones are still there, if they're in the same appended
> files on disk.
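
To check my understanding of that claim-sharing behavior, here is a toy model (a deliberate simplification for intuition only, not NiFi's actual FileSystemRepository logic):

```python
# Toy model of a NiFi "content claim": several flow files' content is
# appended into one shared file on disk, and that file cannot be
# reclaimed until every flow file referencing it has left the flow.
class Claim:
    def __init__(self):
        self.refs = 0   # flow files still pointing at this claim
        self.size = 0   # bytes appended into the claim file

    def append(self, nbytes):
        self.refs += 1
        self.size += nbytes

    def release(self):
        self.refs -= 1

    @property
    def reclaimable(self):
        # Deletable/archivable only once no flow file references it.
        return self.refs == 0

claim = Claim()
claim.append(1_000_000_000)     # one large flow file
for _ in range(1000):
    claim.append(100)           # many tiny flow files share the claim
claim.release()                 # the large file leaves the flow...
print(claim.reclaimable)        # ...but its bytes stay on disk: False
```

In this sketch, releasing the single large file frees nothing until every small flow file sharing its claim is also gone, which would match the disk growth described above.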
>
> This scenario recently happened to us: we had a flow with ~20 million tiny
> flow files queued up, and at the same time we were also processing a bunch
> of 1GB files, which left the flow quickly.  The content repository was much
> larger than what was actually being reported in the flow stats, and our
> disks were almost full.  On a hunch, I tried the following strategy:
> - MergeContent the tiny flow files using flow-file-v3 format (to capture
> all attributes)
> - MergeContent 10,000 of the packaged flow files using tar format for
> easier storage on disk
> - PutFile into a directory
> - GetFile from the same directory, but using back pressure from here on out
> (so that the flow simply wouldn't pull the same files from disk until it
> was really ready for them)
> - UnpackContent (untar them)
> - UnpackContent (turn them back into flow files with the original
> attributes)
> - Then do the processing they were originally designed for
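
The pack/unpack round trip in those steps can be sketched outside NiFi. The stand-in below uses a plain tar with JSON attribute sidecars; flowfile-v3 is NiFi's own packaging format, so treat this purely as an analogy for how attributes and content survive the merge/unpack cycle:

```python
import io
import json
import tarfile

def pack(flowfiles):
    """Bundle (attributes, content) pairs into one in-memory tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for i, (attrs, content) in enumerate(flowfiles):
            for name, data in ((f"{i}.attrs.json", json.dumps(attrs).encode()),
                               (f"{i}.content", content)):
                info = tarfile.TarInfo(name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def unpack(blob):
    """Reverse of pack(): recover the original (attributes, content) pairs."""
    out = {}
    with tarfile.open(fileobj=io.BytesIO(blob)) as tar:
        for member in tar.getmembers():
            idx, kind = member.name.split(".", 1)
            entry = out.setdefault(int(idx), [None, None])
            data = tar.extractfile(member).read()
            if kind == "attrs.json":
                entry[0] = json.loads(data)
            else:
                entry[1] = data
    return [tuple(v) for _, v in sorted(out.items())]

original = [({"filename": "a.txt"}, b"hello"), ({"filename": "b.txt"}, b"world")]
assert unpack(pack(original)) == original  # attributes and content survive the round trip
```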
>
> This had the effect of very quickly reducing the size of my content
> repository to very nearly the actual size I saw reported in the flow, and
> my disk usage dropped from ~95% to 50%, which is the configured content
> repository max usage percentage.  I haven't had any problems since.
>
> Hope this helps.
> Joe
>
> On Sat, Dec 10, 2016 at 12:04 AM, Joe Witt <joe.w...@gmail.com> wrote:
>
> > Alan,
> >
> > That retention percentage only has to do with the archive of data
> > which kicks in once a given chunk of content is no longer reachable by
> > active flowfiles in the flow.  For it to grow to 100% typically would
> > mean that you have data backlogged in the flow that accounts for that
> > much space.  If that is certainly not the case for you then we need to
> > dig deeper.  If you could do screenshots or share log files and stack
> > dumps around this time those would all be helpful.  If the screenshots
> > and such are too sensitive please just share as much as you can.
> >
> > Thanks
> > Joe
> >
> > On Fri, Dec 9, 2016 at 9:55 PM, Alan Jackoway <al...@cloudera.com> wrote:
> > > One other note on this, when it came back up there were tons of
> > > messages like this:
> > >
> > > 2016-12-09 18:36:36,244 INFO [main] o.a.n.c.repository.FileSystemRepository Found unknown file /path/to/content_repository/498/1481329796415-87538 (1071114 bytes) in File System Repository; archiving file
> > >
> > > I haven't dug into what that means.
> > > Alan
> > >
> > > On Fri, Dec 9, 2016 at 9:53 PM, Alan Jackoway <al...@cloudera.com> wrote:
> > >
> > >> Hello,
> > >>
> > >> We have a node on which the nifi content repository keeps growing to
> > >> use 100% of the disk. It's a relatively high-volume process. It chewed
> > >> through more than 100GB in the three hours between when we first saw it
> > >> hit 100% of the disk and when we just cleaned it up again.
> > >>
> > >> We are running nifi 1.1 for this. Our nifi.properties looked like this:
> > >>
> > >> nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
> > >> nifi.content.claim.max.appendable.size=10 MB
> > >> nifi.content.claim.max.flow.files=100
> > >> nifi.content.repository.directory.default=./content_repository
> > >> nifi.content.repository.archive.max.retention.period=12 hours
> > >> nifi.content.repository.archive.max.usage.percentage=50%
> > >> nifi.content.repository.archive.enabled=true
> > >> nifi.content.repository.always.sync=false
> > >>
> > >> I just bumped retention period down to 2 hours, but should max usage
> > >> percentage protect us from using 100% of the disk?
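
While waiting to see whether the max usage percentage holds, something like the following can compare what is actually on disk with what the flow reports (the variable name and default path here are my own placeholders, not NiFi tooling):

```shell
# REPO_DIR is a placeholder; point it at nifi.content.repository.directory.default
# (./content_repository in the nifi.properties above). Defaults to "." so it runs anywhere.
REPO_DIR="${REPO_DIR:-.}"

# Kilobytes actually on disk in the content repository, archived claims included.
du -sk "$REPO_DIR" | awk '{printf "%s KB in content repository\n", $1}'

# How full the filesystem holding the repository is (compare against the 50% cap).
df -Pk "$REPO_DIR" | awk 'NR==2 {print "filesystem " $5 " used"}'
```

If the du figure is far larger than the queued-data size shown in the UI, that points back at claims pinned by small backlogged flow files.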
> > >>
> > >> Unfortunately we didn't get jstacks on either failure. If it hits 100%
> > >> again I will make sure to get that.
> > >>
> > >> Thanks,
> > >> Alan
> > >>
> >
>
>
>
> --
> I know what it is to be in need, and I know what it is to have plenty.  I
> have learned the secret of being content in any and every situation,
> whether well fed or hungry, whether living in plenty or in want.  I can do
> all this through him who gives me strength.    *-Philippians 4:12-13*
>
