No, in my scenario a server restart would not affect the content repository
size.

On Sun, Dec 11, 2016 at 8:46 AM, Alan Jackoway <[email protected]> wrote:

> If we were in the situation Joe G described, should we expect that when we
> kill and restart nifi it would clean everything up? That behavior has been
> consistent every time - when the disk hits 100%, we kill nifi, delete
> enough old content files to bring it back up, and before it brings the UI up
> it deletes things to get within the archive policy again. That sounds less
> like the files are stuck and more like it failed trying.
>
> For now I just turned off archiving, since we don't really need it for
> this use case.
>
> I attached a jstack from last night's failure, which looks pretty boring
> to me.
>
> On Sun, Dec 11, 2016 at 1:37 AM, Alan Jackoway <[email protected]> wrote:
>
>> The scenario Joe G describes is almost exactly what we are doing. We
>> bring in large files and unpack them into many smaller ones. In the most
>> recent iteration of this problem, I saw that we had many small files queued
>> up at the time trouble was happening. We will try your suggestion to see if
>> the situation improves.
>>
>> Thanks,
>> Alan
>>
>> On Sat, Dec 10, 2016 at 6:57 AM, Joe Gresock <[email protected]> wrote:
>>
>>> Not sure if your scenario is related, but one of the NiFi devs recently
>>> explained to me that the files in the content repository are actually
>>> appended together with other flow file content (please correct me if I'm
>>> explaining it wrong).  That means if you have many small flow files in your
>>> current backlog, and several large flow files have recently left the flow,
>>> the large ones could still be hanging around in the content repository as
>>> long as the small ones are still there, if they're in the same appended
>>> files on disk.
>>>
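To picture why that happens, here is a minimal sketch (my own illustration, not NiFi's actual implementation) of a content claim that holds many flow files' bytes and can only be reclaimed once the last flow file referencing it is gone:

```python
# Sketch (illustrative only, not NiFi's real code): many flow files share one
# on-disk "claim" file, and the claim's bytes can only be archived/deleted
# when no queued flow file references it any longer.

class ContentClaim:
    def __init__(self, name):
        self.name = name
        self.size = 0          # total bytes appended to this claim file
        self.claimants = 0     # flow files still referencing it

    def append(self, n_bytes):
        self.size += n_bytes
        self.claimants += 1

    def release(self):
        self.claimants -= 1
        # eligible for cleanup only when no flow file needs it anymore
        return self.claimants == 0

claim = ContentClaim("example-claim")
claim.append(1_000_000_000)    # one 1 GB flow file
for _ in range(1000):
    claim.append(100)          # 1000 tiny flow files share the same claim

big_done = claim.release()     # the 1 GB flow file leaves the flow...
# ...but the claim still occupies its full size on disk, because the
# tiny flow files are still queued.
print(big_done, claim.size)    # False 1000100000
```

The point of the sketch: releasing the large flow file does not free any disk space while even one small claimant remains.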
>>> This scenario recently happened to us: we had a flow with ~20 million tiny
>>> flow files queued up, and at the same time we were also processing a bunch
>>> of 1GB files, which left the flow quickly.  The content repository was much
>>> larger than what was actually being reported in the flow stats, and our
>>> disks were almost full.  On a hunch, I tried the following strategy:
>>> - MergeContent the tiny flow files using flow-file-v3 format (to capture
>>> all attributes)
>>> - MergeContent 10,000 of the packaged flow files using tar format for
>>> easier storage on disk
>>> - PutFile into a directory
>>> - GetFile from the same directory, but using back pressure from here on out
>>> (so that the flow simply wouldn't pull the same files from disk until it
>>> was really ready for them)
>>> - UnpackContent (untar them)
>>> - UnpackContent (turn them back into flow files with the original
>>> attributes)
>>> - Then do the processing they were originally designed for
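The merge-to-disk / unpack round trip above can be sketched roughly like this (assumed helper names, not NiFi's API): bundle many small (attributes, content) records into one tar file on disk, then later read the tar back and restore each record with its attributes intact.

```python
# Sketch (assumed names, not NiFi's API) of the merge/unpack pattern:
# pack many small records into one tar on disk, read them back later.

import io
import json
import os
import tarfile
import tempfile

def bundle(records, path):
    """Pack (attributes, content) pairs into a single tar file on disk."""
    with tarfile.open(path, "w") as tar:
        for i, (attrs, content) in enumerate(records):
            blob = json.dumps({"attrs": attrs,
                               "content": content.decode()}).encode()
            info = tarfile.TarInfo(name=f"flowfile-{i}.json")
            info.size = len(blob)
            tar.addfile(info, io.BytesIO(blob))

def unbundle(path):
    """Read the tar back and restore each record with its attributes."""
    out = []
    with tarfile.open(path, "r") as tar:
        for member in tar.getmembers():
            data = json.loads(tar.extractfile(member).read())
            out.append((data["attrs"], data["content"].encode()))
    return out

records = [({"filename": f"f{i}.txt"}, b"payload-%d" % i) for i in range(3)]
path = os.path.join(tempfile.gettempdir(), "flowfile-bundle.tar")
bundle(records, path)
restored = unbundle(path)
assert restored == records   # attributes and content survive the round trip
```

The key design point is the same as in the flow above: once the small payloads are tarred onto plain disk, the content repository no longer holds claims for them, so the shared claim files can be archived.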
>>>
>>> This had the effect of very quickly reducing the size of my content
>>> repository to very nearly the actual size I saw reported in the flow, and
>>> my disk usage dropped from ~95% to 50%, which is the configured content
>>> repository max usage percentage.  I haven't had any problems since.
>>>
>>> Hope this helps.
>>> Joe
>>>
>>> On Sat, Dec 10, 2016 at 12:04 AM, Joe Witt <[email protected]> wrote:
>>>
>>> > Alan,
>>> >
>>> > That retention percentage only has to do with the archive of data
>>> > which kicks in once a given chunk of content is no longer reachable by
>>> > active flowfiles in the flow.  For it to grow to 100% typically would
>>> > mean that you have data backlogged in the flow that accounts for that
>>> > much space.  If that is certainly not the case for you then we need to
>>> > dig deeper.  If you could do screenshots or share log files and stack
>>> > dumps around this time those would all be helpful.  If the screenshots
>>> > and such are too sensitive please just share as much as you can.
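The relationship Joe describes can be put in rough numbers (assumed semantics for illustration, not NiFi's exact cleanup algorithm): the archive cap only governs content that is no longer referenced by active flow files, so a large active backlog can push total disk usage past the cap regardless.

```python
# Sketch (assumed semantics): archive.max.usage.percentage caps only
# *archived* (no-longer-referenced) content.  Bytes still referenced by
# queued flow files cannot be archived, so active backlog alone can push
# the disk past the configured percentage.

def archive_budget(disk_capacity, active_bytes, max_usage_pct=0.50):
    """Bytes of archived content allowed before age-off must kick in."""
    return max(0, int(disk_capacity * max_usage_pct) - active_bytes)

disk = 200 * 1024**3                               # 200 GiB volume
# Modest backlog: the archive may keep data up to the cap minus active bytes.
print(archive_budget(disk, active_bytes=20 * 1024**3))    # 85899345920
# Heavy backlog: active flow files alone exceed 50% of the disk, so the
# archive budget is zero -- yet the active data itself keeps growing.
print(archive_budget(disk, active_bytes=120 * 1024**3))   # 0
```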
>>> >
>>> > Thanks
>>> > Joe
>>> >
>>> > On Fri, Dec 9, 2016 at 9:55 PM, Alan Jackoway <[email protected]>
>>> wrote:
>>> > > One other note on this, when it came back up there were tons of
>>> > > messages like this:
>>> > >
>>> > > 2016-12-09 18:36:36,244 INFO [main] o.a.n.c.repository.FileSystemRepository
>>> > > Found unknown file /path/to/content_repository/498/1481329796415-87538
>>> > > (1071114 bytes) in File System Repository; archiving file
>>> > >
>>> > > I haven't dug into what that means.
>>> > > Alan
>>> > >
>>> > > On Fri, Dec 9, 2016 at 9:53 PM, Alan Jackoway <[email protected]>
>>> > wrote:
>>> > >
>>> > >> Hello,
>>> > >>
>>> > >> We have a node on which nifi content repository keeps growing to use
>>> > >> 100% of the disk. It's a relatively high-volume process. It chewed
>>> > >> through more than 100GB in the three hours between when we first saw
>>> > >> it hit 100% of the disk and when we just cleaned it up again.
>>> > >>
>>> > >> We are running nifi 1.1 for this. Our nifi.properties looked like this:
>>> > >>
>>> > >> nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
>>> > >> nifi.content.claim.max.appendable.size=10 MB
>>> > >> nifi.content.claim.max.flow.files=100
>>> > >> nifi.content.repository.directory.default=./content_repository
>>> > >> nifi.content.repository.archive.max.retention.period=12 hours
>>> > >> nifi.content.repository.archive.max.usage.percentage=50%
>>> > >> nifi.content.repository.archive.enabled=true
>>> > >> nifi.content.repository.always.sync=false
>>> > >>
>>> > >> I just bumped retention period down to 2 hours, but should max usage
>>> > >> percentage protect us from using 100% of the disk?
>>> > >>
>>> > >> Unfortunately we didn't get jstacks on either failure. If it hits 100%
>>> > >> again I will make sure to get that.
>>> > >>
>>> > >> Thanks,
>>> > >> Alan
>>> > >>
>>> >
>>>
>>>
>>>
>>> --
>>> I know what it is to be in need, and I know what it is to have plenty.  I
>>> have learned the secret of being content in any and every situation,
>>> whether well fed or hungry, whether living in plenty or in want.  I can
>>> do
>>> all this through him who gives me strength.    *-Philippians 4:12-13*
>>>
>>
>>
>

