No, in my scenario a server restart would not affect the content repository size.
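
A toy model of the scenario from the quoted message below may make this clearer. It is only a sketch in Python, not NiFi code. The `Claim` class and its methods are hypothetical names, and the one rule it encodes is the assumption relayed in the thread: content from several flow files is appended into one file on disk, and that file can only go away once no flow file still in the flow refers to it.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """Toy stand-in for one appended file on disk (hypothetical, not a NiFi class)."""
    sizes: dict = field(default_factory=dict)  # flow file id -> content size in bytes
    live: set = field(default_factory=set)     # ids still queued in the flow

    def append(self, flow_file_id, size):
        # Record the content and mark the flow file as still in the flow.
        self.sizes[flow_file_id] = size
        self.live.add(flow_file_id)

    def release(self, flow_file_id):
        # The flow file has left the flow; its bytes are not reclaimed yet.
        self.live.discard(flow_file_id)

    def bytes_in_flow(self):
        # What the flow stats would report: only content of live flow files.
        return sum(self.sizes[i] for i in self.live)

    def bytes_on_disk(self):
        # Assumption: the whole file stays until nothing live references it.
        return sum(self.sizes.values()) if self.live else 0


claim = Claim()
claim.append("tiny", 100)            # a small flow file that stays queued
claim.append("big", 1_000_000_000)   # a 1GB flow file in the same appended file
claim.release("big")                 # the large one leaves the flow quickly

print(claim.bytes_in_flow())   # 100
print(claim.bytes_on_disk())   # 1000000100
claim.release("tiny")
print(claim.bytes_on_disk())   # 0
```

In this model the gap between the two numbers closes only when the last small flow file is released, which is why draining the tiny files matters more than restarting.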

On Sun, Dec 11, 2016 at 8:46 AM, Alan Jackoway wrote:

> If we were in the situation Joe G described, should we expect that when we
> kill and restart nifi it would clean everything up? That behavior has been
> consistent every time - when the disk hits 100%, we kill nifi, delete
> enough old content files to bring it back up, and before it brings the UI up
> it deletes things to get within the archive policy again. That sounds less
> like the files are stuck and more like it failed trying.
>
> For now I just turned off archiving, since we don't really need it for
> this use case.
>
> I attached a jstack from last night's failure, which looks pretty boring
> to me.
>
> On Sun, Dec 11, 2016 at 1:37 AM, Alan Jackoway wrote:
>
>> The scenario Joe G describes is almost exactly what we are doing. We
>> bring in large files and unpack them into many smaller ones. In the most
>> recent iteration of this problem, I saw that we had many small files queued
>> up at the time trouble was happening. We will try your suggestion to see if
>> the situation improves.
>>
>> Thanks,
>> Alan
>>
>> On Sat, Dec 10, 2016 at 6:57 AM, Joe Gresock wrote:
>>
>>> Not sure if your scenario is related, but one of the NiFi devs recently
>>> explained to me that the files in the content repository are actually
>>> appended together with other flow file content (please correct me if I'm
>>> explaining it wrong). That means if you have many small flow files in your
>>> current backlog, and several large flow files have recently left the flow,
>>> the large ones could still be hanging around in the content repository as
>>> long as the small ones are still there, if they're in the same appended
>>> files on disk.
>>>
>>> This scenario recently happened to us: we had a flow with ~20 million tiny
>>> flow files queued up, and at the same time we were also processing a bunch
>>> of 1GB files, which left the flow quickly. The content repository was much
>>> larger than what was actually being reported in the flow stats, and our
>>> disks were almost full. On a hunch, I tried the following strategy:
>>> - MergeContent the tiny flow files using flow-file-v3 format (to capture
>>>   all attributes)
>>> - MergeContent 10,000 of the packaged flow files using tar format for
>>>   easier storage on disk
>>> - PutFile into a directory
>>> - GetFile from the same directory, but using back pressure from here on out
>>>   (so that the flow simply wouldn't pull the same files from disk until it
>>>   was really ready for them)
>>> - UnpackContent (untar them)
>>> - UnpackContent (turn them back into flow files with the original
>>>   attributes)
>>> - Then do the processing they were originally designed for
>>>
>>> This had the effect of very quickly reducing the size of my content
>>> repository to very nearly the actual size I saw reported in the flow, and
>>> my disk usage dropped from ~95% to 50%, which is the configured content
>>> repository max usage percentage. I haven't had any problems since.
>>>
>>> Hope this helps.
>>> Joe
>>>
>>> On Sat, Dec 10, 2016 at 12:04 AM, Joe Witt wrote:
>>>
>>> > Alan,
>>> >
>>> > That retention percentage only has to do with the archive of data
>>> > which kicks in once a given chunk of content is no longer reachable by
>>> > active flowfiles in the flow. For it to grow to 100% typically would
>>> > mean that you have data backlogged in the flow that accounts for that
>>> > much space. If that is certainly not the case for you then we need to
>>> > dig deeper. If you could do screenshots or share log files and stack
>>> > dumps around this time those would all be helpful. If the screenshots
>>> > and such are too sensitive please just share as much as you can.
>>> >
>>> > Thanks
>>> > Joe
>>> >
>>> > On Fri, Dec 9, 2016 at 9:55 PM, Alan Jackoway wrote:
>>> > > One other note on this, when it came back up there were tons of
>>> > > messages like this:
>>> > >
>>> > > 2016-12-09 18:36:36,244 INFO [main] o.a.n.c.repository.FileSystemRepository Found unknown file /path/to/content_repository/498/1481329796415-87538 (1071114 bytes) in File System Repository; archiving file
>>> > >
>>> > > I haven't dug into what that means.
>>> > > Alan
>>> > >
>>> > > On Fri, Dec 9, 2016 at 9:53 PM, Alan Jackoway wrote:
>>> > >
>>> > >> Hello,
>>> > >>
>>> > >> We have a node on which nifi content repository keeps growing to use
>>> > >> 100% of the disk. It's a relatively high-volume process. It chewed
>>> > >> through more than 100GB in the three hours between when we first saw
>>> > >> it hit 100% of the disk and when we just cleaned it up again.
>>> > >>
>>> > >> We are running nifi 1.1 for this. Our nifi.properties looked like this:
>>> > >>
>>> > >> nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
>>> > >> nifi.content.claim.max.appendable.size=10 MB
>>> > >> nifi.content.claim.max.flow.files=100
>>> > >> nifi.content.repository.directory.default=./content_repository
>>> > >> nifi.content.repository.archive.max.retention.period=12 hours
>>> > >> nifi.content.repository.archive.max.usage.percentage=50%
>>> > >> nifi.content.repository.archive.enabled=true
>>> > >> nifi.content.repository.always.sync=false
>>> > >>
>>> > >> I just bumped retention period down to 2 hours, but should max usage
>>> > >> percentage protect us from using 100% of the disk?
>>> > >>
>>> > >> Unfortunately we didn't get jstacks on either failure. If it hits
>>> > >> 100% again I will make sure to get that.
>>> > >>
>>> > >> Thanks,
>>> > >> Alan

-- 
I know what it is to be in need, and I know what it is to have plenty. I
have learned the secret of being content in any and every situation,
whether well fed or hungry, whether living in plenty or in want. I can do
all this through him who gives me strength. *-Philippians 4:12-13*
