Joe,

That definitely sounds like a bug that is preventing eviction from happening. Can you grep your logs for the phrase "checkpointed with"? You should have a line that tells you how many records were written to the snapshot. You will certainly see several of these messages, though, because there is one for the FlowFile Repository, one for Local State Management, and another for the DistributedMapCacheServer. I am curious whether you also see a log message indicating 3 million+ records.
Thanks
-Mark

> On Mar 8, 2017, at 7:13 PM, Joe Gresock <[email protected]> wrote:
>
> Looking through the PersistentMapCache and SimpleMapCache, it seems like
> lots of these records should have been evicted by now. We're up to 3.1
> million records on disk in the snapshot file. My understanding is that
> when wali.checkpoint() is called, it collapses all the DELETE records in
> the journaled log and removes them before writing the snapshot file. Is
> that accurate?
>
> I feel like something is not going quite right with the eviction process.
> I am using 1.1.1, though, and I have noticed that the PersistentMapCache
> has changed in [1], so I might apply that patch and try some more
> experiments.
>
> Would anyone be willing to try to replicate this behavior in NiFi 1.1.1?
> You should be able to do it as follows:
>
> Services:
> DistributedMapCacheServer, maximum cache entries = 100,000, FIFO eviction,
> persistence directory specified
> DistributedMapCacheClientService, point to the same host and port
>
> Flow:
> GenerateFlowFile (randomize 1K binary files in batches of 10, schedule 10
> threads) -> HashContent (md5) into hash.value -> DetectDuplicate with
> identifier = ${hash.value}, description = ., no age off, select your cache
> client, cache identifier = true
>
> This should cause the snapshot file to exceed 100,000 keys pretty quickly,
> and as far as I can tell, it never goes back down. This in itself is not a
> problem, but when the cache gets really big, it tends to crash our cluster
> when NiFi reloads it into memory.
>
> [1] https://issues.apache.org/jira/browse/NIFI-3214
>
> On Wed, Mar 8, 2017 at 11:06 AM, Joe Gresock <[email protected]> wrote:
>
>> Thanks Bryan, I'll start looking through the PersistentMapCache. This
>> morning I checked back and the snapshot file now has 2.9 million keys in it.
>>
>> On Tue, Mar 7, 2017 at 4:39 PM, Bryan Bende <[email protected]> wrote:
>>
>>> Joe,
>>>
>>> I'm not that familiar with the persistence part of the DMCS, although
>>> I do know that it uses the write-ahead log that is also used by the
>>> flow file repo.
>>>
>>> The code for PersistentMapCache is here:
>>> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-distributed-cache-services-bundle/nifi-distributed-cache-server/src/main/java/org/apache/nifi/distributed/cache/server/map/PersistentMapCache.java
>>>
>>> It looks like the WAL is checkpointed during puts here:
>>>
>>> final long modCount = modifications.getAndIncrement();
>>> if ( modCount > 0 && modCount % 100000 == 0 ) {
>>>     wali.checkpoint();
>>> }
>>>
>>> And during deletes here:
>>>
>>> final long modCount = modifications.getAndIncrement();
>>> if (modCount > 0 && modCount % 1000 == 0) {
>>>     wali.checkpoint();
>>> }
>>>
>>> Not sure if it was intentional that put operations checkpoint every
>>> 100k and deletes checkpoint every 1k.
>>>
>>> Maybe Mark or others could shed some light on why the snapshot is
>>> reaching 3GB in size.
>>>
>>> -Bryan
>>>
>>>
>>> On Tue, Mar 7, 2017 at 7:07 AM, Joe Gresock <[email protected]> wrote:
>>>> Hi folks,
>>>>
>>>> Is there a technical description of how the DistributedMapCacheServer
>>>> (DMCS) persistence works? I've noticed the following on our cluster:
>>>>
>>>> - I have the DMCS configured on port 4557 as FIFO with max 100,000 entries,
>>>> and have specified a persistence directory
>>>> - I am using DetectDuplicate with the DMCS, and the individual key length
>>>> is 80 bytes, with a Description length of 1 byte. By my count, this should
>>>> result in a pure data size of 7.7MB.
>>>> - I notice that the snapshot file in the persistence directory appears to
>>>> continue growing past the 100,000 limit, though this may be expected
>>>> depending on the implementation.
>>>> Since I know that the key will contain "json" in it, I can run the
>>>> following command to count the number of possible keys in the snapshot
>>>> file (though I'm not sure if this is a good way of measuring how many
>>>> keys are actually cached):
>>>> grep -oa json snapshot | wc -l
>>>> - When the snapshot file reaches around 3GB, the DMCS has a hard time
>>>> staying up, and frequently becomes unreachable (netstat -tulpn | grep 4557
>>>> shows nothing). At this point, in order to restore functionality I delete
>>>> the persistence directory and let it start over.
>>>>
>>>> So my main questions are:
>>>> - How are the snapshot and partition files structured, and how can I
>>>> estimate how many keys are actually cached at a given time?
>>>> - Is the described behavior indicative of the cache exceeding the
>>>> configured max number of keys?
>>>>
>>>> Thanks,
>>>> Joe
>>>>
>>>> --
>>>> I know what it is to be in need, and I know what it is to have plenty. I
>>>> have learned the secret of being content in any and every situation,
>>>> whether well fed or hungry, whether living in plenty or in want. I can do
>>>> all this through him who gives me strength. *-Philippians 4:12-13*
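
For anyone following the thread, here is a minimal sketch of the behavior Joe describes, under stated assumptions. It is a toy model, not NiFi code: ToyWal, WalSketch, and snapshotSizeAfter are hypothetical names, and the real PersistentMapCache and write-ahead log may behave differently. It only illustrates the reasoning that a checkpoint can drop a key from the snapshot only if a DELETE for that key was journaled, so a FIFO eviction that never reaches the journal would leave the snapshot growing past the configured maximum.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Toy model of a bounded FIFO cache backed by a write-ahead log (not NiFi code). */
class WalSketch {

    /** Hypothetical stand-in for a WAL: pending journal edits plus the last snapshot. */
    static final class ToyWal {
        private final Map<String, String> snapshot = new HashMap<>();
        private final Deque<String[]> journal = new ArrayDeque<>(); // {op, key, value}

        void update(String key, String value) {
            journal.add(new String[] {"UPDATE", key, value});
        }

        void delete(String key) {
            journal.add(new String[] {"DELETE", key, null});
        }

        /** Replays the journal into the snapshot and returns the snapshot's record count. */
        int checkpoint() {
            for (String[] edit : journal) {
                if ("DELETE".equals(edit[0])) {
                    snapshot.remove(edit[1]);
                } else {
                    snapshot.put(edit[1], edit[2]);
                }
            }
            journal.clear();
            return snapshot.size();
        }
    }

    /**
     * Puts `puts` distinct keys into a FIFO cache capped at `maxEntries` and returns
     * the snapshot size after one checkpoint. When journalEvictions is false, evicted
     * keys leave memory but no DELETE is journaled for them.
     */
    static int snapshotSizeAfter(int maxEntries, int puts, boolean journalEvictions) {
        ToyWal wal = new ToyWal();
        Deque<String> fifo = new ArrayDeque<>();
        for (int i = 0; i < puts; i++) {
            String key = "key-" + i;
            if (fifo.size() >= maxEntries) {
                String evicted = fifo.removeFirst(); // oldest entry goes first (FIFO)
                if (journalEvictions) {
                    wal.delete(evicted);
                }
            }
            fifo.addLast(key);
            wal.update(key, ".");
        }
        return wal.checkpoint();
    }

    public static void main(String[] args) {
        System.out.println(snapshotSizeAfter(100, 1000, true));  // prints 100
        System.out.println(snapshotSizeAfter(100, 1000, false)); // prints 1000
    }
}
```

In this model the in-memory cache never holds more than 100 keys in either run; only the snapshot differs. That matches the symptom reported above (a snapshot far larger than the configured maximum), but whether it is the actual cause in 1.1.1 is not established here.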
