Good instinct -- here's what I get:

nifi-app.log:2017-03-09 15:03:00,670 INFO [Distributed Cache Server Communications Thread: ac907dec-49a4-439e-99f5-1558f2358d87] org.wali.MinimalLockingWriteAheadLog org.wali.MinimalLockingWriteAheadLog@40569408 checkpointed with *4262902* Records and 0 Swap Files in 256302 milliseconds (Stop-the-world time = 1378 milliseconds, Clear Edit Logs time = 19 millis), max Transaction ID 4263237
Looks like it's over 4.2 million records now.

On Thu, Mar 9, 2017 at 3:13 PM, Mark Payne <[email protected]> wrote:

> Joe,
>
> That definitely sounds like a bug causing the eviction to not happen. Can
> you grep your logs for the phrase "checkpointed with"? You should have a
> line that tells you how many records were written to the Snapshot. You
> will certainly see a few of these types of messages, though, because you
> have one for the FlowFile Repository, one for Local State Management, and
> another one for the DistributedMapCacheServer. I am curious to see if you
> see the log message indicating 3 million+ records also.
>
> Thanks
> -Mark
>
> > On Mar 8, 2017, at 7:13 PM, Joe Gresock <[email protected]> wrote:
> >
> > Looking through the PersistentMapCache and SimpleMapCache, it seems like
> > lots of these records should have been evicted by now. We're up to 3.1
> > million records on disk in the snapshot file. My understanding is that
> > when wali.checkpoint() is called, it collapses all the DELETE records in
> > the journaled log and removes them before writing the snapshot file. Is
> > that accurate?
> >
> > I feel like something is not going quite right with the eviction
> > process. I am using 1.1.1, though, and I have noticed that the
> > PersistentMapCache has changed in [1], so I might apply that patch and
> > try some more experiments.
> >
> > Would anyone be willing to try to replicate this behavior in NiFi 1.1.1?
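Joe's description of checkpointing above -- collapsing the DELETE records in the journal before the snapshot is written -- can be sketched as follows. This is a simplified model for illustration only; the record types and method names here are hypothetical and do not match the actual org.wali API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of journal replay during a checkpoint: a DELETE for a
// key removes any earlier PUT for that key, so deleted keys never reach
// the snapshot. (Hypothetical record model, not the real org.wali types.)
public class CheckpointSketch {

    enum Op { PUT, DELETE }

    record JournalRecord(Op op, String key, String value) {}

    // Replay the journal in order; only keys still live at the end of the
    // replay appear in the resulting snapshot map.
    static Map<String, String> checkpoint(List<JournalRecord> journal) {
        Map<String, String> snapshot = new LinkedHashMap<>();
        for (JournalRecord r : journal) {
            if (r.op() == Op.PUT) {
                snapshot.put(r.key(), r.value());
            } else {
                snapshot.remove(r.key());
            }
        }
        return snapshot;
    }

    public static void main(String[] args) {
        List<JournalRecord> journal = new ArrayList<>();
        journal.add(new JournalRecord(Op.PUT, "a", "1"));
        journal.add(new JournalRecord(Op.PUT, "b", "2"));
        journal.add(new JournalRecord(Op.DELETE, "a", null));
        // Only "b" survives: the DELETE for "a" collapsed its earlier PUT.
        System.out.println(checkpoint(journal).size()); // 1
    }
}
```

If this model matches the real behavior, a snapshot that keeps growing past the eviction limit would suggest the DELETE records are not being written (or not collapsed) as expected.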
> > You should be able to do it as follows:
> >
> > Services:
> > - DistributedMapCacheServer, maximum cache entries = 100,000, FIFO
> >   eviction, persistence directory specified
> > - DistributedMapCacheClientService, pointed at the same host and port
> >
> > Flow:
> > GenerateFlowFile (randomize 1K binary files in batches of 10, schedule
> > 10 threads) -> HashContent (md5) into hash.value -> DetectDuplicate with
> > identifier = ${hash.value}, description = ., no age off, select your
> > cache client, cache identifier = true
> >
> > This should cause the snapshot file to exceed 100,000 keys pretty
> > quickly, and as far as I can tell, it never goes back down. This in
> > itself is not a problem, but when the cache gets really big, it tends
> > to crash our cluster when NiFi reloads it into memory.
> >
> > [1] https://issues.apache.org/jira/browse/NIFI-3214
> >
> > On Wed, Mar 8, 2017 at 11:06 AM, Joe Gresock <[email protected]> wrote:
> >
> >> Thanks Bryan, I'll start looking through the PersistentMapCache. This
> >> morning I checked back and the snapshot file now has 2.9 million keys
> >> in it.
> >>
> >> On Tue, Mar 7, 2017 at 4:39 PM, Bryan Bende <[email protected]> wrote:
> >>
> >>> Joe,
> >>>
> >>> I'm not that familiar with the persistence part of the DMCS, although
> >>> I do know that it uses the write-ahead log that is also used by the
> >>> flow file repo.
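The FIFO eviction the flow above is meant to exercise can be sketched with a `LinkedHashMap` in insertion order. This is an illustration of the expected in-memory behavior only, not the actual SimpleMapCache implementation, which maintains its own eviction structures.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of FIFO eviction at a fixed cap: once the cache holds maxEntries
// keys, each new insertion evicts the oldest-inserted key. Illustration
// only -- not the SimpleMapCache code.
public class FifoCacheSketch {

    static <K, V> Map<K, V> fifoCache(int maxEntries) {
        // accessOrder = false keeps insertion (FIFO) order, not LRU.
        return new LinkedHashMap<K, V>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, Boolean> cache = fifoCache(100_000);
        // Emulate 150,000 distinct identifiers being cached, as
        // DetectDuplicate would do with unique hash.value keys.
        for (int i = 0; i < 150_000; i++) {
            cache.put("key-" + i, Boolean.TRUE);
        }
        // The in-memory cache never exceeds the cap; it is the on-disk
        // snapshot that Joe observes growing past it.
        System.out.println(cache.size()); // 100000
    }
}
```

If eviction works as sketched, the in-memory map stays bounded, which is consistent with the suspicion that the unbounded growth is in the persisted snapshot rather than the live cache.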
> >>>
> >>> The code for PersistentMapCache is here:
> >>> https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-services/nifi-distributed-cache-services-bundle/nifi-distributed-cache-server/src/main/java/org/apache/nifi/distributed/cache/server/map/PersistentMapCache.java
> >>>
> >>> It looks like the WAL is check-pointed during puts here:
> >>>
> >>>     final long modCount = modifications.getAndIncrement();
> >>>     if (modCount > 0 && modCount % 100000 == 0) {
> >>>         wali.checkpoint();
> >>>     }
> >>>
> >>> And during deletes here:
> >>>
> >>>     final long modCount = modifications.getAndIncrement();
> >>>     if (modCount > 0 && modCount % 1000 == 0) {
> >>>         wali.checkpoint();
> >>>     }
> >>>
> >>> Not sure if it was intentional that put operations check point every
> >>> 100k and deletes check point every 1k.
> >>>
> >>> Maybe Mark or others could shed some light on why the snapshot is
> >>> reaching 3GB in size.
> >>>
> >>> -Bryan
> >>>
> >>> On Tue, Mar 7, 2017 at 7:07 AM, Joe Gresock <[email protected]> wrote:
> >>>
> >>>> Hi folks,
> >>>>
> >>>> Is there a technical description of how the DistributedMapCacheServer
> >>>> (DMCS) persistence works? I've noticed the following on our cluster:
> >>>>
> >>>> - I have the DMCS configured on port 4557 as FIFO with max 100,000
> >>>>   entries, and have specified a persistence directory
> >>>> - I am using DetectDuplicate with the DMCS, and the individual key
> >>>>   length is 80 bytes, with a Description length of 1 byte. By my
> >>>>   count, this should result in a pure data size of 7.7MB.
> >>>> - I notice that the snapshot file in the persistence directory
> >>>>   appears to continue growing past the 100,000 limit, though this
> >>>>   may be expected depending on the implementation.
> >>>> Since I know that the key will contain "json" in it, I can run the
> >>>> following command to count the number of possible keys in the
> >>>> snapshot file (though I'm not sure if this is a good way of
> >>>> measuring how many keys are actually cached):
> >>>>
> >>>>     grep -oa json snapshot | wc -l
> >>>>
> >>>> - When the snapshot file reaches around 3GB, the DMCS has a hard
> >>>>   time staying up, and frequently becomes unreachable
> >>>>   (netstat -tulpn | grep 4557 shows nothing). At this point, in
> >>>>   order to restore functionality I delete the persistence directory
> >>>>   and let it start over.
> >>>>
> >>>> So my main questions are:
> >>>> - How are the snapshot and partition files structured, and how can I
> >>>>   estimate how many keys are actually cached at a given time?
> >>>> - Is the described behavior indicative of the cache exceeding the
> >>>>   configured max number of keys?
> >>>>
> >>>> Thanks,
> >>>> Joe
> >>>>
> >>>> --
> >>>> I know what it is to be in need, and I know what it is to have
> >>>> plenty. I have learned the secret of being content in any and every
> >>>> situation, whether well fed or hungry, whether living in plenty or
> >>>> in want. I can do all this through him who gives me strength.
> >>>> *-Philippians 4:12-13*
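Bryan's question earlier in the thread about the asymmetric checkpoint cadence (puts checkpoint every 100,000 modifications, deletes every 1,000, against a shared counter) can be illustrated with a quick count. This is a sketch of the quoted modulo logic only, not the PersistentMapCache code itself.

```java
// Sketch of the checkpoint cadence in the quoted PersistentMapCache
// snippet: one shared modification counter, with puts checkpointing every
// 100,000 mods and deletes every 1,000. Counting triggers for a workload
// of 100,000 puts followed by 100,000 deletes shows how much more often
// the delete path checkpoints.
public class CheckpointCadenceSketch {

    public static void main(String[] args) {
        long modCount = 0;
        int checkpoints = 0;
        // Put phase: counter values 0..99,999 -- no multiple of 100,000
        // above zero is reached, so no checkpoint fires.
        for (int i = 0; i < 100_000; i++) {
            long c = modCount++;
            if (c > 0 && c % 100_000 == 0) checkpoints++;   // put rule
        }
        // Delete phase: counter values 100,000..199,999 -- every multiple
        // of 1,000 fires, i.e. 100 checkpoints.
        for (int i = 0; i < 100_000; i++) {
            long c = modCount++;
            if (c > 0 && c % 1_000 == 0) checkpoints++;     // delete rule
        }
        System.out.println(checkpoints); // prints 100
    }
}
```

So under this workload every checkpoint is triggered by the delete rule, which is consistent with Bryan's suspicion that the hundredfold difference between the two thresholds may not have been intentional.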
