Re: Use of posix_fadvise

Benedict Elliott Smith Tue, 18 Oct 2016 10:07:41 -0700

This is what JIRA is for.  It seems to date back to CASSANDRA-1470, where
the default became immediately evicting newly compacted files.


This results in cold reads for *hot* data after compaction, so
CASSANDRA-6916 permitted evicting the *old* data instead, while
guaranteeing >= the same amount of eviction.
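A minimal sketch of the two policies, using a hypothetical `PageCacheAdvisor` callback in place of the native `posix_fadvise(..., POSIX_FADV_DONTNEED)` call (none of these names are Cassandra's). Evicting the obsolete inputs instead of the fresh output covers at least as many bytes whenever the output is no larger than the inputs combined. That size relationship is an assumption of the sketch, not something stated in this thread.

```java
import java.util.ArrayList;
import java.util.List;

public class CompactionEviction {
    /** Hypothetical stand-in for posix_fadvise(fd, 0, len, POSIX_FADV_DONTNEED) on one file. */
    interface PageCacheAdvisor {
        void dontNeed(String path, long length);
    }

    /** Minimal description of an sstable data file: path and size in bytes. */
    static final class SSTableFile {
        final String path;
        final long length;

        SSTableFile(String path, long length) {
            this.path = path;
            this.length = length;
        }
    }

    /** CASSANDRA-1470-style (sketch): evict the newly written output, so hot rows are read cold. */
    static long evictOutput(SSTableFile output, PageCacheAdvisor advisor) {
        advisor.dontNeed(output.path, output.length);
        return output.length;
    }

    /** CASSANDRA-6916-style (sketch): keep the output cached, evict the obsolete inputs instead. */
    static long evictInputs(List<SSTableFile> inputs, PageCacheAdvisor advisor) {
        long evicted = 0;
        for (SSTableFile in : inputs) {
            advisor.dontNeed(in.path, in.length);
            evicted += in.length;
        }
        return evicted;
    }

    public static void main(String[] args) {
        List<SSTableFile> inputs = List.of(new SSTableFile("a-Data.db", 600), new SSTableFile("b-Data.db", 500));
        SSTableFile output = new SSTableFile("c-Data.db", 900); // assumed smaller than the inputs combined
        List<String> advised = new ArrayList<>();
        PageCacheAdvisor recorder = (path, len) -> advised.add(path);

        System.out.println(evictOutput(output, recorder)); // 900
        System.out.println(evictInputs(inputs, recorder)); // 1100
        System.out.println(advised);                       // [c-Data.db, a-Data.db, b-Data.db]
    }
}
```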

I cannot attest to whether the original issue of cold compaction data was a
pain point, but I was assured (by whom, I do not recall) that it was.
In its present form it is at least not harmful.  It was (and is) not a
no-op:

http://riptano.github.io/cassandra_performance/graph_v3/graph.html?stats=stats.6916v3-preempive-open-compact.mixed.2.json&metric=op_rate&operation=mixed&smoothing=1&show_aggregates=true&xmin=0&xmax=545.6&ymin=0&ymax=114638.7

On 18 October 2016 at 17:42, Michael Kjellman <mkjell...@internalcircle.com>
wrote:

> Yeah, it has been there for years. That said, most of the community is
> only now catching up to 2.1 and 3.0, where the usage did appear to change
> from 2.0, and I'm mostly trying to figure out what the intent was behind the
> various usages all over the codebase and make sure the code actually does
> that. Maybe even add some comments about that intent. :)
>
> In 2.1 I saw that we were doing this to get the file descriptor in some
> cases (which obviously returns the wrong file descriptor, since the file is
> closed in the finally block before the caller can use it, so it most likely
> made this even more of a potential no-op than it already was?):
>
> public static int getfd(String path)
> {
>     RandomAccessFile file = null;
>     try
>     {
>         file = new RandomAccessFile(path, "r");
>         return getfd(file.getFD());
>     }
>     catch (Throwable t)
>     {
>         JVMStabilityInspector.inspectThrowable(t);
>         // ignore
>         return -1;
>     }
>     finally
>     {
>         try
>         {
>             if (file != null)
>                 file.close();
>         }
>         catch (Throwable t)
>         {
>             // ignore
>         }
>     }
> }
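A small self-contained sketch of why the descriptor returned by `getfd(String)` above cannot be used: it relies only on the JDK's `FileDescriptor.valid()`, which reports whether the descriptor is still open. Extracting the raw int (as Cassandra's native helper does) needs reflection and is left out, and the method names below are illustrative only. Once closed, the int number may also be handed out again for the next file opened, so a later `posix_fadvise` on it could hit an unrelated file or simply fail.

```java
import java.io.File;
import java.io.FileDescriptor;
import java.io.IOException;
import java.io.RandomAccessFile;

public class StaleFdDemo {
    /** Same shape as the 2.1 getfd(String): open, take the descriptor, close, return it. */
    static FileDescriptor descriptorAfterClose(String path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            return file.getFD();
        } // the file, and with it the descriptor, is closed here, before the caller gets it
    }

    /** Safer shape: use the descriptor while the file that owns it is still open. */
    static boolean descriptorValidWhileOpen(String path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            FileDescriptor fd = file.getFD();
            // a native posix_fadvise(fd, ...) call would belong here, inside the try block
            return fd.valid();
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("sstable", "-Data.db");
        tmp.deleteOnExit();
        System.out.println("valid after close: " + descriptorAfterClose(tmp.getPath()).valid()); // false
        System.out.println("valid while open:  " + descriptorValidWhileOpen(tmp.getPath()));     // true
    }
}
```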
>
>
> On Oct 18, 2016, at 9:34 AM, Jake Luciani <jak...@gmail.com> wrote:
>
> Although, given we now have an in-process page cache[1], this may not be
> needed anymore?
> This is only for the data file, though.  I think it's been years(?) since we
> showed it helped, so perhaps someone should show whether this is still
> working/helping in the real world.
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-5863
>
>
> On Tue, Oct 18, 2016 at 11:59 AM, Michael Kjellman
> <mkjell...@internalcircle.com> wrote:
>
> Specifically regarding the behavior in different kernels, from `man
> posix_fadvise`: "In kernels before 2.6.6, if len was specified as 0, then
> this was interpreted literally as "zero bytes", rather than as meaning "all
> bytes through to the end of the file"."
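One way to stay independent of how a given kernel interprets `len == 0` is to pass the file's real length explicitly. A minimal sketch, assuming a hypothetical `NativeFadvise` binding (the JDK has no `posix_fadvise`; Cassandra goes through its own native wrapper) and assuming the x86-64 Linux value 4 for `POSIX_FADV_DONTNEED`, which differs on some architectures. Only `FileChannel.size()` and the `Files` helpers are real JDK API here.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ExplicitLengthAdvice {
    /** Assumed value of POSIX_FADV_DONTNEED on x86-64 Linux. */
    static final int POSIX_FADV_DONTNEED = 4;

    /** Hypothetical native binding standing in for posix_fadvise(fd, offset, len, advice). */
    interface NativeFadvise {
        int fadvise(Path file, long offset, long len, int advice);
    }

    /**
     * Advise DONTNEED for [offset, end of file) with an explicit length, so the outcome does not
     * depend on whether the kernel reads len == 0 as "zero bytes" or "through end of file".
     */
    static int dropFromOffset(Path file, long offset, NativeFadvise nativeCall) throws IOException {
        long size;
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            size = ch.size();
        }
        long len = Math.max(0, size - offset);
        if (len == 0) {
            return 0; // nothing to drop; never send len == 0
        }
        return nativeCall.fadvise(file, offset, len, POSIX_FADV_DONTNEED);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sstable", "-Data.db");
        Files.write(tmp, new byte[8192]);
        NativeFadvise printer = (f, off, len, advice) -> {
            System.out.println("fadvise(" + f.getFileName() + ", " + off + ", " + len + ", " + advice + ")");
            return 0;
        };
        dropFromOffset(tmp, 0, printer); // passes len = 8192 rather than 0
        Files.delete(tmp);
    }
}
```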
>
> On Oct 18, 2016, at 8:57 AM, Michael Kjellman
> <mkjell...@internalcircle.com> wrote:
>
> Right, so in SSTableReader#GlobalTidy$tidy it does:
> // don't ideally want to dropPageCache for the file until all instances
> // have been released
> CLibrary.trySkipCache(desc.filenameFor(Component.DATA), 0, 0);
> CLibrary.trySkipCache(desc.filenameFor(Component.PRIMARY_INDEX), 0, 0);
>
> It seems to me that every time the reference is released on a new sstable
> we would immediately tidy() it and then call posix_fadvise with
> POSIX_FADV_DONTNEED, an offset of 0 and a length of 0 (which I take to mean
> "through the end of the file", given the API behavior in modern Linux
> kernels?). Am I reading this correctly? It's sort of hard to tell, as there
> are many different code paths through which tidy() could be called on the
> reference.
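To make the question concrete, a simplified model of reference-counted cleanup: the tidy action (here, a skip-cache call per component file) runs once, when the last reference is released, not on every release. `RefCountedTidy` and `SkipCache` are hypothetical names; this is not Cassandra's actual `Ref`/`GlobalTidy` implementation, only an illustration of the "until all instances have been released" comment quoted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class RefCountedTidy {
    /** Hypothetical stand-in for CLibrary.trySkipCache(path, 0, 0). */
    interface SkipCache {
        void trySkipCache(String path);
    }

    private final AtomicInteger refs = new AtomicInteger(1); // the creator holds the first reference
    private final List<String> components;
    private final SkipCache skipCache;

    RefCountedTidy(List<String> components, SkipCache skipCache) {
        this.components = components;
        this.skipCache = skipCache;
    }

    /** Take another reference; fails once the count has reached zero and tidy() has run. */
    boolean tryRef() {
        while (true) {
            int n = refs.get();
            if (n == 0) return false;
            if (refs.compareAndSet(n, n + 1)) return true;
        }
    }

    /** Release one reference; only the release that brings the count to zero runs tidy(). */
    void release() {
        int remaining = refs.decrementAndGet();
        if (remaining == 0) tidy();
        else if (remaining < 0) throw new IllegalStateException("released more times than referenced");
    }

    private void tidy() {
        for (String path : components) skipCache.trySkipCache(path);
    }

    public static void main(String[] args) {
        List<String> dropped = new ArrayList<>();
        RefCountedTidy sstable = new RefCountedTidy(List.of("example-Data.db", "example-Index.db"), dropped::add);
        sstable.tryRef();            // a reader takes a reference: count is 2
        sstable.release();           // count 1: nothing dropped yet
        System.out.println(dropped); // []
        sstable.release();           // count 0: tidy() runs once
        System.out.println(dropped); // [example-Data.db, example-Index.db]
    }
}
```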
>
> Why would we want to drop the segment we just wrote from the page cache?
> Wouldn't that most likely be the hottest data? And even if it turned out
> not to be, wouldn't it be better in this case to let the kernel do what
> it's best at?
>
> best,
> kjellman
>
> On Oct 18, 2016, at 8:50 AM, Jake Luciani <jak...@gmail.com> wrote:
>
> The main point is to avoid keeping things in the page cache that are no
> longer needed, like compacted data that has been early-opened elsewhere.
>
> On Oct 18, 2016 11:29 AM, "Michael Kjellman" <mkjell...@internalcircle.com>
> wrote:
>
> We use posix_fadvise in a bunch of places, and in stereotypical Cassandra
> fashion no comments were provided.
>
> There is a check that the OS is Linux (okay, a start), but it turns out the
> behavior of passing a length of 0 to posix_fadvise changed in some 2.6
> kernels. We don't check the kernel version, or even note it.
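If the code were to check the kernel version and not just the OS name, a minimal sketch could parse the `os.version` system property (on Linux the JVM reports the kernel release there, e.g. `5.15.0-91-generic`) and compare it against 2.6.6, the cutoff given in the `man posix_fadvise` excerpt quoted in this thread. The class and method names are illustrative; an unparseable version conservatively returns false so a caller could fall back to an explicit length.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KernelVersionCheck {
    private static final Pattern RELEASE = Pattern.compile("^(\\d+)\\.(\\d+)(?:\\.(\\d+))?");

    /** True if the kernel release is 2.6.6 or newer, where len == 0 means "through end of file". */
    static boolean zeroLengthMeansToEof(String release) {
        // Keep only the leading dotted-numeric part: "5.15.0-91-generic" -> 5, 15, 0
        Matcher m = RELEASE.matcher(release);
        if (!m.find()) return false;
        int major = Integer.parseInt(m.group(1));
        int minor = Integer.parseInt(m.group(2));
        int patch = m.group(3) == null ? 0 : Integer.parseInt(m.group(3));
        if (major != 2) return major > 2;
        if (minor != 6) return minor > 6;
        return patch >= 6;
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name");
        String release = System.getProperty("os.version");
        boolean linux = os != null && os.startsWith("Linux");
        System.out.println(os + " " + release + ": len == 0 means EOF? "
                + (linux && release != null && zeroLengthMeansToEof(release)));
    }
}
```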
>
> What is the *expected* outcome of our use of posix_fadvise? Not what it does
> or doesn't do today, but what problem was it added to solve, and what is the
> expected behavior regardless of kernel version?
>
> best,
> kjellman
>
>
> --
> http://twitter.com/tjake
>
>
