Re: Low hanging fruit crew

2016-10-18 Thread Nate McCall
That's a good idea, Ed, thanks for bringing this up.

Kurt, if you are offering up resources for review and test coverage,
there is work out there. Most immediately in the department of reviews
for "Patch Available."

While not quite low-hanging fruit, it's helpful to have non-committers
look through the patch while following the review checklist [0] and
then running test targets to make sure there are no (covered at least)
regressions.

Non-committers can non-binding +1 a patch and that makes it easier to
just commit as is. Obviously there is some requirements for knowing
what you are looking at and being comfortable with reviewing the
complexity of the patch.

Also, we as committers could do a better job (myself at the top of the
list here) of quickly scanning said +1'ed patches where reviewer has
run the tests, and then committing them.

We'll probably have a bug introduced here and there, but the tradeoff
is bringing more people up to speed and broadening the active
day-to-day pool of resources and eventually adding committers.

Thoughts?

[0] http://cassandra.apache.org/doc/latest/development/how_to_review.html

On Wed, Oct 19, 2016 at 11:58 AM, kurt Greaves  wrote:
> So there are a bunch of us at Instaclustr looking into contributing to the
> project on a more frequent basis. We'd definitely be interested in some
> kind of LHF crew under whom our new "contributors" could make tracks on
> becoming main committers, while having some close by colleagues who can
> help them through. Having said that I'm happy to be part of the so called
> LHF crew as soon as I'm qualified enough to do so. I'm currently full time
> on contributing to the C* project, so hopefuly it shouldn't be too long
> before I'm able to take on some of these duties.
>
> Kurt Greaves
> k...@instaclustr.com
> www.instaclustr.com


Re: Batch read requests to one physical host?

2016-10-18 Thread Eric Stevens
We've had some luck with bulk known key reads with grouping by replica and
doing SELECT... WHERE key IN(...). Not compatible with all data models, but
it works well where we can get away with it.

As a more general purpose construct it makes sense to me. In our driver
layer we have abstracted batches to support read batches (under which the
above method is applied) even though Cassandra doesn't support it first
class.

On Tue, Oct 18, 2016, 5:00 PM Dikang Gu  wrote:

> Hi there,
>
> We have couple use cases that are doing fanout read for their data, means
> one single read request from client contains multiple keys which live on
> different physical hosts. (I know it's not recommended way to access C*).
>
> Right now, on the coordinator, it will issue separate read commands even
> though they will go to the same physical host, which I think is causing a
> lot of overheads.
>
> I'm wondering is it valuable to provide a new read command, that
> coordinator can batch the reads to one datanode, and send to it in one
> message, and datanode will return the results for all keys belong to it?
>
> Any similar ideas before?
>
>
> --
> Dikang
>


Batch read requests to one physical host?

2016-10-18 Thread Dikang Gu
Hi there,

We have couple use cases that are doing fanout read for their data, means
one single read request from client contains multiple keys which live on
different physical hosts. (I know it's not recommended way to access C*).

Right now, on the coordinator, it will issue separate read commands even
though they will go to the same physical host, which I think is causing a
lot of overheads.

I'm wondering is it valuable to provide a new read command, that
coordinator can batch the reads to one datanode, and send to it in one
message, and datanode will return the results for all keys belong to it?

Any similar ideas before?


-- 
Dikang


Re: Low hanging fruit crew

2016-10-18 Thread kurt Greaves
So there are a bunch of us at Instaclustr looking into contributing to the
project on a more frequent basis. We'd definitely be interested in some
kind of LHF crew under whom our new "contributors" could make tracks on
becoming main committers, while having some close by colleagues who can
help them through. Having said that I'm happy to be part of the so called
LHF crew as soon as I'm qualified enough to do so. I'm currently full time
on contributing to the C* project, so hopefuly it shouldn't be too long
before I'm able to take on some of these duties.

Kurt Greaves
k...@instaclustr.com
www.instaclustr.com


Re: Use of posix_fadvise

2016-10-18 Thread Nate McCall
Benedict mentioned it briefly above, but the earliest / most detailed
conversation on this can be found in CASSANDRA-1470.

You may also find some stuff from the dev list and other issues around
the same time from specifically from Chris G. and Peter S. (names on
that ticket) as, IIRC, they were both doing a lot of experimentation
at the time tuning Cassandra IO for their hardware platforms. Both
those guys were knowledgeable/approachable - pretty sure they wouldn't
mind getting pinged directly for something interesting like this via
the twitters or some such.

-N

On Wed, Oct 19, 2016 at 6:28 AM, Michael Kjellman
 wrote:
> Sorry, No. Always document your assumptions. I shouldn't need to git blame a 
> thousand commits and read thru a billion tickets to maybe understand why 
> something was done. Clearly thru the conversations on this topic I've had on 
> IRC and the responses so far on this email thread it's not/still not obvious.
>
> best,
> kjellman
>
> On Oct 18, 2016, at 10:07 AM, Benedict Elliott Smith 
> > wrote:
>
> This is what JIRA is for.
>


Re: Use of posix_fadvise

2016-10-18 Thread Benedict Elliott Smith
I'm not certain this is the best way to go about encouraging people to help
you, or generally encourage participation in the project.  You have seemed
to lash out at the project (and in this case me specifically) in a fairly
antagonistic manner a multitude of times in just a couple of hours.

Your original question, on zero, predates anything I know about.  JIRA is
your best bet, and provides historical context that is never going to live
in comments.  I did not imply that the comments were adequate, only that *this
is where you should probably look to answer your question.  *Comment policy
and norms have changed a lot throughout Cassandra's history, and you're
asking about a time that predates the current level of maturity, but JIRA
has always been (AFAIK) the main source of historical context.  I attempted
to provide some links into this to save you from the "billion" (handful) of
tickets.

I don't have time for another flamewar, so I will leave out trying to
assist you in future.




On 18 October 2016 at 18:28, Michael Kjellman 
wrote:

> Sorry, No. Always document your assumptions. I shouldn't need to git blame
> a thousand commits and read thru a billion tickets to maybe understand why
> something was done. Clearly thru the conversations on this topic I've had
> on IRC and the responses so far on this email thread it's not/still not
> obvious.
>
> best,
> kjellman
>
> On Oct 18, 2016, at 10:07 AM, Benedict Elliott Smith  > wrote:
>
> This is what JIRA is for.
>
>


Re: Use of posix_fadvise

2016-10-18 Thread Michael Kjellman
Sorry, No. Always document your assumptions. I shouldn't need to git blame a 
thousand commits and read thru a billion tickets to maybe understand why 
something was done. Clearly thru the conversations on this topic I've had on 
IRC and the responses so far on this email thread it's not/still not obvious.

best,
kjellman

On Oct 18, 2016, at 10:07 AM, Benedict Elliott Smith 
> wrote:

This is what JIRA is for.



Re: Cleanup after yourselves please

2016-10-18 Thread Josh McKenzie
>
>  tests hastily and messly commented out line by line (*whyy?*)

 Couldn't we use /* */ comments instead of every single line one by one?


When Jake and I were mass porting unit tests for 8099, I know I used idea's
shortcut (ctrl + /) to block comment out things that wouldn't compile while
porting over other tests; multi-line comments break from other multi-line
comments inside/between methods. Unfortunately attribution wasn't retained
on merges so we don't know whether to blame Sylvain, Jake, or myself on the
commented out tests that snuck through in the final patch. =/

Not necessarily a good reason, but at least it is *a* reason.

On Tue, Oct 18, 2016 at 12:04 PM, Michael Kjellman <
mkjell...@internalcircle.com> wrote:

> Gotcha, I didn't know we were actually bringing them back from the dead!
>
> That being said, won't the unit tests need to be re-writtten (or at least
> refactored) after your work? Couldn't we use /* */ comments instead of
> every single line one by one? Given we use source control couldn't we
> remove the dead code and get it from the revision history if we need it in
> the future?
>
> > On Oct 18, 2016, at 8:18 AM, Oleksandr Petrov <
> oleksandr.pet...@gmail.com> wrote:
> >
> > I'm currently working on actually making Super Columns work in CQL
> context.
> > Currently they do not really work[1].
> >
> > It's not a very small piece of work. It was in the pipeline for some
> time,
> > although there most likely were more important things that had to be
> worked
> > on. I understand your disappointment and am sorry you stumbled upon this.
> > But for now you may just disregard the commented tests. My branch is
> going
> > to be ready for review soon.
> >
> > [1] https://issues.apache.org/jira/browse/CASSANDRA-12373
> >
> >
> > On Tue, Oct 18, 2016 at 5:10 PM Michael Kjellman <
> > mkjell...@internalcircle.com> wrote:
> >
> >> There was a bunch of tests hastily and messly commented out line by line
> >> (*whyy?*) ColumnFamilyStoreTest with comments that they are pending
> >> SuperColumns support post 8099.
> >>
> >> Could those responsible please cleanup after themselves? It's been a
> while
> >> since 8099 was committed in the first place and I don't see us adding
> Super
> >> Column support at this point and the unit tests surly will need to be
> >> rewritten anyways.
> >>
> >> As my mother always said, pick your dirty wet towel in the hamper off
> the
> >> floor and put it in the hamper please
> >>
> >> best,
> >> kjellman
> >>
> >> Sent from my iPhone
> >
> > --
> > Alex Petrov
>
>


Re: Use of posix_fadvise

2016-10-18 Thread Benedict Elliott Smith
This is what JIRA is for.  It seems to date back to CASSANDRA-1470, where
the default became immediately evicting newly compacted files.

This results in cold reads for *hot* data after compaction, so
CASSANDRA-6916 permitted evicting the *old* data instead, while
guaranteeing >= the same amount of eviction.

Whether or not the original issue of cold compaction data was a pain point,
I cannot attest, but I was assured (by whom, I do not recall) that it was.
In its present form it is at least not harmful.  It was (and is) not a
no-op:

http://riptano.github.io/cassandra_performance/graph_v3/graph.html?stats=stats.6916v3-preempive-open-compact.mixed.2.json=op_rate=mixed=1_aggregates=true=0=545.6=0=114638.7




On 18 October 2016 at 17:42, Michael Kjellman 
wrote:

> Yeah, it has been there for years -- that being said most of the community
> is just catching up to 2.1 and 3.0 now where the usage did appear to change
> over 2.0-- and I'm more trying to figure out what the intent was in the
> various usages all over the codebase and make sure it's actually doing
> that. Maybe even add some comments about that intent. :)
>
> In 2.1 I saw that we were doing this to get the file descriptor in some
> cases (which obviously will return the wrong file descriptor so most likely
> would have made this even more of a potential no-op than it already was?):
>
> public static int getfd(String path)
> {
> RandomAccessFile file = null;
> try
> {
> file = new RandomAccessFile(path, "r");
> return getfd(file.getFD());
> }
> catch (Throwable t)
> {
> JVMStabilityInspector.inspectThrowable(t);
> // ignore
> return -1;
> }
> finally
> {
> try
> {
> if (file != null)
> file.close();
> }
> catch (Throwable t)
> {
> // ignore
> }
> }
> }
>
>
> On Oct 18, 2016, at 9:34 AM, Jake Luciani  s...@gmail.com>> wrote:
>
> Although given we have an in process page cache[1] now this may not be
> needed anymore?
> This is only for the data file though.  I think its been years? since we
> showed it helped so perhaps someone should show if this is still
> working/helping in the real world.
>
> [1] https://issues.apache.org/jira/browse/CASSANDRA-5863
>
>
> On Tue, Oct 18, 2016 at 11:59 AM, Michael Kjellman <
> mkjell...@internalcircle.com> wrote:
>
> Specifically regarding the behavior in different kernels, from `man
> posix_fadvise`: "In kernels before 2.6.6, if len was specified as 0, then
> this was interpreted literally as "zero bytes", rather than as meaning "all
> bytes through to the end of the file"."
>
> On Oct 18, 2016, at 8:57 AM, Michael Kjellman <
> mkjell...@internalcircle.com mkjell...@internalcircle.com>> wrote:
>
> Right, so in SSTableReader#GlobalTidy$tidy it does:
> // don't ideally want to dropPageCache for the file until all instances
> have been released
> CLibrary.trySkipCache(desc.filenameFor(Component.DATA), 0, 0);
> CLibrary.trySkipCache(desc.filenameFor(Component.PRIMARY_INDEX), 0, 0);
>
> It seems to me every time the reference is released on a new sstable we
> would immediately tidy() it and then call posix_fadvise with
> POSIX_FADV_DONTNEED with an offset of 0 and a length of 0 (which I'm
> thinking is doing so in respect to the API behavior in modern Linux kernel
> builds?). Am I reading things correctly here? Sorta hard as there are many
> different code paths the reference could have tidy() called.
>
> Why would we want to drop the segment we just write from the page cache --
> wouldn't that most likely be the most hot data, and even if it turned out
> not to be wouldn't it be better in this case to have kernel be smart at
> what it's best at?
>
> best,
> kjellman
>
> On Oct 18, 2016, at 8:50 AM, Jake Luciani  s...@gmail.com> s...@gmail.com>> wrote:
>
> The main point is to avoid keeping things in the page cache that are no
> longer needed like compacted data that has been early opened elsewhere.
>
> On Oct 18, 2016 11:29 AM, "Michael Kjellman"  
> >
> wrote:
>
> We use posix_fadvise in a bunch of places, and in stereotypical Cassandra
> fashion no comments were provided.
>
> There is a check the OS is Linux (okay, a start) but it turns out the
> behavior of providing a length of 0 to posix_fadvise changed in some 2.6
> kernels. We don't check the kernel version -- or even note it.
>
> What is the *expected* outcome of our use of posix_fadvise -- not what
> does it do or not do today -- but what problem was it added to solve and
> what's the expected behavior regardless of kernel versions.
>
> best,
> kjellman
>
> Sent from my iPhone
>
>
>
>
>

Re: Use of posix_fadvise

2016-10-18 Thread Michael Kjellman
Yeah, it has been there for years -- that being said most of the community is 
just catching up to 2.1 and 3.0 now where the usage did appear to change over 
2.0-- and I'm more trying to figure out what the intent was in the various 
usages all over the codebase and make sure it's actually doing that. Maybe even 
add some comments about that intent. :)

In 2.1 I saw that we were doing this to get the file descriptor in some cases 
(which obviously will return the wrong file descriptor so most likely would 
have made this even more of a potential no-op than it already was?):

public static int getfd(String path)
{
RandomAccessFile file = null;
try
{
file = new RandomAccessFile(path, "r");
return getfd(file.getFD());
}
catch (Throwable t)
{
JVMStabilityInspector.inspectThrowable(t);
// ignore
return -1;
}
finally
{
try
{
if (file != null)
file.close();
}
catch (Throwable t)
{
// ignore
}
}
}


On Oct 18, 2016, at 9:34 AM, Jake Luciani 
> wrote:

Although given we have an in process page cache[1] now this may not be
needed anymore?
This is only for the data file though.  I think its been years? since we
showed it helped so perhaps someone should show if this is still
working/helping in the real world.

[1] https://issues.apache.org/jira/browse/CASSANDRA-5863


On Tue, Oct 18, 2016 at 11:59 AM, Michael Kjellman <
mkjell...@internalcircle.com> wrote:

Specifically regarding the behavior in different kernels, from `man
posix_fadvise`: "In kernels before 2.6.6, if len was specified as 0, then
this was interpreted literally as "zero bytes", rather than as meaning "all
bytes through to the end of the file"."

On Oct 18, 2016, at 8:57 AM, Michael Kjellman <
mkjell...@internalcircle.com>
 wrote:

Right, so in SSTableReader#GlobalTidy$tidy it does:
// don't ideally want to dropPageCache for the file until all instances
have been released
CLibrary.trySkipCache(desc.filenameFor(Component.DATA), 0, 0);
CLibrary.trySkipCache(desc.filenameFor(Component.PRIMARY_INDEX), 0, 0);

It seems to me every time the reference is released on a new sstable we
would immediately tidy() it and then call posix_fadvise with
POSIX_FADV_DONTNEED with an offset of 0 and a length of 0 (which I'm
thinking is doing so in respect to the API behavior in modern Linux kernel
builds?). Am I reading things correctly here? Sorta hard as there are many
different code paths the reference could have tidy() called.

Why would we want to drop the segment we just write from the page cache --
wouldn't that most likely be the most hot data, and even if it turned out
not to be wouldn't it be better in this case to have kernel be smart at
what it's best at?

best,
kjellman

On Oct 18, 2016, at 8:50 AM, Jake Luciani 
>> wrote:

The main point is to avoid keeping things in the page cache that are no
longer needed like compacted data that has been early opened elsewhere.

On Oct 18, 2016 11:29 AM, "Michael Kjellman" 

>
wrote:

We use posix_fadvise in a bunch of places, and in stereotypical Cassandra
fashion no comments were provided.

There is a check the OS is Linux (okay, a start) but it turns out the
behavior of providing a length of 0 to posix_fadvise changed in some 2.6
kernels. We don't check the kernel version -- or even note it.

What is the *expected* outcome of our use of posix_fadvise -- not what
does it do or not do today -- but what problem was it added to solve and
what's the expected behavior regardless of kernel versions.

best,
kjellman

Sent from my iPhone





--
http://twitter.com/tjake



Re: Use of posix_fadvise

2016-10-18 Thread Jake Luciani
Although given we have an in process page cache[1] now this may not be
needed anymore?
This is only for the data file though.  I think its been years? since we
showed it helped so perhaps someone should show if this is still
working/helping in the real world.

[1] https://issues.apache.org/jira/browse/CASSANDRA-5863


On Tue, Oct 18, 2016 at 11:59 AM, Michael Kjellman <
mkjell...@internalcircle.com> wrote:

> Specifically regarding the behavior in different kernels, from `man
> posix_fadvise`: "In kernels before 2.6.6, if len was specified as 0, then
> this was interpreted literally as "zero bytes", rather than as meaning "all
> bytes through to the end of the file"."
>
> On Oct 18, 2016, at 8:57 AM, Michael Kjellman <
> mkjell...@internalcircle.com> wrote:
>
> Right, so in SSTableReader#GlobalTidy$tidy it does:
> // don't ideally want to dropPageCache for the file until all instances
> have been released
> CLibrary.trySkipCache(desc.filenameFor(Component.DATA), 0, 0);
> CLibrary.trySkipCache(desc.filenameFor(Component.PRIMARY_INDEX), 0, 0);
>
> It seems to me every time the reference is released on a new sstable we
> would immediately tidy() it and then call posix_fadvise with
> POSIX_FADV_DONTNEED with an offset of 0 and a length of 0 (which I'm
> thinking is doing so in respect to the API behavior in modern Linux kernel
> builds?). Am I reading things correctly here? Sorta hard as there are many
> different code paths the reference could have tidy() called.
>
> Why would we want to drop the segment we just write from the page cache --
> wouldn't that most likely be the most hot data, and even if it turned out
> not to be wouldn't it be better in this case to have kernel be smart at
> what it's best at?
>
> best,
> kjellman
>
> On Oct 18, 2016, at 8:50 AM, Jake Luciani  s...@gmail.com>> wrote:
>
> The main point is to avoid keeping things in the page cache that are no
> longer needed like compacted data that has been early opened elsewhere.
>
> On Oct 18, 2016 11:29 AM, "Michael Kjellman"  >
> wrote:
>
> We use posix_fadvise in a bunch of places, and in stereotypical Cassandra
> fashion no comments were provided.
>
> There is a check the OS is Linux (okay, a start) but it turns out the
> behavior of providing a length of 0 to posix_fadvise changed in some 2.6
> kernels. We don't check the kernel version -- or even note it.
>
> What is the *expected* outcome of our use of posix_fadvise -- not what
> does it do or not do today -- but what problem was it added to solve and
> what's the expected behavior regardless of kernel versions.
>
> best,
> kjellman
>
> Sent from my iPhone
>
>
>


-- 
http://twitter.com/tjake


Re: Cleanup after yourselves please

2016-10-18 Thread Michael Kjellman
Cool, as I would have assumed they would need to be. Given they were initially 
commented out on 6/30/15 maybe cleanup and removal of that dead code is still 
at least warranted.

On Oct 18, 2016, at 9:15 AM, Oleksandr Petrov 
> wrote:

Unit tests will be completely rewritten I suspect.



8099 Storage Format Documentation as used with PRIMARY_INDEX

2016-10-18 Thread Michael Kjellman
I'm working on writing Birch for trunk and I noticed the following:

https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/columniterator/AbstractSSTableIterator.java#L503

Prior to 3.0 the offset was the literal offset into the data file, yet now we 
seem to be doing the position encoded with the key (for all rows regardless of 
if they're > 64kb and thus have an index component) plus the serialized offset. 
I also see there is now a a "header" offset.

In RowIndexEntry there is:


/**
 * @return the offset to the start of the header information for this row.
 * For some formats this may not be the start of the row.
 */
public long headerOffset()
{
return 0;
}

/**
 * The length of the row header (partition key, partition deletion and static 
row).
 * This value is only provided for indexed entries and this method will throw
 * {@code UnsupportedOperationException} if {@code !isIndexed()}.
 */
public long headerLength()
{
throw new UnsupportedOperationException();
}


In 2.1 we stored the partition key, deletion, but not static row -- but we 
didn't need or use this so I'm guessing this is actually just to support static 
rows? Is there any further documentation around the header in other classes 
that I just haven't come across yet? Any thoughts on position + offset and why 
this behavior changed? Thanks

best,
kjellman


Re: Cleanup after yourselves please

2016-10-18 Thread Oleksandr Petrov
It makes sense to make them work for backwards compatibility. There was no
announcement that if you migrated to 3.x they wouldn't work, so everyone
would most likely expect a clear upgrade path. it's not bringing them back
from the dead, but rather helping people who want to upgrade to have this
option.

Unit tests will be completely rewritten I suspect.
On Tue, 18 Oct 2016 at 18:04, Michael Kjellman 
wrote:

> Gotcha, I didn't know we were actually bringing them back from the dead!
>
> That being said, won't the unit tests need to be re-writtten (or at least
> refactored) after your work? Couldn't we use /* */ comments instead of
> every single line one by one? Given we use source control couldn't we
> remove the dead code and get it from the revision history if we need it in
> the future?
>
> > On Oct 18, 2016, at 8:18 AM, Oleksandr Petrov <
> oleksandr.pet...@gmail.com> wrote:
> >
> > I'm currently working on actually making Super Columns work in CQL
> context.
> > Currently they do not really work[1].
> >
> > It's not a very small piece of work. It was in the pipeline for some
> time,
> > although there most likely were more important things that had to be
> worked
> > on. I understand your disappointment and am sorry you stumbled upon this.
> > But for now you may just disregard the commented tests. My branch is
> going
> > to be ready for review soon.
> >
> > [1] https://issues.apache.org/jira/browse/CASSANDRA-12373
> >
> >
> > On Tue, Oct 18, 2016 at 5:10 PM Michael Kjellman <
> > mkjell...@internalcircle.com> wrote:
> >
> >> There was a bunch of tests hastily and messly commented out line by line
> >> (*whyy?*) ColumnFamilyStoreTest with comments that they are pending
> >> SuperColumns support post 8099.
> >>
> >> Could those responsible please cleanup after themselves? It's been a
> while
> >> since 8099 was committed in the first place and I don't see us adding
> Super
> >> Column support at this point and the unit tests surly will need to be
> >> rewritten anyways.
> >>
> >> As my mother always said, pick your dirty wet towel in the hamper off
> the
> >> floor and put it in the hamper please
> >>
> >> best,
> >> kjellman
> >>
> >> Sent from my iPhone
> >
> > --
> > Alex Petrov
>
> --
Alex Petrov


Re: Use of posix_fadvise

2016-10-18 Thread Ariel Weisberg
Hi,

Compaction can merge some very large files together with data that may
be completely cold. So yeah caching the whole file just creates pressure
to evict useful stuff. In some theories.

In other theories the page cache is flush and scan resistant and should
just eat this stuff up without intervention. Sure it might hurt a bit,
but it's a bounded amount before the cache stops discarding useful stuff
in favor of new stuff that is unproven.

If there is a benchmark with this enabled/disabled I haven't seen it.
Doesn't mean it doesn't exist though.

Ariel
On Tue, Oct 18, 2016, at 12:05 PM, Michael Kjellman wrote:
> Within a single SegmentedFile?
> 
> On Oct 18, 2016, at 9:02 AM, Ariel Weisberg
> > wrote:
> 
> With compaction there can be hot and cold data mixed together.
> 


Re: Use of posix_fadvise

2016-10-18 Thread Michael Kjellman
Within a single SegmentedFile?

On Oct 18, 2016, at 9:02 AM, Ariel Weisberg 
> wrote:

With compaction there can be hot and cold data mixed together.



Re: Cleanup after yourselves please

2016-10-18 Thread Michael Kjellman
Gotcha, I didn't know we were actually bringing them back from the dead! 

That being said, won't the unit tests need to be re-writtten (or at least 
refactored) after your work? Couldn't we use /* */ comments instead of every 
single line one by one? Given we use source control couldn't we remove the dead 
code and get it from the revision history if we need it in the future?

> On Oct 18, 2016, at 8:18 AM, Oleksandr Petrov  
> wrote:
> 
> I'm currently working on actually making Super Columns work in CQL context.
> Currently they do not really work[1].
> 
> It's not a very small piece of work. It was in the pipeline for some time,
> although there most likely were more important things that had to be worked
> on. I understand your disappointment and am sorry you stumbled upon this.
> But for now you may just disregard the commented tests. My branch is going
> to be ready for review soon.
> 
> [1] https://issues.apache.org/jira/browse/CASSANDRA-12373
> 
> 
> On Tue, Oct 18, 2016 at 5:10 PM Michael Kjellman <
> mkjell...@internalcircle.com> wrote:
> 
>> There was a bunch of tests hastily and messly commented out line by line
>> (*whyy?*) ColumnFamilyStoreTest with comments that they are pending
>> SuperColumns support post 8099.
>> 
>> Could those responsible please cleanup after themselves? It's been a while
>> since 8099 was committed in the first place and I don't see us adding Super
>> Column support at this point and the unit tests surly will need to be
>> rewritten anyways.
>> 
>> As my mother always said, pick your dirty wet towel in the hamper off the
>> floor and put it in the hamper please
>> 
>> best,
>> kjellman
>> 
>> Sent from my iPhone
> 
> -- 
> Alex Petrov



Re: Use of posix_fadvise

2016-10-18 Thread Ariel Weisberg
Hi,

With compaction there can be hot and cold data mixed together. So we want
to drop the data and then warm it via early opening so only the hot data is
in the cache.

Some of those cases are for the old sstable that have been rewritten or
discarded so the data is entirely defunct. The files might not get deleted
though so they do add pressure to the cache until they are evicted.

In the instance you are looking at in a tidier won't there always be a
reference held in the current view for the column family? It don't think it
would constantly be evicting them nor closing/reopening and remapping the
file.


Specifically regarding the behavior in different kernels, from `man
> posix_fadvise`: "In kernels before 2.6.6, if len was specified as 0, then
> this was interpreted literally as "zero bytes", rather than as meaning "all
> bytes through to the end of the file"."

Not ideal, but at least not actively harmful right? The cache is supposed
to be scan/flush resistant.

Ariel

On Tue, Oct 18, 2016 at 11:57 AM, Michael Kjellman <
mkjell...@internalcircle.com> wrote:

> Right, so in SSTableReader#GlobalTidy$tidy it does:
> // don't ideally want to dropPageCache for the file until all instances
> have been released
> CLibrary.trySkipCache(desc.filenameFor(Component.DATA), 0, 0);
> CLibrary.trySkipCache(desc.filenameFor(Component.PRIMARY_INDEX), 0, 0);
>
> It seems to me every time the reference is released on a new sstable we
> would immediately tidy() it and then call posix_fadvise with
> POSIX_FADV_DONTNEED with an offset of 0 and a length of 0 (which I'm
> thinking is doing so in respect to the API behavior in modern Linux kernel
> builds?). Am I reading things correctly here? Sorta hard as there are many
> different code paths the reference could have tidy() called.
>
> Why would we want to drop the segment we just write from the page cache --
> wouldn't that most likely be the most hot data, and even if it turned out
> not to be wouldn't it be better in this case to have kernel be smart at
> what it's best at?
>
> best,
> kjellman
>
> > On Oct 18, 2016, at 8:50 AM, Jake Luciani  wrote:
> >
> > The main point is to avoid keeping things in the page cache that are no
> > longer needed like compacted data that has been early opened elsewhere.
> >
> > On Oct 18, 2016 11:29 AM, "Michael Kjellman" <
> mkjell...@internalcircle.com>
> > wrote:
> >
> >> We use posix_fadvise in a bunch of places, and in stereotypical
> Cassandra
> >> fashion no comments were provided.
> >>
> >> There is a check the OS is Linux (okay, a start) but it turns out the
> >> behavior of providing a length of 0 to posix_fadvise changed in some 2.6
> >> kernels. We don't check the kernel version -- or even note it.
> >>
> >> What is the *expected* outcome of our use of posix_fadvise -- not what
> >> does it do or not do today -- but what problem was it added to solve and
> >> what's the expected behavior regardless of kernel versions.
> >>
> >> best,
> >> kjellman
> >>
> >> Sent from my iPhone
>
>


Low hanging fruit crew

2016-10-18 Thread Edward Capriolo
I go through the Cassandra jira weekly and I notice a number of tickets
which appear to be clear issues or requests for simple metrics.

https://issues.apache.org/jira/browse/CASSANDRA-12626

https://issues.apache.org/jira/browse/CASSANDRA-12330

I also have a few jira issues (opinion) would be simple to triage and
merge. Getting things merged is the primary pathway to meritocracy in the
ASF.

Across some other ASF projects I have seen that when the number of small
patches becomes larger that bodies to review them. It can result and chick
and egg scenario where reviewers feel overburdened, but the pathway to
removing this burden is promoting contributors to committers.

My suggestion:
Assemble a low-hanging-fruit-crew. This crew would be armed general support
for small commits, logging, metrics, test coverage, things static analysis
reveals etc. They would have a reasonable goal like "get 2 lhf merged a
day/week/whatever". If the process is successful, in a few months there
would hopefully be 1-2 committers graduated who would naturally wish to
move into low hanging fruit duties.

Thoughts?


Re: Use of posix_fadvise

2016-10-18 Thread Michael Kjellman
Right, so in SSTableReader#GlobalTidy$tidy it does:
// don't ideally want to dropPageCache for the file until all instances have 
been released
CLibrary.trySkipCache(desc.filenameFor(Component.DATA), 0, 0);
CLibrary.trySkipCache(desc.filenameFor(Component.PRIMARY_INDEX), 0, 0);

It seems to me every time the reference is released on a new sstable we would 
immediately tidy() it and then call posix_fadvise with POSIX_FADV_DONTNEED with 
an offset of 0 and a length of 0 (which I'm thinking is doing so in respect to 
the API behavior in modern Linux kernel builds?). Am I reading things correctly 
here? Sorta hard as there are many different code paths the reference could 
have tidy() called.

Why would we want to drop the segment we just write from the page cache -- 
wouldn't that most likely be the most hot data, and even if it turned out not 
to be wouldn't it be better in this case to have kernel be smart at what it's 
best at?

best,
kjellman

> On Oct 18, 2016, at 8:50 AM, Jake Luciani  wrote:
> 
> The main point is to avoid keeping things in the page cache that are no
> longer needed like compacted data that has been early opened elsewhere.
> 
> On Oct 18, 2016 11:29 AM, "Michael Kjellman" 
> wrote:
> 
>> We use posix_fadvise in a bunch of places, and in stereotypical Cassandra
>> fashion no comments were provided.
>> 
>> There is a check the OS is Linux (okay, a start) but it turns out the
>> behavior of providing a length of 0 to posix_fadvise changed in some 2.6
>> kernels. We don't check the kernel version -- or even note it.
>> 
>> What is the *expected* outcome of our use of posix_fadvise -- not what
>> does it do or not do today -- but what problem was it added to solve and
>> what's the expected behavior regardless of kernel versions.
>> 
>> best,
>> kjellman
>> 
>> Sent from my iPhone



Re: Use of posix_fadvise

2016-10-18 Thread Jake Luciani
The main point is to avoid keeping things in the page cache that are no
longer needed like compacted data that has been early opened elsewhere.

On Oct 18, 2016 11:29 AM, "Michael Kjellman" 
wrote:

> We use posix_fadvise in a bunch of places, and in stereotypical Cassandra
> fashion no comments were provided.
>
> There is a check the OS is Linux (okay, a start) but it turns out the
> behavior of providing a length of 0 to posix_fadvise changed in some 2.6
> kernels. We don't check the kernel version -- or even note it.
>
> What is the *expected* outcome of our use of posix_fadvise -- not what
> does it do or not do today -- but what problem was it added to solve and
> what's the expected behavior regardless of kernel versions.
>
> best,
> kjellman
>
> Sent from my iPhone


Re: Use of posix_fadvise

2016-10-18 Thread Michael Kjellman
Sure -- my bad, I aggregated them all of them up for you:
https://github.com/apache/cassandra/search?utf8=✓=CLibrary.trySkipCache=Code
https://github.com/apache/cassandra/blob/81f6c784ce967fadb6ed7f58de1328e713eaf53c/test/unit/org/apache/cassandra/utils/CLibraryTest.java#L34
https://github.com/apache/cassandra/blob/81f6c784ce967fadb6ed7f58de1328e713eaf53c/src/java/org/apache/cassandra/db/commitlog/MemoryMappedSegment.java#L102
https://github.com/apache/cassandra/blob/81f6c784ce967fadb6ed7f58de1328e713eaf53c/src/java/org/apache/cassandra/hints/ChecksummedDataInput.java#L218
https://github.com/apache/cassandra/blob/81f6c784ce967fadb6ed7f58de1328e713eaf53c/src/java/org/apache/cassandra/hints/HintsWriter.java#L292
https://github.com/apache/cassandra/blob/81f6c784ce967fadb6ed7f58de1328e713eaf53c/src/java/org/apache/cassandra/io/util/FileHandle.java#L167
https://github.com/apache/cassandra/blob/81f6c784ce967fadb6ed7f58de1328e713eaf53c/src/java/org/apache/cassandra/io/sstable/SSTableRewriter.java#L174
https://github.com/apache/cassandra/blob/f2a354763877cfeaf1dd017b84a7c8ee9eafd885/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2281
https://github.com/apache/cassandra/blob/f2a354763877cfeaf1dd017b84a7c8ee9eafd885/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L2282

Or if you use IDEA this should work pretty well too:
[cid:543B66BF-5E99-4227-A24D-1AB8C0341D97@localhost]

best,
kjellman


On Oct 18, 2016, at 8:33 AM, Benedict Elliott Smith 
> wrote:

... and continuing in the fashion of behaviours one might like to disabuse
people of, no code link is provided.



On 18 October 2016 at 16:28, Michael Kjellman 
>
wrote:

We use posix_fadvise in a bunch of places, and in stereotypical Cassandra
fashion no comments were provided.

There is a check the OS is Linux (okay, a start) but it turns out the
behavior of providing a length of 0 to posix_fadvise changed in some 2.6
kernels. We don't check the kernel version -- or even note it.

What is the *expected* outcome of our use of posix_fadvise -- not what
does it do or not do today -- but what problem was it added to solve and
what's the expected behavior regardless of kernel versions.

best,
kjellman

Sent from my iPhone



Re: Use of posix_fadvise

2016-10-18 Thread Benedict Elliott Smith
... and continuing in the fashion of behaviours one might like to disabuse
people of, no code link is provided.



On 18 October 2016 at 16:28, Michael Kjellman 
wrote:

> We use posix_fadvise in a bunch of places, and in stereotypical Cassandra
> fashion no comments were provided.
>
> There is a check the OS is Linux (okay, a start) but it turns out the
> behavior of providing a length of 0 to posix_fadvise changed in some 2.6
> kernels. We don't check the kernel version -- or even note it.
>
> What is the *expected* outcome of our use of posix_fadvise -- not what
> does it do or not do today -- but what problem was it added to solve and
> what's the expected behavior regardless of kernel versions.
>
> best,
> kjellman
>
> Sent from my iPhone


Use of posix_fadvise

2016-10-18 Thread Michael Kjellman
We use posix_fadvise in a bunch of places, and in stereotypical Cassandra 
fashion no comments were provided.

There is a check the OS is Linux (okay, a start) but it turns out the behavior 
of providing a length of 0 to posix_fadvise changed in some 2.6 kernels. We 
don't check the kernel version -- or even note it.

What is the *expected* outcome of our use of posix_fadvise -- not what does it 
do or not do today -- but what problem was it added to solve and what's the 
expected behavior regardless of kernel versions. 

best,
kjellman

Sent from my iPhone

Re: Cleanup after yourselves please

2016-10-18 Thread Oleksandr Petrov
I'm currently working on actually making Super Columns work in CQL context.
Currently they do not really work[1].

It's not a very small piece of work. It was in the pipeline for some time,
although there most likely were more important things that had to be worked
on. I understand your disappointment and am sorry you stumbled upon this.
But for now you may just disregard the commented tests. My branch is going
to be ready for review soon.

[1] https://issues.apache.org/jira/browse/CASSANDRA-12373


On Tue, Oct 18, 2016 at 5:10 PM Michael Kjellman <
mkjell...@internalcircle.com> wrote:

> There was a bunch of tests hastily and messly commented out line by line
> (*whyy?*) ColumnFamilyStoreTest with comments that they are pending
> SuperColumns support post 8099.
>
> Could those responsible please cleanup after themselves? It's been a while
> since 8099 was committed in the first place and I don't see us adding Super
> Column support at this point and the unit tests surly will need to be
> rewritten anyways.
>
> As my mother always said, pick your dirty wet towel in the hamper off the
> floor and put it in the hamper please
>
> best,
> kjellman
>
> Sent from my iPhone

-- 
Alex Petrov


Cleanup after yourselves please

2016-10-18 Thread Michael Kjellman
There was a bunch of tests hastily and messly commented out line by line 
(*whyy?*) ColumnFamilyStoreTest with comments that they are pending 
SuperColumns support post 8099. 

Could those responsible please cleanup after themselves? It's been a while 
since 8099 was committed in the first place and I don't see us adding Super 
Column support at this point and the unit tests surly will need to be rewritten 
anyways. 

As my mother always said, pick your dirty wet towel in the hamper off the floor 
and put it in the hamper please

best,
kjellman

Sent from my iPhone