Cassandra Read Path Code Navigation

2016-06-13 Thread Bhuvan Rawal
Hi All,

I'm debugging an issue in Cassandra 3.5 that was reported on the user mailing
list earlier and is pretty critical for us to solve. I'll give a brief
intro. On issuing this query:

select id,filter_name from navigation_bucket_filter where id=2429 and
filter_name='Size_s';

 id   | filter_name
------+----------------------
 2429 | AdditionalProperty_s
 2429 |                Brand
more rows---
 2429 |               Size_s
more rows---
 2429 |         sdFullfilled
 2429 |           sellerCode

(16 rows)

*Only one result was expected* (the row with filter_name Size_s). We got
that row, but along with 15 other unexpected rows.

The partition contains 20 rows in total (verified using select
id,filter_name from navigation_bucket_filter where id=2429; as well as a
json dump). We are wondering why Cassandra could not filter the results
completely. I have checked that the data is intact by taking a json dump and
validating it with the sstabledump tool.

The issue was resolved on production by running nodetool compact, but it is
critical for us to understand what led to it, since issuing a manual
compaction may not be possible every time.

I copied the sstables of that table onto my local machine and *have
been able to reproduce the same issue*. I am running Cassandra in debug
mode and have connected my IDE to it, but unfortunately I have not been
able to navigate very far into the read path. I would be glad to get some
pointers on where in the code SSTables are read and the partition is
filtered.

Secondly, I wanted to know whether there is a way to read the other SSTable
files (Partition Index, Filter.db, Statistics.db, et al.) as well as the
commitlog. If such a utility does not exist currently but can be created
from existing classes, please let me know as well; I would love to build
and share one.


Best Regards,
Bhuvan Rawal


Re: Jira down, again?

2016-06-13 Thread Mahdi Mohammadi
And when it is not down, it is very slow for me.

Do others have the same experience?

Best Regards

On Tue, Jun 14, 2016 at 4:19 AM, Brandon Williams  wrote:

> Everyone.
>
> On Mon, Jun 13, 2016 at 3:18 PM, Michael Kjellman <
> mkjell...@internalcircle.com> wrote:
>
> > Seems like Apache Jira is 100% down, again, for like the 500th time in
> > the last 2 months. Just me or everyone?
>


Jira down, again?

2016-06-13 Thread Michael Kjellman
Seems like Apache Jira is 100% down, again, for like the 500th time in the last 
2 months. Just me or everyone?

Possible Bug: bucket_low has no effect in STCS

2016-06-13 Thread Anuj Wadehra
Hi,

I am trying to understand the STCS algorithm. As per my current
understanding of the code, setting bucket_low seems to have no impact on
the STCS compaction algorithm. Moreover, I see room for some optimization. I
would appreciate it if one of the designers could correct me, or confirm that
it's a bug so that I can raise a JIRA.


Details
--
The getBuckets() method of SizeTieredCompactionStrategy sorts sstables by size
in ascending order and then iterates over them one by one, assigning each to
an existing or new bucket. When iterating sstables in ascending order of size,
I can't find ANY scenario where the current sstable in the outer loop
iteration is below the oldAverageSize of any existing bucket. The sstable
being iterated will ALWAYS be greater than or equal to the oldAverageSize of
ALL existing buckets, as ALL previous sstables in existing buckets were
smaller than or equal in size to the sstable being iterated.

So there is NO scenario where size > (oldAverageSize * bucketLow) and size <
oldAverageSize, and the bucket_low property never comes into play no matter
what value you set for it.


Also, while iterating over sstables (sortedFiles) by size in ascending order,
there is no point iterating over all existing buckets. We could just start from
the LAST bucket, the one the previous sstable was assigned to. The
oldAverageSize of ALL other buckets will NEVER admit the sstable being
iterated.

 for (Entry entry : buckets.entrySet())
{...}
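
The argument above can be checked with a small, self-contained sketch. It is
a simplified model of the bucketing loop as described in this message, not
the actual SizeTieredCompactionStrategy source: the class name
StcsBucketSketch, the lowerBoundRejections counter and the sample sizes are
assumptions made purely for illustration, and sizes are assumed to be
positive with bucket_low below 1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of the bucketing loop described above.
class StcsBucketSketch {
    // Counts how often the bucket_low check (size > avg * bucketLow) evaluated to false.
    static int lowerBoundRejections = 0;

    static List<List<Long>> getBuckets(long[] sizes, double bucketLow,
                                       double bucketHigh, long minSSTableSize) {
        long[] sorted = sizes.clone();
        Arrays.sort(sorted); // ascending by size, as in the description
        lowerBoundRejections = 0;
        // Key: current average size of the bucket; value: sizes placed in it.
        Map<Long, List<Long>> buckets = new LinkedHashMap<>();

        outer:
        for (long size : sorted) {
            for (Map.Entry<Long, List<Long>> entry : buckets.entrySet()) {
                long oldAverageSize = entry.getKey();
                List<Long> bucket = entry.getValue();
                boolean aboveLow = size > oldAverageSize * bucketLow;
                boolean belowHigh = size < oldAverageSize * bucketHigh;
                boolean bothSmall = size < minSSTableSize && oldAverageSize < minSSTableSize;
                if (!aboveLow) {
                    lowerBoundRejections++;
                }
                if ((aboveLow && belowHigh) || bothSmall) {
                    // Re-key the bucket under its new average, then move to the next sstable.
                    buckets.remove(oldAverageSize);
                    long totalSize = bucket.size() * oldAverageSize;
                    long newAverageSize = (totalSize + size) / (bucket.size() + 1);
                    bucket.add(size);
                    buckets.put(newAverageSize, bucket);
                    continue outer;
                }
            }
            // No similar bucket found: start a new one.
            List<Long> bucket = new ArrayList<>();
            bucket.add(size);
            buckets.put(size, bucket);
        }
        return new ArrayList<>(buckets.values());
    }

    public static void main(String[] args) {
        long[] sizes = {1000, 100, 410, 120, 400, 110};
        List<List<Long>> buckets = getBuckets(sizes, 0.5, 1.5, 50);
        System.out.println("buckets: " + buckets);
        System.out.println("times the bucket_low check failed: " + lowerBoundRejections);
    }
}
```

In this sketch the counter stays at zero for any bucketLow between 0 and 1,
because each bucket's average can never exceed the size currently being
placed. That is consistent with the observation above, but it only models the
loop as described here and would need to be confirmed against the real code.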



Thanks
Anuj


 



[VOTE RESULT] Release Apache Cassandra 3.7

2016-06-13 Thread Jake Luciani
With 6 binding +1 votes and no -1, the vote passes.

On Thu, Jun 9, 2016 at 4:17 AM, Gary Dusbabek  wrote:

> +1
>
> On Wed, Jun 8, 2016 at 9:21 PM, Jake Luciani  wrote:
>
> > I propose the following artifacts for release as 3.7.
> >
> > sha1: 6815dc970565e6cd1e0169b5379f37da7a5a8a32
> > Git:
> >
> >
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.7-tentative
> > Artifacts:
> >
> >
> https://repository.apache.org/content/repositories/orgapachecassandra-1116/org/apache/cassandra/apache-cassandra/3.7/
> > Staging repository:
> >
> https://repository.apache.org/content/repositories/orgapachecassandra-1116/
> >
> > The artifacts as well as the debian package are also available here:
> > http://people.apache.org/~jake
> >
> > The vote will be open for 72 hours (longer if needed).
> >
> > [1]: http://goo.gl/uA2hU1 (CHANGES.txt)
> > [2]: http://goo.gl/e79k5m (NEWS.txt)
> > [3]: https://goo.gl/iBt11P (Test Report)
> >
>


[VOTE RESULT] Release Apache Cassandra 3.0.7

2016-06-13 Thread Jake Luciani
With 6 binding +1, one non-binding +1 and no -1, the vote passes.

-- Forwarded message --
From: Tommy Stendahl 
Date: Thu, Jun 9, 2016 at 8:40 AM
Subject: Re: [VOTE] Release Apache Cassandra 3.0.7
To: dev@cassandra.apache.org


+1 (non-binding)


On 2016-06-08 21:35, Jake Luciani wrote:

> I propose the following artifacts for release as 3.0.7.
>
> sha1: 040ac666ac5cdf9cd0a01a845f2ea0af3a81a08b
> Git:
>
> http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=shortlog;h=refs/tags/3.0.7-tentative
> Artifacts:
>
> https://repository.apache.org/content/repositories/orgapachecassandra-1115/org/apache/cassandra/apache-cassandra/3.0.7/
> Staging repository:
> https://repository.apache.org/content/repositories/orgapachecassandra-1115/
>
> The artifacts as well as the debian package are also available here:
> http://people.apache.org/~jake
>
> The vote will be open for 72 hours (longer if needed).
>
> [1]: http://goo.gl/GYLlrI (CHANGES.txt)
> [2]: http://goo.gl/gK48Xw (NEWS.txt)
> [3]: https://goo.gl/fCrUCh (Test Report)
>
>


Re: NewBie Question ~ Book for Cassandra

2016-06-13 Thread Michael Kjellman
Bhuvan,

You didn't disrespect anyone, so please don't apologize! Appreciate your 
positive and helpful comment for the OP :) 

best,
kjellman

> On Jun 13, 2016, at 8:50 AM, Bhuvan Rawal  wrote:
> 
> Hi Matt,
> 
> I suggested the resources keeping in mind the ease with which one can
> learn. My idea was not to disrespect Apache or the community in any form; it
> was just to facilitate the learning of a newbie.
> Having a good wiki would be amazing, and I believe we all agree on this
> thread that the current documentation has a lot of scope for improvement.
> I'm completely willing to contribute to the docs in whatever way possible
> and to get my contributions reviewed.
> 
> Best Regards,
> Bhuvan
> 
> On Mon, Jun 13, 2016 at 8:17 PM, Eric Evans 
> wrote:
> 
>> On Mon, Jun 13, 2016 at 8:05 AM, Mattmann, Chris A (3980)
>>  wrote:
>>> However also see that besides the current documentation, there needs to be
>>> a roadmap for making Apache Cassandra and *its* documentation (not *DataStax’s*)
>>> up to par for a basic user to build, deploy and run Cassandra. I don’t think that’s
>>> the current case, is it?
>> 
>> There is CASSANDRA-8700
>> (https://issues.apache.org/jira/browse/CASSANDRA-8700), which is a
>> step in this direction I hope.
>> 
>> One concern I do have though is that changing the tech used to
>> author/publish documentation won't in itself be enough to get good
>> docs.  In fact, moving the docs in-tree raises the barrier to
>> contribution in the sense that instead of mashing 'Edit', you have to
>> put together a patch and have it reviewed.
>> 
>> That said, I also think that we've historically set the bar way too
>> high to committer/PMC, and that this may be an opportunity to change
>> that; There ought to be a path to the PMC for documentation authors
>> and translators (and this is typical in other projects).  So, I will
>> personally do my best to set aside some time each week to review and
>> merge documentation changes, and to champion regular doc contributors
>> for committership.  Hopefully there are others willing to do the same!
>> 
>> 
>> --
>> Eric Evans
>> john.eric.ev...@gmail.com
>> 



Re: NewBie Question ~ Book for Cassandra

2016-06-13 Thread Bhuvan Rawal
Hi Matt,

I suggested the resources keeping in mind the ease with which one can
learn. My idea was not to disrespect Apache or the community in any form; it
was just to facilitate the learning of a newbie.
Having a good wiki would be amazing, and I believe we all agree on this
thread that the current documentation has a lot of scope for improvement.
I'm completely willing to contribute to the docs in whatever way possible
and to get my contributions reviewed.

Best Regards,
Bhuvan

On Mon, Jun 13, 2016 at 8:17 PM, Eric Evans 
wrote:

> On Mon, Jun 13, 2016 at 8:05 AM, Mattmann, Chris A (3980)
>  wrote:
> > However also see that besides the current documentation, there needs to be
> > a roadmap for making Apache Cassandra and *its* documentation (not *DataStax’s*)
> > up to par for a basic user to build, deploy and run Cassandra. I don’t think that’s
> > the current case, is it?
>
> There is CASSANDRA-8700
> (https://issues.apache.org/jira/browse/CASSANDRA-8700), which is a
> step in this direction I hope.
>
> One concern I do have though is that changing the tech used to
> author/publish documentation won't in itself be enough to get good
> docs.  In fact, moving the docs in-tree raises the barrier to
> contribution in the sense that instead of mashing 'Edit', you have to
> put together a patch and have it reviewed.
>
> That said, I also think that we've historically set the bar way too
> high to committer/PMC, and that this may be an opportunity to change
> that; There ought to be a path to the PMC for documentation authors
> and translators (and this is typical in other projects).  So, I will
> personally do my best to set aside some time each week to review and
> merge documentation changes, and to champion regular doc contributors
> for committership.  Hopefully there are others willing to do the same!
>
>
> --
> Eric Evans
> john.eric.ev...@gmail.com
>


Re: NewBie Question ~ Book for Cassandra

2016-06-13 Thread Mattmann, Chris A (3980)
Hi Benjamin,



On 6/13/16, 6:38 AM, "Benjamin Lerer"  wrote:

>Hi Chris,
>
>Disclaimer: I am a Datastax employee
>
>It is clear to me that the current official documentation is far from being
>enough. That's why I fully support the decision made by Jonathan to do our
>best to improve it.

Just as a small piece of advice - this makes it sound like Jonathan is the
“boss” of this project. I’ve spoken with him here and there - he’s a great
guy, don’t get me wrong - but Apache projects don’t have bosses. He is the
chair of the project - that earns him the great glory of having to write a
board report every month after the project is created, and quarterly
thereafter. The chair is expected to be the eyes and ears of the project for
the board. The project has a “Project Management Committee”, or PMC, jointly
responsible for stewarding the project. There is also a “Committer” role at
the ASF. Some communities define PMC == C. The committer role does not have a
binding VOTE on releases of the software and/or on additions of new personnel
to the project.

The reason I point this out - and I may have just misread - is that it
sounded like you suggested something like: Jonathan makes decisions for
the project; you all jump. I am just saying I hope that’s not the case.
You should all have equal decision-making ability in the project, especially
on the PMC.

>
>As an Apache Cassandra Committer mostly working on the CQL layer, I know
>that we have done our best to keep the CQL documentation up to date
>(https://cassandra.apache.org/doc/cql3/CQL-3.0.html). Now, English not
>being the native language of some of us, and as we are not technical
>writers, I would not really be surprised if some external persons have done
>a better job than us.
>
>I think our goal should be to provide an accurate and reliable
>documentation for the project.

I would amend the above to add “for the project [at the ASF]”. That’s
the thing - as a *first* (and not *second*) thought, the ASF project
should be getting careful attention, and that includes the documentation.


> Nevertheless, it seems legitimate to me to
>also provide links to external documentations, when people are asking for
>it, if others did a better job than us.

Sure, this happens in some projects from time to time. When there isn’t
a perception of control, it is possible to do this, especially if, alongside
the external links, there is some roadmap or plan for actually keeping
the ASF documentation up to date. Real data point here - I wrote a book about
Apache Tika, Tika in Action. This was done with frequent updates on what was
going on to d...@tika.apache.org. Over time, we worked with Manning
Publications to donate the code samples and examples from the book to the
Apache Tika project. Much of the book’s inspiration and examples made it into
Apache Tika in parallel to the goings-on outside.

In a neutral playing ground it’s sometimes fine to point to external sources.
When those external sources usually boil down to a company’s web pages, and
there is a strong perception that the company is controlling the project, you
can see the dichotomy here.

>
>The conclusion that we can draw from Bhuvan's response is that the official
>documentation is probably currently not good enough, as he is not pointing to
>it. I believe that once we have solved this problem, people will be
>more likely to make a reference to it. Until then, we should not be
>surprised if people are not pointing to it.

See above.

However also see that besides the current documentation, there needs to be
a roadmap for making Apache Cassandra and *its* documentation (not *DataStax’s*)
up to par for a basic user to build, deploy and run Cassandra. I don’t think
that’s the current case, is it?

Thanks for your email. I am hoping that we can work together to
get the project’s documentation (and also its governance) in a
better shape. 

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++




>
>On Sun, Jun 12, 2016 at 5:16 PM, Chris Mattmann  wrote:
>
>> Hi Harmeet,
>>
>> The dev list is the lifeblood of an Apache project, and
>> projects here at the ASF conduct 99% of their business in
>> public, not in private. The ASF is a non-profit for the
>> public good and we have a tradition of openness and
>> transparency.
>>
>> Even if the 

Re: NewBie Question ~ Book for Cassandra

2016-06-13 Thread Benjamin Lerer
Hi Chris,

Disclaimer: I am a DataStax employee.

It is clear to me that the current official documentation is far from being
enough. That's why I fully support the decision made by Jonathan to do our
best to improve it.

As an Apache Cassandra Committer mostly working on the CQL layer, I know
that we have done our best to keep the CQL documentation up to date
(https://cassandra.apache.org/doc/cql3/CQL-3.0.html). Now, English not
being the native language of some of us, and as we are not technical
writers, I would not really be surprised if some external persons have done
a better job than us.

I think our goal should be to provide accurate and reliable
documentation for the project. Nevertheless, it seems legitimate to me to
also provide links to external documentation when people are asking for
it, if others did a better job than us.

The conclusion that we can draw from Bhuvan's response is that the official
documentation is probably currently not good enough, as he is not pointing to
it. I believe that once we have solved this problem, people will be
more likely to make a reference to it. Until then, we should not be
surprised if people are not pointing to it.

What do you think?

Best,

Benjamin


On Sun, Jun 12, 2016 at 5:16 PM, Chris Mattmann  wrote:

> Hi Harmeet,
>
> The dev list is the lifeblood of an Apache project, and
> projects here at the ASF conduct 99% of their business in
> public, not in private. The ASF is a non-profit for the
> public good and we have a tradition of openness and
> transparency.
>
> Even if the business isn’t pleasant sometimes, it must
> be discussed, in public. The committers and PMC members for
> the code base - the name of which is *Apache* Cassandra as
> the project is here at the *Apache Software Foundation* -
> are Apache Software Foundation committers first, when they
> deal with or steward the Apache code-base. Even before their
> $dayjobs.
>
> Cheers,
> Chris
>
>
> On 6/11/16, 11:54 PM, "mylistt...@gmail.com"  wrote:
>
> >Dear All,
> >
> >I am a user of Cassandra. I am grateful to each of you for providing your
> >time as committers to the code base for a great product.
> >
> >This is what I wanted to suggest - could you gentlemen not create a group
> >email ID to discuss matters of such importance amongst yourselves? I am not
> >sure the dev list is the best place. I have been reading emails
> >where insinuations have been made - that a particular company may hijack
> >the code base, etc.
> >
> >We are all developers, we love our code. I don't think this is the right
> >forum to blow things out of proportion, read: wash dirty linen.
> >
> >Pardon me if you think my opinion or inputs are wrong.
> >
> >I am a newbie on Cassandra. I use it as an application developer. I don't
> >have any intention to judge your experiences or thoughts. Just saying this
> >could be done in a finer way without most of us getting to know about it.
> >
> >Regards,
> >Harmeet
> >
> >
> >
> >On Jun 12, 2016, at 2:31, Tom Barber  wrote:
> >
> >> Looking at that thread, I'm surprised you didn't call Dave out as well,
> >> that attitude did no one any favours.
> >>
> >>> Because lets all face the
> >>> facts here, no one "likes" writing drivers and documentation, and I have
> >>> done both for this project.
> >>
> >> That's clearly incorrect, I (and I suspect other people) like writing docs
> >> because it means people can use your tools in a much easier manner than
> >> looking through the code or unit tests.
> >>
> >> Tooling can be a burden but it doesn't excuse not writing docs, even if it
> >> becomes a PMC type rule for committers to commit Docs for new features like
> >> they should be committing unit tests. At least it improves what is shipped
> >> with the Apache project in question.
> >>
> >> Tom
> >>
> >> On Sat, Jun 11, 2016 at 7:21 PM, Chris Mattmann wrote:
> >>
> >>> Hi Russell,
> >>>
> >>> [CC/board@, board members may want to join the
> >>> Apache Cassandra lists for specifics and further
> >>> engagement]
> >>>
> >>> Multiple things that need to be addressed below, but TL;DR:
> >>>
> >>> 1. I have asked the Apache Cassandra PMC, and its chair, to provide
> >>> a detailed description on how the project *isn’t* controlled by an
> >>> external entity in its next monthly board report. The below further
> >>> re-enforces the control. Further, it re-enforces the vitriol and
> >>> name calling attitude when questioned and when someone suggests
> >>> pointing to the Apache documentation and making it better as a first
> >>> step. I plan on making it very loudly known at our next board meeting
> >>> that something is awry. CC/board@ ahead of time on that.
> >>>
> >>> 2. You don’t seem to understand Apache. This is unfortunate.  I
> >>> went to go look you up and see if you are a PMC member for Apache
> >>> Cassandra. Funny enough, the main page doesn’t even link to the PMC
> >>>