Re: Jira down, again?

2016-06-15 Thread Michael Kjellman
down. again.

> On Jun 14, 2016, at 11:14 AM, Alex Popescu  wrote:
> 
> I've been trying to get to a ticket for the last 2h and I only get service
> unavailable :-(
> 
> On Tue, Jun 14, 2016 at 10:26 AM, Michael Kjellman <
> mkjell...@internalcircle.com> wrote:
> 
>> and, it's down again. :(
>> 
>>> On Jun 14, 2016, at 4:48 AM, Dave Brosius  wrote:
>>> 
>>> They are aware of these things
>>> 
>>> https://twitter.com/infrabot 
>>> 
>>> On 06/14/2016 05:28 AM, Giampaolo Trapasso wrote:
>>>> Hi to all,
>>>> at the moment is the same for me. Is there a way to notify to someone this
>>>> situation?
>>>> 
>>>> Giampaolo
>>>> 
>>>> 2016-06-13 23:27 GMT+02:00 Mahdi Mohammadi :
>>>> 
>>>>> And when it is not down, it is very slow for me.
>>>>> 
>>>>> Do others have the same experience?
>>>>> 
>>>>> Best Regards
>>>>> 
>>>>> On Tue, Jun 14, 2016 at 4:19 AM, Brandon Williams 
>>>>> wrote:
>>>>> 
>>>>>> Everyone.
>>>>>> 
>>>>>> On Mon, Jun 13, 2016 at 3:18 PM, Michael Kjellman <
>>>>>> mkjell...@internalcircle.com> wrote:
>>>>>> 
>>>>>>> Seems like Apache Jira is 100% down, again, for like the 500th time in the
>>>>>>> last 2 months. Just me or everyone?
>>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
> -- 
> Bests,
> 
> Alex Popescu | @al3xandru
> Sen. Product Manager @ DataStax
> 
> 
> 
> » DataStax Enterprise - the database for cloud applications. «



Re: NewBie Question

2016-06-15 Thread Benedict Elliott Smith
For newcomers that (
https://github.com/apache/cassandra/blob/cassandra-3.0.0/guide_8099.md) is
probably a bad document to point them to, as it will no doubt confuse them
- the naming, behaviour and format descriptions are all now partially
incorrect.

It was, by its own admission, intended only for those who already knew the
2.2 codebase intimately so they could understand the 8099 patch.  It should
really be edited heavily so that those who didn't live through 8099 might
now derive value from it.

It's a real shame that, despite this document living in-tree, even the
class names are out of date - and were before it was even committed.  So,
as much as CASSANDRA-8700 is a fantastic step forwards, it looks likely to
be insufficient by itself, and the project may need to come up with a
strategy to encourage maintenance of the docs.






On 15 June 2016 at 17:55, Michael Kjellman 
wrote:

> This was forwarded to me yesterday... a helpful first step
> https://github.com/apache/cassandra/blob/cassandra-3.0.0/guide_8099.md
>
> > On Jun 15, 2016, at 9:54 AM, Jonathan Haddad  wrote:
> >
> > Maybe some brave soul will document the 3.0 on disk format as part of
> > https://issues.apache.org/jira/browse/CASSANDRA-8700.
> >
> > On Wed, Jun 15, 2016 at 7:02 AM Christopher Bradford <
> bradfor...@gmail.com>
> > wrote:
> >
> >> Consider taking a look at Aaron Morton's dive into the C* 3.0 storage
> >> engine.
> >>
> >>
> >>
> http://thelastpickle.com/blog/2016/03/04/introductiont-to-the-apache-cassandra-3-storage-engine.html
> >>
> >> On Wed, Jun 15, 2016 at 9:38 AM Jim Witschey  >
> >> wrote:
> >>
> >>>> http://wiki.apache.org/cassandra/ArchitectureSSTable
> >>>
> >>> Be aware that this page hasn't been updated since 2013, so it doesn't
> >>> reflect any changes to the SSTable format since then, including the
> >>> new storage engine introduced in 3.0 (see CASSANDRA-8099).
> >>>
> >>> That said, I believe the linked Apache wiki page is the best
> >>> documentation for the format. Unfortunately, if you want a better or
> >>> more current understanding, you'll have to read the code and read some
> >>> SSTables.
> >>>
> >>
>
>


Re: SSTable index format

2016-06-15 Thread Kaide Mu
The C* 2.2 SSTable format is "la"; "ma" was introduced in 3.0, along with big
changes to the storage engine.

Assuming you are asking about 2.2 and are aware that an SSTable is composed
of different components: the index file, Index.db, just maps row keys to
their positions in Data.db. As for how Index.db is structured, you may want
to check the source code of RowIndexEntry, especially how it is serialized;
you may also want to check ColumnIndex.Builder.build and
IndexHelper.IndexInfo.Serializer.serialize.  For the complete flushing
process I recommend you carefully check the source code of BigTableWriter,
as you already did; for Index.db you probably have to check the IndexWriter
section.

> On the other hand it seems that the ColumnIndex does not contain all the
> columns of the data row.

Maybe someone can confirm this, but I guess your assumption is correct: the
idea is that the core abstraction we are working with in 2.2 is cells,
instead of rows, which were introduced in 3.0.
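
As a rough mental model of the structure described above (not the on-disk
byte layout), the sketch below assumes that a row's column index is a list of
per-block summaries rather than a list of every column, and that it is only
kept when a row spans more than one block. If that assumption holds, small
rows such as k1 and k2 in the question would carry no column index at all.
The class names loosely mirror RowIndexEntry and IndexHelper.IndexInfo; the
fields, the block size, the cell sizes and the helper build_entry are
hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class IndexInfo:
    """Summary of one block of cells (illustrative stand-in, not the real class)."""
    first_name: str
    last_name: str
    offset: int  # start of the block, relative to the start of the row
    width: int   # serialized size of the block in bytes


@dataclass
class RowIndexEntry:
    """One index entry: row key, position in the data file, optional column index."""
    key: str
    position: int
    column_index: List[IndexInfo] = field(default_factory=list)


def build_entry(key: str, position: int, cells: List[Tuple[str, int]],
                block_size: int) -> RowIndexEntry:
    """Group (name, serialized size) cells into blocks of at least block_size bytes."""
    blocks: List[IndexInfo] = []
    offset = 0
    block_start = 0
    first_name = None
    last_name = None
    for name, size in cells:
        if first_name is None:
            first_name, block_start = name, offset
        last_name = name
        offset += size
        if offset - block_start >= block_size:
            blocks.append(IndexInfo(first_name, last_name, block_start, offset - block_start))
            first_name = None
    if first_name is not None:
        # Close the trailing, partially filled block.
        blocks.append(IndexInfo(first_name, last_name, block_start, offset - block_start))
    # Assumption: a single block adds nothing over the key -> position mapping.
    return RowIndexEntry(key, position, blocks if len(blocks) > 1 else [])


small = build_entry("k1", 0, [("field0", 10), ("field1", 10), ("field2", 10)], block_size=64)
wide = build_entry("k3", 30, [(f"c{i:03d}", 10) for i in range(20)], block_size=64)
print(small.column_index)
print([(b.first_name, b.last_name) for b in wide.column_index])
```

Under these assumptions the three-column row gets a plain key-to-position
entry, while the wide row gets one summary per block, each naming only the
first and last cell of that block.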

On Wed, Jun 15, 2016 at 3:53 PM Antonis Papaioannou 
wrote:

> Hi,
>
> I'm interested in the SSTable index file format and particularly in
> Cassandra 2.2 which uses the SSTable version "ma".
> Apart from keys and their corresponding offsets in the data file what
> else is included in each index entry?
>
> I'm trying to trace code when an SSTable is flushed (especially in class
> BigTableWriter.java).
> I see that each RowIndexEntry may contain a ColumnIndex which in turn it
> has a list with IndexHelper.IndexInfo entries.
> So i would expect the index format to be something like this:
> 
>
> On the other hand it seems that the ColumnIndex does not contain all the
> columns of the data row.
>
> Let me give you an example.
> Assume the following schema of a column family
> mytable ( y_id varchar primary key, field0 varchar, field1 varchar,
> field2 varchar);
>
> In this case if i execute the queries below:
> INSERT INTO ycsb.usertable (y_id, field0, field1, field2) VALUES ('k1',
> 'f1a', 'f1b', 'f1c');
> INSERT INTO ycsb.usertable (y_id, field0) VALUES ('k2', 'f2a');
>
> and then flush the table, I would expect the index to have the following
> info:
> k1, [field0, field1, field2], 
> k2, [field0], 
>
> Is this correct?
> Is there a documentation page with the file format of the index file?
>
>


Re: NewBie Question

2016-06-15 Thread Jonathan Ellis
Exactly.

On Wed, Jun 15, 2016 at 7:26 PM, J. D. Jordan 
wrote:

> I think high level concepts of how data is stored should be in user facing
> documentation. Storage format affects schema design. But low level
> specifics should be kept to contributor documentation.
>
> > On Jun 15, 2016, at 12:20 PM, Jonathan Ellis  wrote:
> >
> > I agree that it should be documented but I don't think it should be in
> user
> > level docs. Let's keep it in the wiki for contributors.
> >> On Jun 15, 2016 7:04 PM, "Jonathan Haddad"  wrote:
> >>
> >> Definitely required reading for anyone getting into it, plus Aaron's
> post.
> >> I think ideally one day something like this should live in the docs:
> >>
> >> https://www.postgresql.org/docs/9.0/static/storage-page-layout.html
> >>
> >> Definitely not suggesting everyone drop what they're doing and document
> the
> >> 3.0 format but ultimately I think there should be specs for any wire
> >> protocol or on disk format.  For wire protocols network diagrams can
> easily
> >> be generated via the nwdiag plugin.
> >>
> >> On Wed, Jun 15, 2016 at 9:55 AM Michael Kjellman <
> >> mkjell...@internalcircle.com> wrote:
> >>
> >>> This was forwarded to me yesterday... a helpful first step
> >>> https://github.com/apache/cassandra/blob/cassandra-3.0.0/guide_8099.md
> >>>
> >>>> On Jun 15, 2016, at 9:54 AM, Jonathan Haddad wrote:
> >>>>
> >>>> Maybe some brave soul will document the 3.0 on disk format as part of
> >>>> https://issues.apache.org/jira/browse/CASSANDRA-8700.
> >>>>
> >>>> On Wed, Jun 15, 2016 at 7:02 AM Christopher Bradford <bradfor...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Consider taking a look at Aaron Morton's dive into the C* 3.0 storage
> >>>>> engine.
> >>>>>
> >>>>> http://thelastpickle.com/blog/2016/03/04/introductiont-to-the-apache-cassandra-3-storage-engine.html
> >>>>>
> >>>>> On Wed, Jun 15, 2016 at 9:38 AM Jim Witschey <jim.witsc...@datastax.com>
> >>>>> wrote:
> >>>>>
> >>>>>>> http://wiki.apache.org/cassandra/ArchitectureSSTable
> >>>>>>
> >>>>>> Be aware that this page hasn't been updated since 2013, so it doesn't
> >>>>>> reflect any changes to the SSTable format since then, including the
> >>>>>> new storage engine introduced in 3.0 (see CASSANDRA-8099).
> >>>>>>
> >>>>>> That said, I believe the linked Apache wiki page is the best
> >>>>>> documentation for the format. Unfortunately, if you want a better or
> >>>>>> more current understanding, you'll have to read the code and read
> >>>>>> some SSTables.
> >>>>>>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: NewBie Question

2016-06-15 Thread Jonathan Haddad
Since the specs change with the code, I'd argue everything belongs in the git
repo, including deep technical specs.
On Wed, Jun 15, 2016 at 10:26 AM J. D. Jordan 
wrote:

> I think high level concepts of how data is stored should be in user facing
> documentation. Storage format affects schema design. But low level
> specifics should be kept to contributor documentation.
>
> > On Jun 15, 2016, at 12:20 PM, Jonathan Ellis  wrote:
> >
> > I agree that it should be documented but I don't think it should be in
> user
> > level docs. Let's keep it in the wiki for contributors.
> >> On Jun 15, 2016 7:04 PM, "Jonathan Haddad"  wrote:
> >>
> >> Definitely required reading for anyone getting into it, plus Aaron's
> post.
> >> I think ideally one day something like this should live in the docs:
> >>
> >> https://www.postgresql.org/docs/9.0/static/storage-page-layout.html
> >>
> >> Definitely not suggesting everyone drop what they're doing and document
> the
> >> 3.0 format but ultimately I think there should be specs for any wire
> >> protocol or on disk format.  For wire protocols network diagrams can
> easily
> >> be generated via the nwdiag plugin.
> >>
> >> On Wed, Jun 15, 2016 at 9:55 AM Michael Kjellman <
> >> mkjell...@internalcircle.com> wrote:
> >>
> >>> This was forwarded to me yesterday... a helpful first step
> >>> https://github.com/apache/cassandra/blob/cassandra-3.0.0/guide_8099.md
> >>>
> >>>> On Jun 15, 2016, at 9:54 AM, Jonathan Haddad wrote:
> >>>>
> >>>> Maybe some brave soul will document the 3.0 on disk format as part of
> >>>> https://issues.apache.org/jira/browse/CASSANDRA-8700.
> >>>>
> >>>> On Wed, Jun 15, 2016 at 7:02 AM Christopher Bradford <bradfor...@gmail.com>
> >>>> wrote:
> >>>>
> >>>>> Consider taking a look at Aaron Morton's dive into the C* 3.0 storage
> >>>>> engine.
> >>>>>
> >>>>> http://thelastpickle.com/blog/2016/03/04/introductiont-to-the-apache-cassandra-3-storage-engine.html
> >>>>>
> >>>>> On Wed, Jun 15, 2016 at 9:38 AM Jim Witschey <jim.witsc...@datastax.com>
> >>>>> wrote:
> >>>>>
> >>>>>>> http://wiki.apache.org/cassandra/ArchitectureSSTable
> >>>>>>
> >>>>>> Be aware that this page hasn't been updated since 2013, so it doesn't
> >>>>>> reflect any changes to the SSTable format since then, including the
> >>>>>> new storage engine introduced in 3.0 (see CASSANDRA-8099).
> >>>>>>
> >>>>>> That said, I believe the linked Apache wiki page is the best
> >>>>>> documentation for the format. Unfortunately, if you want a better or
> >>>>>> more current understanding, you'll have to read the code and read
> >>>>>> some SSTables.
> >>>>>>
>


Re: NewBie Question

2016-06-15 Thread Jonathan Ellis
I agree that it should be documented but I don't think it should be in user
level docs. Let's keep it in the wiki for contributors.
On Jun 15, 2016 7:04 PM, "Jonathan Haddad"  wrote:

> Definitely required reading for anyone getting into it, plus Aaron's post.
> I think ideally one day something like this should live in the docs:
>
> https://www.postgresql.org/docs/9.0/static/storage-page-layout.html
>
> Definitely not suggesting everyone drop what they're doing and document the
> 3.0 format but ultimately I think there should be specs for any wire
> protocol or on disk format.  For wire protocols network diagrams can easily
> be generated via the nwdiag plugin.
>
> On Wed, Jun 15, 2016 at 9:55 AM Michael Kjellman <
> mkjell...@internalcircle.com> wrote:
>
> > This was forwarded to me yesterday... a helpful first step
> > https://github.com/apache/cassandra/blob/cassandra-3.0.0/guide_8099.md
> >
> > > On Jun 15, 2016, at 9:54 AM, Jonathan Haddad 
> wrote:
> > >
> > > Maybe some brave soul will document the 3.0 on disk format as part of
> > > https://issues.apache.org/jira/browse/CASSANDRA-8700.
> > >
> > > On Wed, Jun 15, 2016 at 7:02 AM Christopher Bradford <
> > bradfor...@gmail.com>
> > > wrote:
> > >
> > >> Consider taking a look at Aaron Morton's dive into the C* 3.0 storage
> > >> engine.
> > >>
> > >>
> > >>
> >
> http://thelastpickle.com/blog/2016/03/04/introductiont-to-the-apache-cassandra-3-storage-engine.html
> > >>
> > >> On Wed, Jun 15, 2016 at 9:38 AM Jim Witschey <
> jim.witsc...@datastax.com
> > >
> > >> wrote:
> > >>
> > >>>> http://wiki.apache.org/cassandra/ArchitectureSSTable
> > >>>
> > >>> Be aware that this page hasn't been updated since 2013, so it doesn't
> > >>> reflect any changes to the SSTable format since then, including the
> > >>> new storage engine introduced in 3.0 (see CASSANDRA-8099).
> > >>>
> > >>> That said, I believe the linked Apache wiki page is the best
> > >>> documentation for the format. Unfortunately, if you want a better or
> > >>> more current understanding, you'll have to read the code and read
> some
> > >>> SSTables.
> > >>>
> > >>
> >
> >
>


Re: NewBie Question

2016-06-15 Thread Michael Kjellman
This was forwarded to me yesterday... a helpful first step 
https://github.com/apache/cassandra/blob/cassandra-3.0.0/guide_8099.md

> On Jun 15, 2016, at 9:54 AM, Jonathan Haddad  wrote:
> 
> Maybe some brave soul will document the 3.0 on disk format as part of
> https://issues.apache.org/jira/browse/CASSANDRA-8700.
> 
> On Wed, Jun 15, 2016 at 7:02 AM Christopher Bradford 
> wrote:
> 
>> Consider taking a look at Aaron Morton's dive into the C* 3.0 storage
>> engine.
>> 
>> 
>> http://thelastpickle.com/blog/2016/03/04/introductiont-to-the-apache-cassandra-3-storage-engine.html
>> 
>> On Wed, Jun 15, 2016 at 9:38 AM Jim Witschey 
>> wrote:
>> 
>>>> http://wiki.apache.org/cassandra/ArchitectureSSTable
>>> 
>>> Be aware that this page hasn't been updated since 2013, so it doesn't
>>> reflect any changes to the SSTable format since then, including the
>>> new storage engine introduced in 3.0 (see CASSANDRA-8099).
>>> 
>>> That said, I believe the linked Apache wiki page is the best
>>> documentation for the format. Unfortunately, if you want a better or
>>> more current understanding, you'll have to read the code and read some
>>> SSTables.
>>> 
>> 



Re: NewBie Question

2016-06-15 Thread Jonathan Haddad
Maybe some brave soul will document the 3.0 on disk format as part of
https://issues.apache.org/jira/browse/CASSANDRA-8700.

On Wed, Jun 15, 2016 at 7:02 AM Christopher Bradford 
wrote:

> Consider taking a look at Aaron Morton's dive into the C* 3.0 storage
> engine.
>
>
> http://thelastpickle.com/blog/2016/03/04/introductiont-to-the-apache-cassandra-3-storage-engine.html
>
> On Wed, Jun 15, 2016 at 9:38 AM Jim Witschey 
> wrote:
>
> > > http://wiki.apache.org/cassandra/ArchitectureSSTable
> >
> > Be aware that this page hasn't been updated since 2013, so it doesn't
> > reflect any changes to the SSTable format since then, including the
> > new storage engine introduced in 3.0 (see CASSANDRA-8099).
> >
> > That said, I believe the linked Apache wiki page is the best
> > documentation for the format. Unfortunately, if you want a better or
> > more current understanding, you'll have to read the code and read some
> > SSTables.
> >
>


Re: Documentation on a new CQL feature of 3.6

2016-06-15 Thread Oleksandr Petrov
As far as I understand this wording, it's correct. Prior to 3.6, *filtering*
was not allowed on clustering columns. Non-filtering queries involving
clustering columns were allowed, although you could not restrict an arbitrary
clustering column (or combine multiple range queries). Now it is allowed.

So the wording "clustering columns can be defined in WHERE clauses* if
ALLOW FILTERING is also used even if a secondary index is not created*" is
correct (emphasis to indicate the context you specified). It might be that
the secondary index part made it sound as if the behaviour had been somewhat
similar before.

For the background, you can check
https://issues.apache.org/jira/browse/CASSANDRA-11310

That said, I'll add a little guide to the querying capabilities of the
current version in the scope of 8700.

Thank you

On Wed, Jun 15, 2016 at 5:03 PM Giampaolo Trapasso <
giampaolo.trapa...@radicalbit.io> wrote:

> Hi to all,
>
> DS Documentation says that
> *In Cassandra 3.6 and later, clustering columns can be defined in WHERE
> clauses if ALLOW FILTERING is also used even if a secondary index is not
> created. The table definition is given and then the SELECT command. Note
> that race_start_date is a clustering column that has no secondary index.*
>
> This seemed strange to me, since since in past I did queries using
> clustering columns.
>
> I did this quick check:
>
> cqlsh:test> show version
> [cqlsh 5.0.1 | Cassandra 2.2.5-SNAPSHOT | CQL spec 3.3.1 | Native protocol
> v4]
> cqlsh:test> SELECT * FROM calendar WHERE race_start_date='2015-06-13' ALLOW
> FILTERING;
>
>  race_id | race_start_date | race_end_date | race_name
> -+-+---+---
>
> (0 rows)
> cqlsh:test> SELECT * FROM calendar WHERE race_end_date='2015-06-13' ALLOW
> FILTERING;
> InvalidRequest: code=2200 [Invalid query] message="PRIMARY KEY column
> "race_end_date" cannot be restricted as preceding column "race_start_date"
> is not restricted"
> cqlsh:test>
>
> As you can see, in < 3.6 you can put clustering columns in queries as long
> as you respect the "preceding column" constraint. IMHO, that line should be
> changed saying that an arbitrary clustering column can be used from 3.6,
> and the example should use the 'race_end_date".
>
> This is DS documentation. Did not find something similar in community
> documentation (anycase 8700 is a work in progress, I will check in future).
> Let me know if I'm missing some point.
>
> Giampaolo
>
-- 
Alex Petrov


Documentation on a new CQL feature of 3.6

2016-06-15 Thread Giampaolo Trapasso
Hi to all,

DS Documentation says that
*In Cassandra 3.6 and later, clustering columns can be defined in WHERE
clauses if ALLOW FILTERING is also used even if a secondary index is not
created. The table definition is given and then the SELECT command. Note
that race_start_date is a clustering column that has no secondary index.*

This seemed strange to me, since in the past I ran queries using
clustering columns.

I did this quick check:

cqlsh:test> show version
[cqlsh 5.0.1 | Cassandra 2.2.5-SNAPSHOT | CQL spec 3.3.1 | Native protocol v4]
cqlsh:test> SELECT * FROM calendar WHERE race_start_date='2015-06-13' ALLOW FILTERING;

 race_id | race_start_date | race_end_date | race_name
---------+-----------------+---------------+-----------

(0 rows)
cqlsh:test> SELECT * FROM calendar WHERE race_end_date='2015-06-13' ALLOW FILTERING;
InvalidRequest: code=2200 [Invalid query] message="PRIMARY KEY column "race_end_date" cannot be restricted as preceding column "race_start_date" is not restricted"
cqlsh:test>

As you can see, in versions before 3.6 you can put clustering columns in
queries as long as you respect the "preceding column" constraint. IMHO, that
line should be changed to say that an arbitrary clustering column can be used
from 3.6, and the example should use 'race_end_date'.
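
The "preceding column" constraint seen in the check above can be sketched as
a small function. This is only an illustration of the rule as the error
message states it, not how Cassandra implements the check: it models equality
restrictions on clustering columns only, ignores the partition key and range
restrictions, and the function name and the column order (inferred from the
error message) are assumptions.

```python
from typing import Iterable, Optional, Sequence


def first_violation(clustering_columns: Sequence[str],
                    restricted: Iterable[str]) -> Optional[str]:
    """Return an error string if a restricted clustering column has an
    unrestricted column before it in the clustering order, else None."""
    restricted = set(restricted)
    for i, column in enumerate(clustering_columns):
        if column not in restricted:
            continue
        for preceding in clustering_columns[:i]:
            if preceding not in restricted:
                return (f'PRIMARY KEY column "{column}" cannot be restricted as '
                        f'preceding column "{preceding}" is not restricted')
    return None


# Clustering order inferred from the error message in the cqlsh session.
CLUSTERING = ["race_start_date", "race_end_date"]

print(first_violation(CLUSTERING, ["race_start_date"]))
print(first_violation(CLUSTERING, ["race_end_date"]))
```

The first call models the query that was accepted on 2.2.5, the second the
one that was rejected; from 3.6 with ALLOW FILTERING the second restriction
is the case the documentation line is really about.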

This is DS documentation. I did not find anything similar in the community
documentation (in any case 8700 is a work in progress; I will check in the
future). Let me know if I'm missing something.

Giampaolo


Re: NewBie Question

2016-06-15 Thread Christopher Bradford
Consider taking a look at Aaron Morton's dive into the C* 3.0 storage
engine.

http://thelastpickle.com/blog/2016/03/04/introductiont-to-the-apache-cassandra-3-storage-engine.html

On Wed, Jun 15, 2016 at 9:38 AM Jim Witschey 
wrote:

> > http://wiki.apache.org/cassandra/ArchitectureSSTable
>
> Be aware that this page hasn't been updated since 2013, so it doesn't
> reflect any changes to the SSTable format since then, including the
> new storage engine introduced in 3.0 (see CASSANDRA-8099).
>
> That said, I believe the linked Apache wiki page is the best
> documentation for the format. Unfortunately, if you want a better or
> more current understanding, you'll have to read the code and read some
> SSTables.
>


SSTable index format

2016-06-15 Thread Antonis Papaioannou

Hi,

I'm interested in the SSTable index file format and particularly in 
Cassandra 2.2 which uses the SSTable version "ma".
Apart from keys and their corresponding offsets in the data file what 
else is included in each index entry?


I'm trying to trace the code path when an SSTable is flushed (especially in
class BigTableWriter.java).
I see that each RowIndexEntry may contain a ColumnIndex, which in turn has
a list of IndexHelper.IndexInfo entries.

So I would expect the index format to be something like this:


On the other hand it seems that the ColumnIndex does not contain all the 
columns of the data row.


Let me give you an example.
Assume the following schema of a column family
mytable ( y_id varchar primary key, field0 varchar, field1 varchar, 
field2 varchar);


In this case if i execute the queries below:
INSERT INTO ycsb.usertable (y_id, field0, field1, field2) VALUES ('k1', 
'f1a', 'f1b', 'f1c');

INSERT INTO ycsb.usertable (y_id, field0) VALUES ('k2', 'f2a');

and then flush the table, I would expect the index to have the following 
info:

k1, [field0, field1, field2], 
k2, [field0], 

Is this correct?
Is there a documentation page with the file format of the index file?



Re: Better code review

2016-06-15 Thread Josh McKenzie
We had a pretty long conversation about this very topic on the dev list
a while ago (search for "Discussion: reviewing larger tickets" on the
mailing list). I think the final conclusion was that having the
back-and-forth via JIRA helped codify some of the design decisions that
took place during implementation and review that could be lost using an
external tool.

So while it's extra overhead and very raw from a tooling perspective, the
pros outweighed the cons.

On Wed, Jun 15, 2016 at 4:47 AM, Mahdi Mohammadi  wrote:

> Hi,
>
> Today I noticed there is a https://reviews.apache.org/r/# website which
> can
> be used for code review.
>
> Why not use it or even better use GitHub PR code review facilities?
>
>
> Best Regards
>


Re: NewBie Question

2016-06-15 Thread Jim Witschey
> http://wiki.apache.org/cassandra/ArchitectureSSTable

Be aware that this page hasn't been updated since 2013, so it doesn't
reflect any changes to the SSTable format since then, including the
new storage engine introduced in 3.0 (see CASSANDRA-8099).

That said, I believe the linked Apache wiki page is the best
documentation for the format. Unfortunately, if you want a better or
more current understanding, you'll have to read the code and read some
SSTables.


Re: Possible Bug: bucket_low has no effect in STCS

2016-06-15 Thread Aleksey Yeschenko
When in doubt, just open a JIRA. Thanks.

-- 
AY

On 15 June 2016 at 13:56:24, Anuj Wadehra (anujw_2...@yahoo.co.in.invalid) 
wrote:

Should I raise a JIRA? Or could some developer with knowledge of STCS confirm
the bug?

Anuj  



Sent from Yahoo Mail on Android  

On Tue, 14 Jun, 2016 at 12:52 PM, Anuj Wadehra wrote: 
Can any developer confirm the issue?  

ThanksAnuj  


Sent from Yahoo Mail on Android  

On Mon, 13 Jun, 2016 at 11:15 PM, Anuj Wadehra wrote: 
Hi,  

I am trying to understand the algorithm of STCS. As per my current
understanding of the code, setting bucket_low seems to have no impact on
the STCS compaction algorithm. Moreover, I see a possible optimization. I would
appreciate it if some designer could correct me or confirm that it's a bug so
that I can raise a JIRA.


Details  
--  
getBuckets() method of SizeTieredCompactionStrategy sorts sstables by size in 
ascending order and then iterates over them one by one to associate them to an 
existing/new bucket. When, iterating sstables in ascending order of size, I 
can't find ANY single scenario where the current sstable in the outer loop 
iteration is below the oldAverageSize of any existing bucket. Current sstable 
being iterated will ALWAYS be greater than/equal to the oldAverageSize of ALL 
existing buckets as ALL previous sstables in existing buckets were 
smaller/equal in size to the sstable being iterated.  

So, there is NO scenario when size > (oldAverageSize * bucketLow) and size < 
oldAverageSize, so bucket_low property never comes into play no matter what 
value you set for it.  


Also, while iterating over sstables (sortedFiles) by size in ascending order,
there is no point in iterating over all existing buckets. We could just start
from the LAST bucket the previous sstable was associated with. The
oldAverageSize of ALL other buckets will NEVER admit the sstable being iterated.

for (Entry entry : buckets.entrySet())  
            {...}  



Thanks  
Anuj  








Re: Possible Bug: bucket_low has no effect in STCS

2016-06-15 Thread Anuj Wadehra
Should I raise a JIRA? Or could some developer with knowledge of STCS confirm
the bug?

Anuj



Sent from Yahoo Mail on Android 
 
  On Tue, 14 Jun, 2016 at 12:52 PM, Anuj Wadehra wrote: 
  Can any developer confirm the issue?

Thanks
Anuj


Sent from Yahoo Mail on Android 
 
  On Mon, 13 Jun, 2016 at 11:15 PM, Anuj Wadehra wrote: 
  Hi,

I am trying to understand the algorithm of STCS. As per my current
understanding of the code, setting bucket_low seems to have no impact on
the STCS compaction algorithm. Moreover, I see a possible optimization. I would
appreciate it if some designer could correct me or confirm that it's a bug so
that I can raise a JIRA.


Details
--
getBuckets() method of SizeTieredCompactionStrategy sorts sstables by size in 
ascending order and then iterates over them one by one to associate them to an 
existing/new bucket. When iterating sstables in ascending order of size, I 
can't find ANY single scenario where the current sstable in the outer loop 
iteration is below the oldAverageSize of any existing bucket. Current sstable 
being iterated will ALWAYS be greater than/equal to the oldAverageSize of ALL 
existing buckets as ALL previous sstables in existing buckets were 
smaller/equal in size to the sstable being iterated.

So, there is NO scenario when size > (oldAverageSize * bucketLow) and size < 
oldAverageSize, so bucket_low property never comes into play no matter what 
value you set for it.


Also, while iterating over sstables (sortedFiles) by size in ascending order,
there is no point in iterating over all existing buckets. We could just start
from the LAST bucket the previous sstable was associated with. The
oldAverageSize of ALL other buckets will NEVER admit the sstable being iterated.

 for (Entry entry : buckets.entrySet())
            {...}
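
The argument above can be checked with a small simulation. The sketch below
is a simplified Python restatement of the bucketing loop as described in this
message, not the Cassandra source: it keeps buckets in a list rather than a
map keyed by average size, recomputes the average from the bucket members,
and uses made-up sizes and option values. The name get_buckets follows the
method mentioned above; everything else is hypothetical.

```python
from typing import List


def get_buckets(sizes: List[int], bucket_high: float, bucket_low: float,
                min_sstable_size: int) -> List[List[int]]:
    """Group sstable sizes into buckets of similar size, smallest first."""
    buckets = []  # each bucket is [average_size, [member sizes]]
    for size in sorted(sizes):
        for bucket in buckets:
            old_avg, members = bucket
            similar = old_avg * bucket_low < size < old_avg * bucket_high
            both_small = size < min_sstable_size and old_avg < min_sstable_size
            if similar or both_small:
                members.append(size)
                bucket[0] = sum(members) // len(members)
                break
        else:
            # No similar bucket found; start a new one.
            buckets.append([size, [size]])
    return [members for _, members in buckets]


sizes = [10, 12, 100, 110, 400, 1000, 1100]
for low in (0.0, 0.25, 0.5, 0.99):
    # Same grouping for every bucket_low below 1.0 in this simulation.
    print(low, get_buckets(sizes, 1.5, low, 50))
```

Because sizes arrive in ascending order, each new size is at least as large
as every existing bucket average, so in this sketch the lower bound only
starts to matter once bucket_low reaches 1.0 or more (for example, two files
of identical size then no longer share a bucket).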



Thanks
Anuj


 

  
  


Re: NewBie Question

2016-06-15 Thread Eric Stevens
The file format is SSTable:
http://wiki.apache.org/cassandra/ArchitectureSSTable

If you're getting into byte-level detail, I highly recommend you
familiarize yourself with the read and/or write path first, because that
deep in the bowels there are some non-obvious things going on where
Cassandra differs from other database storage formats you might already be
familiar with:
https://wiki.apache.org/cassandra/ReadPathForUsers
https://wiki.apache.org/cassandra/WritePathForUsers

On Wed, Jun 15, 2016 at 4:30 AM Jonathan Ellis  wrote:

> It's a little more involved than that.  I suggest inserting a single row in
> a test table, then looking at the sstabledump output as a first step, then
> compare with two rows in a single partition.  Then you can code dive to see
> what sstabledump is actually doing if you really need the byte-level
> detail.
>
> On Wed, Jun 15, 2016 at 9:30 AM, Deepak Goel  wrote:
>
> > Hey
> >
> > Namaskara~Nalama~Guten Tag~Bonjour
> >
> > I tried searching for the fileformat of how cassandra stores its data,
> but
> > I couldn't find any...
> >
> > Suppose I have a database structure of the following format:
> >
> > RowID: Name:Age
> > 1: Deepak : 33
> > 2: Deepak1:34
> > 3: Deepak2:35
> >
> > How would this data actually stored in the data file of Cassandra?
> >
> > Would it be something like this:
> > Deepak:1:33
> > Deepak1:2:34
> > Deepak2:3:35
> >
> > Or, would it be:
> > 1:Deepak :33
> > 2:Deepak1:34
> > 3:Deepak2:35
> >
> > Thanks
> > Deepak
> >
> >
> >
> >--
> > Keigu
> >
> > Deepak
> > 73500 12833
> > www.simtree.net, dee...@simtree.net
> > deic...@gmail.com
> >
> > LinkedIn: www.linkedin.com/in/deicool
> > Skype: thumsupdeicool
> > Google talk: deicool
> > Blog: http://loveandfearless.wordpress.com
> > Facebook: http://www.facebook.com/deicool
> >
> > "Contribute to the world, environment and more :
> > http://www.gridrepublic.org
> > "
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder, http://www.datastax.com
> @spyced
>


Re: NewBie Question

2016-06-15 Thread Jonathan Ellis
It's a little more involved than that.  I suggest inserting a single row in
a test table, then looking at the sstabledump output as a first step, then
compare with two rows in a single partition.  Then you can code dive to see
what sstabledump is actually doing if you really need the byte-level detail.

On Wed, Jun 15, 2016 at 9:30 AM, Deepak Goel  wrote:

> Hey
>
> Namaskara~Nalama~Guten Tag~Bonjour
>
> I tried searching for the fileformat of how cassandra stores its data, but
> I couldn't find any...
>
> Suppose I have a database structure of the following format:
>
> RowID: Name:Age
> 1: Deepak : 33
> 2: Deepak1:34
> 3: Deepak2:35
>
> How would this data actually stored in the data file of Cassandra?
>
> Would it be something like this:
> Deepak:1:33
> Deepak1:2:34
> Deepak2:3:35
>
> Or, would it be:
> 1:Deepak :33
> 2:Deepak1:34
> 3:Deepak2:35
>
> Thanks
> Deepak
>
>
>
>--
> Keigu
>
> Deepak
> 73500 12833
> www.simtree.net, dee...@simtree.net
> deic...@gmail.com
>
> LinkedIn: www.linkedin.com/in/deicool
> Skype: thumsupdeicool
> Google talk: deicool
> Blog: http://loveandfearless.wordpress.com
> Facebook: http://www.facebook.com/deicool
>
> "Contribute to the world, environment and more :
> http://www.gridrepublic.org
> "
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Reason for Trace Message Drop

2016-06-15 Thread Varun Barala
Hi all,

Can anyone tell me all the possible reasons for the log message below?


*"INFO  [ScheduledTasks:1] 2016-06-14 06:27:39,498
MessagingService.java:929 - _TRACE messages were dropped in last 5000 ms:
928 for internal timeout and 0 for cross node timeout".*
I searched online for this and found some possible reasons:

* Disk is not able to keep up with your ingest
* Resources are not able to support all parallel running tasks
* If other nodes are down then due to large hint replay
* Heavy workload

But in that case other kinds of messages (mutation, read, write, etc.) should
also be dropped by *C**, and that doesn't happen.

-
Cluster Specifications
--
number of nodes = 1
total number of CF = 2000

-
Machine Specifications
--
RAM 30 GB
hard disk SSD
ubuntu 14.04


Thanks in advance!!

Regards,
Varun Barala


Better code review

2016-06-15 Thread Mahdi Mohammadi
Hi,

Today I noticed there is a https://reviews.apache.org/r/# website which can
be used for code review.

Why not use it or even better use GitHub PR code review facilities?


Best Regards


NewBie Question

2016-06-15 Thread Deepak Goel
Hey

Namaskara~Nalama~Guten Tag~Bonjour

I tried searching for the fileformat of how cassandra stores its data, but
I couldn't find any...

Suppose I have a database structure of the following format:

RowID: Name:Age
1: Deepak : 33
2: Deepak1:34
3: Deepak2:35

How would this data actually be stored in the data file of Cassandra?

Would it be something like this:
Deepak:1:33
Deepak1:2:34
Deepak2:3:35

Or, would it be:
1:Deepak :33
2:Deepak1:34
3:Deepak2:35

Thanks
Deepak



   --
Keigu

Deepak
73500 12833
www.simtree.net, dee...@simtree.net
deic...@gmail.com

LinkedIn: www.linkedin.com/in/deicool
Skype: thumsupdeicool
Google talk: deicool
Blog: http://loveandfearless.wordpress.com
Facebook: http://www.facebook.com/deicool

"Contribute to the world, environment and more : http://www.gridrepublic.org
"