Proposal - Feature to have versioning in Cassandra

2016-12-10 Thread Bhuvan Rawal
Hi Devs!

As this is related to a future improvement in Cassandra, I thought this is
the appropriate mailing list.

I was reading a bit about HBase; it provides a facility to version rows,
and the number of versions to retain, etc., can be specified in the schema.

Although this can be achieved in Cassandra as well, it requires a bit more
involvement from the client. Here is how it can be done currently: make the
last clustering column in the schema a timestamp. Each time a row is
written, write it with a new timestamp; during reads, make sure the row
with the latest timestamp is read. If the number of rows returned exceeds
the version limit, issue a delete call (this could be done while reading,
or as a read-before-write during writes).
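As a rough illustration, the client-side pattern above can be sketched in
Python against an in-memory stand-in for the table. The schema shape,
helper names, and version limit here are assumptions for illustration, not
an actual driver API:

```python
import time
from collections import defaultdict

MAX_VERSIONS = 3  # under the proposal, this would live in the schema

# partition key -> list of (write_ts, row) tuples, newest last; stands in
# for a table whose last clustering column is a timestamp
table = defaultdict(list)

def versioned_write(pk, row):
    """Write a new version, then prune anything beyond MAX_VERSIONS."""
    table[pk].append((time.time(), row))
    # read-before-write: if we now hold too many versions, delete the oldest
    while len(table[pk]) > MAX_VERSIONS:
        table[pk].pop(0)  # would be a DELETE on the oldest timestamp in CQL

def read_latest(pk):
    """Return only the row bearing the latest timestamp."""
    versions = table[pk]
    return versions[-1][1] if versions else None

for v in range(5):
    versioned_write(2429, {"count": v})

print(read_latest(2429))  # the newest version survives
print(len(table[2429]))   # never more than MAX_VERSIONS
```

The pruning here happens on the write path; it could equally be deferred to
the read path, as the original text notes.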

I believe this feature can be brought natively to Cassandra. This is what I
propose:
1. While creating the schema, it can be specified that versioning is to be
turned on. If that is the case, it should be validated that the last
clustering column is a timestamp.
2. Whenever a write is performed, we read the existing partition, merge the
previous row into the current row, and insert with the current timestamp in
the clustering column.
3. If the number of rows in the partition exceeds the version count
specified in the schema, issue a delete for the oldest version.

All of this can be done locally.

There is another possibility which looks promising: we can have a
materialized view which maintains versioning. This has the benefit of
making versioning possible for existing rows, without having to play around
with the base table and possibly corrupt it. Also, this would be a local MV
(the partition resides locally), so the performance implications should be
smaller.

The steps for this, from a user's perspective, could be:
1. Create an MV with versioning on. (Internally, that will mean a timestamp
clustering column is created in the schema.)
2. While writing into the MV, along with the insert, a read can be
performed, and if the version count is higher than the maximum, a delete
can be issued.
3. During reads, the user can specify the number of versions to be
returned; Cassandra, after reading the complete partition, can filter out
the older versions.
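Step 3 above could be sketched as a simple trim over the fully-read
partition. This is an illustrative sketch only; the function name and the
(timestamp, row) representation are assumptions, not Cassandra internals:

```python
def latest_versions(partition_rows, n):
    """Given all rows of one partition as (timestamp, row) pairs,
    return only the n newest versions, newest first."""
    return [row for _, row in sorted(partition_rows, reverse=True)[:n]]

rows = [(100, "v1"), (300, "v3"), (200, "v2")]
print(latest_versions(rows, 2))  # ['v3', 'v2']
```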

I would like to have this validated before creating a Jira.

Regards,
Bhuvan


Cassandra Read Path Code Navigation

2016-06-13 Thread Bhuvan Rawal
Hi All,

I'm debugging an issue in Cassandra 3.5 that was reported on the user
mailing list earlier; it is pretty critical to solve at our end. I'll give
a brief intro. On issuing this query:

select id,filter_name from navigation_bucket_filter where id=2429 and
filter_name='Size_s';

 id   | filter_name
------+----------------------
 2429 | AdditionalProperty_s
 2429 | Brand
  --- more rows ---
 2429 | Size_s
  --- more rows ---
 2429 | sdFullfilled
 2429 | sellerCode

(16 rows)

Whereas *only one result was expected* (the row bearing filter_name
'Size_s'), we got that row along with 15 other unexpected rows.

The total number of rows in the partition is 20 (verified using select
id,filter_name from navigation_bucket_filter where id=2429; as well as a
JSON dump). We are wondering why Cassandra did not filter the results
completely. I have checked that the data is intact by taking a JSON dump
and validating it with the sstabledump tool.

The issue was resolved in production by running nodetool compact, but
debugging it is critical so we understand what led to this; issuing a
manual compaction may not be possible every time.

I copied the SSTables of the particular table onto my local machine and *have
been able to reproduce the same* issue. While trying to run Cassandra in
debug mode, I have been able to connect my IDE to it, but unfortunately I
have not been able to navigate very far in the read path. I would be glad
to get some pointers on where in the code SSTables are read and the
partition is filtered.

Secondly, I wanted to know if there is a way to read the other SSTable
component files (the partition index, Filter.db, Statistics.db, et al.) as
well as the commit log. If such a utility does not exist currently but can
be created from existing classes, please let me know; I would love to build
and share one.


Best Regards,
Bhuvan Rawal


Re: NewBie Question ~ Book for Cassandra

2016-06-13 Thread Bhuvan Rawal
Hi Matt,

I suggested the resources keeping in mind the ease with which one can
learn. My idea was not to disrespect Apache or the community in any form;
it was just to facilitate a newbie's learning.
Having a good wiki would be amazing, and I believe we all agree on this
thread that the current documentation has a lot of scope for improvement.
I'm completely willing to contribute to the docs in whatever way possible
and to help get them reviewed.

Best Regards,
Bhuvan

On Mon, Jun 13, 2016 at 8:17 PM, Eric Evans 
wrote:

> On Mon, Jun 13, 2016 at 8:05 AM, Mattmann, Chris A (3980)
>  wrote:
> > However also see that besides the current documentation, there needs to
> be
> > a roadmap for making Apache Cassandra and *its* documentation (not
> *DataStax’s*)
> > up to par for a basic user to build, deploy and run Cassandra. I don’t
> think that’s
> > the current case, is it?
>
> There is CASSANDRA-8700
> (https://issues.apache.org/jira/browse/CASSANDRA-8700), which is a
> step in this direction I hope.
>
> One concern I do have though is that changing the tech used to
> author/publish documentation won't in itself be enough to get good
> docs.  In fact, moving the docs in-tree raises the barrier to
> contribution in the sense that instead of mashing 'Edit', you have to
> put together a patch and have it reviewed.
>
> That said, I also think that we've historically set the bar way too
> high to committer/PMC, and that this may be an opportunity to change
> that; There ought to be a path to the PMC for documentation authors
> and translators (and this is typical in other projects).  So, I will
> personally do my best to set aside some time each week to review and
> merge documentation changes, and to champion regular doc contributors
> for committership.  Hopefully there are others willing to do the same!
>
>
> --
> Eric Evans
> john.eric.ev...@gmail.com
>


Re: NewBie Question ~ Book for Cassandra

2016-06-11 Thread Bhuvan Rawal
Hi Deepak,

You can try the DataStax docs; they are the most extensive and up-to-date
documentation available.
As Cassandra is a fast-developing technology, I wonder if there is a book
on the market which covers the latest features like materialized views,
SASI indexes, or the new SSTable format. I believe the best starting point
would be the Academy tutorials, and further, the Planet Cassandra "A Week
in Cassandra" series provides a good overview of blogs and developments by
Cassandra evangelists. It also links to top blogs which help in
understanding the internal workings of the database.

However, if you still feel the need, you may refer to books; here are some
that I know of:
Beginning Apache Cassandra Development - Vivek Mishra - 2014 - Link

Cassandra Data Modeling and Analysis - 2014 C.Y. Kan - Link

Mastering Apache Cassandra - Second Edition - March 26 2015 - Link

Cassandra Design Patterns - 2015 - Link

Cassandra High Availability - 2014 - Link

Learning Apache Cassandra - Manage Fault Tolerant and Scalable Real-Time
Data - 2015 - Link


Best Regards,
Bhuvan
Datastax Certified Architect

On Sat, Jun 11, 2016 at 8:28 PM, Deepak Goel  wrote:

> Hey
>
> Namaskara~Nalama~Guten Tag~Bonjour
>
> I am a newbie.
>
> Which would be the best book for a newbie to learn Cassandra?
>
> Thank You
> Deepak
>--
> Keigu
>
> Deepak
> 73500 12833
> www.simtree.net, dee...@simtree.net
> deic...@gmail.com
>
> LinkedIn: www.linkedin.com/in/deicool
> Skype: thumsupdeicool
> Google talk: deicool
> Blog: http://loveandfearless.wordpress.com
> Facebook: http://www.facebook.com/deicool
>
> "Contribute to the world, environment and more :
> http://www.gridrepublic.org
> "
>


Re: Using writetime in CAS Lightweight transactions

2016-05-11 Thread Bhuvan Rawal
Hi Tyler/Doan,

If we have already reached the cell level and put the LWT condition on it,
I am sure that from a design standpoint it shouldn't be really difficult to
fetch the cell write time and verify it; maybe this could be considered in
a future release. This approach looks cleaner, and if performance doesn't
degrade, I am sure it can be used in multiple use cases (fetch a row, do
some processing, and save it only if it hasn't been updated; if it has been
updated, repeat). In other words: if the update time has not changed, then
perform the write.
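The fetch-process-save-repeat loop described here is essentially optimistic
concurrency control. A minimal sketch, with the conditional update
simulated in memory rather than issued as a real LWT (all names and the
version-token representation are illustrative assumptions):

```python
# in-memory stand-in for a row: a version token plus a value
row = {"token": 0, "value": 10}

def cas_update(expected_token, new_value):
    """Apply the write only if the version token is unchanged,
    mimicking a conditional `UPDATE ... IF <condition>` statement."""
    if row["token"] != expected_token:
        return False  # someone else updated the row; caller must retry
    row["token"] += 1
    row["value"] = new_value
    return True

def update_with_retry(transform, max_attempts=5):
    """Fetch, process, and save only if unchanged; otherwise repeat."""
    for _ in range(max_attempts):
        snapshot_token, snapshot_value = row["token"], row["value"]
        # ... do some processing on the snapshot here ...
        if cas_update(snapshot_token, transform(snapshot_value)):
            return True  # applied with no concurrent modification
    return False

update_with_retry(lambda v: v + 1)
print(row["value"])  # 11
```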

Best Regards,
Bhuvan

On Wed, May 11, 2016 at 9:32 PM, Tyler Hobbs  wrote:

> On Wed, May 11, 2016 at 10:22 AM, DuyHai Doan 
> wrote:
>
> > It is not (yet) possible to use functions in LWT predicates. LWT only
> > supports = and != plus IF (NOT) EXISTS right now
> >
>
> You're correct about functions not being supported, but we do actually
> support >, >=, <, <=, and IN operators (see
> https://issues.apache.org/jira/browse/CASSANDRA-6839).
>
> --
> Tyler Hobbs
> DataStax 
>


Using writetime in CAS Lightweight transactions

2016-05-11 Thread Bhuvan Rawal
Hi,

I was working on maintaining counters in Cassandra. I have read that they
are not 100% accurate; I have tested them to be extremely accurate with
3.0.5, but going by the docs there may be slight inaccuracy.

Thinking about this, I came up with two approaches to resolve it:
1. Read the existing count, and write the new count using an LWT only if
the table's count still equals the count fetched earlier.
2. Read the existing writetime(count), and write the new count using a
lightweight transaction only if writetime(count) still equals the writetime
fetched earlier.

For me, approach 1 works, but approach 2 does not. Please find both below:

superuser@cqlsh:test_keyspace> CREATE TABLE test_keyspace.test (
... part_k int,
... clust_k text,
... count int,
... PRIMARY KEY (part_k, clust_k)
... );
superuser@cqlsh:test_keyspace> insert into test(part_k , clust_k , count )
values (2390, 'Test Ck Value', 007);
superuser@cqlsh:test_keyspace> select * from test;

 part_k | clust_k       | count
--------+---------------+-------
   2390 | Test Ck Value |     7

superuser@cqlsh:test_keyspace> update test set count=8 where part_k=2390
and clust_k='Test Ck Value' if count=7;
# Works perfectly

superuser@cqlsh:test_keyspace> select writetime(count),part_k,clust_k,count
from test;

 writetime(count) | part_k | clust_k       | count
------------------+--------+---------------+-------
 1462974475292000 |   2390 | Test Ck Value |     8

# Now try to use the write time in LWT
superuser@cqlsh:test_keyspace> update test set count=8 where part_k=2390
and clust_k='Test Ck Value' if writetime(count)=1462974475292000;
SyntaxException: 
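A common workaround for the failing approach 2 is to maintain an explicit
version column and place the LWT condition on that instead of on
writetime(). A rough in-memory sketch of the idea; the column and helper
names are illustrative assumptions, not real schema:

```python
# emulate a row carrying its own version column instead of writetime()
test_row = {"part_k": 2390, "clust_k": "Test Ck Value", "count": 7,
            "version": 1}

def update_count_if_version(new_count, expected_version):
    """Mimics a conditional update of the form:
       UPDATE test SET count = ?, version = ?
       WHERE part_k = ? AND clust_k = ? IF version = ?"""
    if test_row["version"] != expected_version:
        return False  # [applied] = False: row changed since we read it
    test_row["count"] = new_count
    test_row["version"] += 1
    return True

print(update_count_if_version(8, expected_version=1))  # True: applies
print(update_count_if_version(9, expected_version=1))  # False: stale
print(test_row["count"])  # 8
```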

Best Regards,
Bhuvan Rawal


Short column names

2016-02-11 Thread Bhuvan Rawal
Hi,

We are modelling the schema for a database revamp from MySQL to Cassandra.
It has been recommended in several places that column names be kept as
short as possible to optimise disk storage.

I have a question here: why can't we map column names and store them as an
index, say in memory? I mean, make the stored column name really small (not
human-readable) and keep that on disk, but map it to the real column name
while querying. That way one could go ahead with readable column names.
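The mapping being asked about could, in principle, even be kept entirely
client-side. A sketch of the idea; the short names and example columns here
are illustrative assumptions:

```python
# readable column name -> short name actually stored on disk
COLUMN_MAP = {
    "customer_first_name": "c1",
    "customer_last_name": "c2",
    "order_total": "c3",
}
REVERSE_MAP = {short: full for full, short in COLUMN_MAP.items()}

def to_storage(row):
    """Translate a readable row to its short-named on-disk form."""
    return {COLUMN_MAP[k]: v for k, v in row.items()}

def from_storage(row):
    """Translate an on-disk row back to readable column names."""
    return {REVERSE_MAP[k]: v for k, v in row.items()}

stored = to_storage({"customer_first_name": "Bhuvan", "order_total": 42})
print(stored)                # {'c1': 'Bhuvan', 'c3': 42}
print(from_storage(stored))  # round-trips back to readable names
```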

Let me know if I can go ahead and create a Jira for the same.

Regards,
Bhuvan