Re: Increased Cassandra connection latency

2014-05-29 Thread Aaron Morton
You’ll need to provide some more information such as: 

* Do you have monitoring on the Cassandra cluster that shows the request
latency? DataStax OpsCenter is a good starting point.

* Is compaction keeping up? Check with nodetool compactionstats

* Is the GCInspector logging about long-running ParNew collections? (It only
logs pauses longer than 200ms.)

Cheers
Aaron

  
-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 23/05/2014, at 10:35 pm, Alexey Sverdelov alexey.sverde...@googlemail.com 
wrote:

 Hi all,
 
 I've noticed increased latency on our Tomcat REST service (average 30ms,
 max > 2sec). We are using Cassandra 1.2.16 with the official DataStax Java
 driver v1.0.3.
 
 Our setup:
 
 * 2 DCs
 * each DC: 7 nodes
 * RF=5
 * Leveled compaction
 
 After a Cassandra restart on all nodes, the latencies are alright again
 (average < 5ms, max 50ms).
 
 Any thoughts are greatly appreciated.
 
 Thanks,
 Alexey



Re: What % of cassandra developers are employed by Datastax?

2014-05-29 Thread Aaron Morton
 The Cassandra Summit Bootcamp, Sep 12-13, immediately following the Summit, 
 might be interesting for potential contributors.
I’ll be there to help people get started. Looking forward to it.

While DataStax is the biggest contributor in time and patches, there are
several other well-known people and companies contributing and committing.

IMHO the level of community activity and support over the last ~5 years has
been and will continue to be critical to the success of Cassandra, both Apache
and DSE. Which is a polite way of saying there is *always* something an
individual can do to contribute to the health of the project.

Cheers
Aaron 
-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 24/05/2014, at 7:28 am, Michael Shuler mich...@pbandjelly.org wrote:

 On 05/23/2014 01:23 PM, Peter Lin wrote:
 A separate but important consideration is long term health of a project.
 Many apache projects face this issue. When a project doesn't continually
 grow the contributors and committers, the project runs into issues in
 the long term. All open source projects see this, contributors and
 committers eventually leave, so it's important to continue to invite
 worthy contributors to become committers.
 
 The Cassandra Summit Bootcamp, Sep 12-13, immediately following the Summit, 
 might be interesting for potential contributors.
 
 -- 
 Michael



Re: Memory issue

2014-05-29 Thread Aaron Morton
   As soon as it starts, the JVM gets killed because of a memory issue.
What is the memory issue that kills the JVM?

The log message below is simply a warning:

 WARN [main] 2011-06-15 09:58:56,861 CLibrary.java (line 118) Unable to lock 
 JVM memory (ENOMEM).
 This can result in part of the JVM being swapped out, especially with mmapped 
 I/O enabled.
 Increase RLIMIT_MEMLOCK or run Cassandra as root.

Is there anything in the system logs ? 

Cheers
Aaron 
-
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 24/05/2014, at 9:17 am, Robert Coli rc...@eventbrite.com wrote:

 On Fri, May 23, 2014 at 2:08 PM, opensaf dev opensaf...@gmail.com wrote:
 I have a different service which controls the cassandra service for high 
 availability.
 
 IMO, starting or stopping a Cassandra node should never be a side effect of 
 another system's properties. YMMV.
 
 https://issues.apache.org/jira/browse/CASSANDRA-2356
 
 For some related comments.
 
 =Rob
 



Re: What are the advantages of static column family over a dynamic column family?

2014-05-29 Thread Jens Rantil
Hi user 01 (firstname and lastname?),

I'll give you one technical answer and one related to modelling:

Technical: Sure, you could really put all your data in a single row. The
problem is that it will simply not scale horizontally. More Cassandra nodes
will not make your cluster perform better and will not give you more resources
in terms of disk or RAM.

Modelling: A single row might only have static columns. To have a
self-documenting schema you might want to define your columns for that
particular reason.

Cheers,
Jens

On Wed, May 28, 2014 at 2:34 PM, user 01 user...@gmail.com wrote:

 What are the advantages of a static column family over a dynamic column
 family? Otherwise, why shouldn't I just make all my column families
 dynamic for ease of use?


 Do static column families save disk space or offer better read/write
 performance?



[RELEASE] Apache Cassandra 2.0.8 released

2014-05-29 Thread Sylvain Lebresne
The Cassandra team is pleased to announce the release of Apache Cassandra
version 2.0.8.

Cassandra is a highly scalable second-generation distributed database,
bringing together Dynamo's fully distributed design and Bigtable's
ColumnFamily-based data model. You can read more here:

 http://cassandra.apache.org/

Downloads of source and binary distributions are listed in our download
section:

 http://cassandra.apache.org/download/

This version is a bug fix release[1] on the 2.0 series. As always, please pay
attention to the release notes[2] and let us know[3] if you encounter any
problems.

Enjoy!

[1]: http://goo.gl/Z2XJWN (CHANGES.txt)
[2]: http://goo.gl/JYEB2D (NEWS.txt)
[3]: https://issues.apache.org/jira/browse/CASSANDRA


With Astyanax, How do I write same column to multiple rows efficiently ?

2014-05-29 Thread user 01
With Hector I used to create a column object once & add it to multiple
row mutations, but with Astyanax a new column object is created for each row
mutation in a mutation batch. So if I need to add the same column to 1000
rows, it creates the column object 1000 times. Isn't there a better way to
add the same column to multiple rows more efficiently? Ideally I should be
able to create a column object once & be able to add it to multiple row
mutations via Astyanax.

I have several scenarios in my app where I am doing this sort of mutation.


Re: With Astyanax, How do I write same column to multiple rows efficiently ?

2014-05-29 Thread user 01
I am using Astyanax over thrift driver.


On Thu, May 29, 2014 at 7:35 PM, user 01 user...@gmail.com wrote:

 With Hector I used to create a column object once  add that to multiple
 row mutations but with Astyanax it creates a new column object for each row
 mutation in a mutation batch so if I need to add a same column to 1000
 rows, it creates the column object 100 times. Isn't there a better way to
 add same column to multiple rows more efficiently ? Probably I should be
 able to create a column object once  be able to add that to multiple row
 mutations via Astyanax.

 I have several scenario in my app where I am doing this sort of mutations.



Re: With Astyanax, How do I write same column to multiple rows efficiently ?

2014-05-29 Thread DuyHai Doan
so if I need to add the same column to 1000 rows, it creates the column
object 1000 times -- is it really an issue? Even if Astyanax creates a
million column objects, as long as they die young and respect the
generational hypothesis of the JVM, it's fine.


On Thu, May 29, 2014 at 4:05 PM, user 01 user...@gmail.com wrote:

 With Hector I used to create a column object once  add that to multiple
 row mutations but with Astyanax it creates a new column object for each row
 mutation in a mutation batch so if I need to add a same column to 1000
 rows, it creates the column object 100 times. Isn't there a better way to
 add same column to multiple rows more efficiently ? Probably I should be
 able to create a column object once  be able to add that to multiple row
 mutations via Astyanax.

 I have several scenario in my app where I am doing this sort of mutations.



Re: With Astyanax, How do I write same column to multiple rows efficiently ?

2014-05-29 Thread user 01
But wouldn't it be nice if the API just provided a method to do this more
efficiently, since it is easily possible? This is not a big deal for the API.


On Thu, May 29, 2014 at 9:11 PM, DuyHai Doan doanduy...@gmail.com wrote:

 so if I need to add a same column to 1000 rows, it creates the column
 object 100 times -- is it really an issue ? Even if Astyanax creates 1
 millions of column objects, as long as they die young and respect the
 generational hypothesis of the JVM, it's fine.


 On Thu, May 29, 2014 at 4:05 PM, user 01 user...@gmail.com wrote:

 With Hector I used to create a column object once  add that to multiple
 row mutations but with Astyanax it creates a new column object for each row
 mutation in a mutation batch so if I need to add a same column to 1000
 rows, it creates the column object 100 times. Isn't there a better way to
 add same column to multiple rows more efficiently ? Probably I should be
 able to create a column object once  be able to add that to multiple row
 mutations via Astyanax.

 I have several scenario in my app where I am doing this sort of
 mutations.





Re: With Astyanax, How do I write same column to multiple rows efficiently ?

2014-05-29 Thread DuyHai Doan
Sure, it can be done. You can submit a pull request; I'm sure they'll
be happy to merge it.


On Thu, May 29, 2014 at 5:59 PM, user 01 user...@gmail.com wrote:

 But won't it be nice if the API just provides a method to do so more
 efficiently since it is easily possible? This is not a big deal for API.


 On Thu, May 29, 2014 at 9:11 PM, DuyHai Doan doanduy...@gmail.com wrote:

 so if I need to add a same column to 1000 rows, it creates the column
 object 100 times -- is it really an issue ? Even if Astyanax creates 1
 millions of column objects, as long as they die young and respect the
 generational hypothesis of the JVM, it's fine.


 On Thu, May 29, 2014 at 4:05 PM, user 01 user...@gmail.com wrote:

 With Hector I used to create a column object once  add that to multiple
 row mutations but with Astyanax it creates a new column object for each row
 mutation in a mutation batch so if I need to add a same column to 1000
 rows, it creates the column object 100 times. Isn't there a better way to
 add same column to multiple rows more efficiently ? Probably I should be
 able to create a column object once  be able to add that to multiple row
 mutations via Astyanax.

 I have several scenario in my app where I am doing this sort of
 mutations.






Re: vcdiff/bmdiff , cassandra , and the ordered partitioner…

2014-05-29 Thread Robert Coli
On Sat, May 17, 2014 at 10:25 PM, Kevin Burton bur...@spinn3r.com wrote:

 compression … sure.. but bmdiff? Not that I can find.  BMDiff is an
 algorithm that in some situations could result in 10x compression due
 to the way it's able to find long common runs.  This is a pathological
 case though.  But if you were to copy the US constitution into itself
 … 10x… bmdiff could ideally get a 10x compression rate.

 not all compression algorithms are identical.


The compression classes are pluggable. Exploratory patches are always
welcome! :D

Not sure I understand why you consider Byte Ordered Partitioner relevant,
isn't what matters for compressibility generally the uniformity of data
within rows in the SSTable, not the uniformity of their row keys?

=Rob


Re: How does cassandra page through low cardinality indexes?

2014-05-29 Thread Robert Coli
On Fri, May 16, 2014 at 10:53 AM, Kevin Burton bur...@spinn3r.com wrote:

 I'm struggling with cassandra secondary indexes since the documentation
 seems all over the place and I'm having to put together everything from
 blog posts.


This mostly-complete summary content will eventually make it into a blog
post:


Secondary Indexes in Cassandra
--

Users frequently come into #cassandra or the cassandra-user@ mailing list
and ask questions about Secondary Indexes. Here is my stock answer.

“Unless you REALLY NEED the feature of atomic update of the secondary index
with the underlying row, you are almost always better off just making your
own manual secondary index column family.”

In Cassandra, the unit of distribution is the partition (f/k/a “Row”). If
your query needs to scan multiple partitions and inspect each of their
contents, you have probably made a mistake in your data model. For queries
which interact with sets of partitions one should use executeAsync() w/ the
new CQL drivers, not multigets.
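
The per-partition fan-out recommended here (executeAsync rather than a
multiget) can be sketched with Python stdlib futures standing in for the
driver. This is only an illustration of the pattern; fetch_partition is a
hypothetical stand-in for a single-partition query issued through a CQL
driver's async API, not a real driver call.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for one single-partition query issued through
# the CQL driver's async API; the real call would hit one replica set.
def fetch_partition(key):
    return {"key": key, "rows": ["row-%s" % key]}

# Fan out one request per partition key, then gather the futures in
# submission order -- the pattern recommended over a thrift multiget.
def fetch_many(keys):
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(fetch_partition, k) for k in keys]
        return [f.result() for f in futures]

print([r["key"] for r in fetch_many(["a", "b", "c"])])  # ['a', 'b', 'c']
```

With a real driver, each future resolves independently, so one slow
partition does not block the others the way a single multiget would.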

Advantages of Secondary Indexes :

- Atomic update of secondary index with underlying partition/storage row.
- Don’t have to be maintained manually, including automated rebuild.
- Provides the illusion that you are using a RDBMS.

Disadvantages of Secondary Indexes :

- Before 1.2, they do a read-before-write.
https://issues.apache.org/jira/browse/CASSANDRA-2897
- A steady trickle of occasionally-serious bugs which do not affect the
normal read/write path. [1]
- Bad for low cardinality cases. FIXME : detail (relates to checking each
node)
- Bad for high cardinality cases. FIXME : detail (certain cases? what about
equality/non-equality?)
- CFstats not exposed via nodetool cfstats before 1.2 :
https://issues.apache.org/jira/browse/CASSANDRA-4464 ?
- Lower availability than normal Cassandra read path. FIXME : citation
- Unsorted results, in token order and not query value order.
- Can only search on datatypes Cassandra understands.
- Secondary index is located in the same directory as the primary SSTables.
- Provides the illusion that you are using a RDBMS.


Readers will note that I am not very clear above on which cardinality cases
they *are* good for, because I consider all the other problems sufficient
to never use them.

=Rob
[1] Citations :

https://issues.apache.org/jira/browse/CASSANDRA-5502

https://issues.apache.org/jira/browse/CASSANDRA-5975

https://issues.apache.org/jira/browse/CASSANDRA-2897 - 2i without
read-before-write

https://issues.apache.org/jira/browse/CASSANDRA-1571 - (0.7) Secondary
Indexes aren't updated when removing whole row

https://issues.apache.org/jira/browse/CASSANDRA-1747 - (0.7) Truncate is
not secondary index aware

https://issues.apache.org/jira/browse/CASSANDRA-1813 - (0.7) return
invalidrequest when client attempts to create secondary index on
supercolumns

https://issues.apache.org/jira/browse/CASSANDRA-2619 - (0.8) secondary
index not dropped until restart

https://issues.apache.org/jira/browse/CASSANDRA-2628 - (0.8) Empty Result
with Secondary Index Queries with limit 1

https://issues.apache.org/jira/browse/CASSANDRA-3057 - (0.8) secondary
index on a column that has a value of size > 64k will fail on flush

https://issues.apache.org/jira/browse/CASSANDRA-3540 - (1.0) Wrong check of
partitioner for secondary indexes

https://issues.apache.org/jira/browse/CASSANDRA-3545 - (1.1) Fix very low
Secondary Index performance

https://issues.apache.org/jira/browse/CASSANDRA-4257 - (1.1) CQL3 range
query with secondary index fails

https://issues.apache.org/jira/browse/CASSANDRA-2897 - (1.2) Secondary
indexes without read-before-write

https://issues.apache.org/jira/browse/CASSANDRA-4289 - (1.2) Secondary
Indexes fail following a system restart

https://issues.apache.org/jira/browse/CASSANDRA-4785 - (1.2) Secondary
Index Sporadically Doesn't Return Rows

https://issues.apache.org/jira/browse/CASSANDRA-4973 - (1.1) Secondary
Index stops returning rows when caching=ALL

https://issues.apache.org/jira/browse/CASSANDRA-5079 - (1.1, but since
0.8) Compaction
deletes ExpiringColumns in Secondary Indexes

https://issues.apache.org/jira/browse/CASSANDRA-5732 - (1.2/2.0) Can not
query secondary index

https://issues.apache.org/jira/browse/CASSANDRA-5540 - (1.2) Concurrent
secondary index updates remove rows from the index

https://issues.apache.org/jira/browse/CASSANDRA-5599 - (1.2)
Intermittently, CQL SELECT  with WHERE on secondary indexed field value
returns null when there are rows

https://issues.apache.org/jira/browse/CASSANDRA-5397 - (1.2) Updates to
PerRowSecondaryIndex don't use most current values

https://issues.apache.org/jira/browse/CASSANDRA-5161 - (1.2) Slow secondary
index performance when using VNodes

https://issues.apache.org/jira/browse/CASSANDRA-5851 - (2.0) Fix 2i on
composite components omissions

https://issues.apache.org/jira/browse/CASSANDRA-5614 - (2.0) W/O specified
columns ASPCSI does not get notified of deletes


Retrieve counter value after update

2014-05-29 Thread ziju feng
Hi All,

I was wondering if there is a planned feature in Cassandra to return the
current counter value after the update statement?

Our project is using counter columns to count, and since counter columns
cannot reside in the same table as regular columns, we have to
denormalize the counter value as an integer into other tables that need to
display the value.

Our current way of denormalizing is to read the current value and
writetime from the counter table after the update, and then batch-update
other tables with the value and timestamp (to resolve write conflicts).
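
The flow described above (carry the counter's value and writetime into the
denormalized tables, and let the timestamp win conflicts) can be sketched as
follows. This is a minimal Python sketch with a plain dict standing in for
the target table; all names are hypothetical.

```python
# Sketch of the denormalization step: apply the counter value to a target
# table only if the carried writetime is newer, so a stale batch cannot
# overwrite a fresher value (last-write-wins by timestamp, which is also
# how Cassandra resolves regular column write conflicts).
def apply_denormalized(table, key, value, write_ts):
    current = table.get(key)
    if current is None or write_ts > current[1]:
        table[key] = (value, write_ts)

likes_by_item = {}
apply_denormalized(likes_by_item, "item-1", 10, write_ts=200)
apply_denormalized(likes_by_item, "item-1", 8, write_ts=150)  # stale, ignored
print(likes_by_item["item-1"])  # (10, 200)
```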

I don't know if this is a common requirement, but if an update to the
counter table could return the current value and timestamp (or if counter
columns could reside in regular tables in the first place), we could save
this extra read, which would reduce cluster load and update latency.

Thanks,

Ziju


Re: Clustering order and secondary index

2014-05-29 Thread Robert Coli
On Thu, May 15, 2014 at 7:12 AM, cbert...@libero.it cbert...@libero.it
wrote:

 I have an easy question for you all: do queries using only secondary indexes
 not respect any clustering order?


It is a general property of secondary indexes in Cassandra that they are
not in token order unless you are using an ordered partitioner.

=Rob


Re: How does cassandra page through low cardinality indexes?

2014-05-29 Thread DuyHai Doan
Hello Robert

 There are some maths involved when considering the performance of
secondary index in C*

 First, the current implementation is a distributed 2nd index, meaning that
each node that contains actual data also contains the index data.

 So considering a cluster of *N* nodes with replication factor *R*, to
fetch just the index data you'll need to do *N/R* reads. I'm not
considering queries with a LIMIT clause.

 Once you get the index data, you'll need to fetch the actual data
related to this index. If your query returns *p* partitions, the complexity
would be O(N/R+p).

 Now, for a very high cardinality secondary index (an index on user email to
search for users, for instance), for each index entry you only find one
actual user, so the complexity is O(N/R) for reading. If your cluster is big
(N = 100 nodes) there will be a lot of wasteful reads...

 Because of its distributed nature, finding a *good* use case for a 2nd index
is quite tricky, partly because it depends on the query pattern but also
on the cluster size and data distribution.

  Apart from the performance aspect, secondary index column families use
SizeTiered compaction, so for a use case with a lot of updates you'll have
plenty of tombstones... I'm not sure how an end user can switch to Leveled
Compaction for a 2nd index...

 Regards







On Thu, May 29, 2014 at 9:43 PM, Robert Coli rc...@eventbrite.com wrote:

 On Fri, May 16, 2014 at 10:53 AM, Kevin Burton bur...@spinn3r.com wrote:

 I'm struggling with cassandra secondary indexes since the documentation
 seems all over the place and I'm having to put together everything from
 blog posts.


 This mostly-complete summary content will eventually make it into a blog
 post :

 
 Secondary Indexes in Cassandra
 --

 Users frequently come into #cassandra or the cassandra-user@ mailing list
 and ask questions about Secondary Indexes. Here is my stock answer.

 “Unless you REALLY NEED the feature of atomic update of the secondary
 index with the underlying row, you are almost always better off just making
 your own manual secondary index column family.”

 In Cassandra, the unit of distribution is the partition (f/k/a “Row”). If
 your query needs to scan multiple partitions and inspect each of their
 contents, you have probably made a mistake in your data model. For queries
 which interact with sets of partitions one should use executeAsync() w/ the
 new CQL drivers, not multigets.

 Advantages of Secondary Indexes :

 - Atomic update of secondary index with underlying partition/storage row.
 - Don’t have to be maintained manually, including automated rebuild.
 - Provides the illusion that you are using a RDBMS.

 Disadvantages of Secondary Indexes :

 - Before 1.2, they do a read-before-write.
 https://issues.apache.org/jira/browse/CASSANDRA-2897
 - A steady trickle of occasionally-serious bugs which do not affect the
 normal read/write path. [3]
 - Bad for low cardinality cases. FIXME : detail (relates to checking each
 node)
 - Bad for high cardinality cases. FIXME : detail (certain cases? what
 about equality/non-equality?)
 - CFstats not exposed via nodetool cfstats before 1.2 :
 https://issues.apache.org/jira/browse/CASSANDRA-4464 ?
 - Lower availability than normal Cassandra read path. FIXME : citation
 - Unsorted results, in token order and not query value order.
 - Can only search on datatypes Cassandra understands.
 - Secondary index is located in the same directory as the primary
 SSTables.
 - Provides the illusion that you are using a RDBMS.
 

 Readers will note that I am not very clear above on which cardinality
 cases they *are* good for, because I consider all the other problems
 sufficient to never use them.

 =Rob
 [1] Citations :

 https://issues.apache.org/jira/browse/CASSANDRA-5502

 https://issues.apache.org/jira/browse/CASSANDRA-5975

 https://issues.apache.org/jira/browse/CASSANDRA-2897 - 2i without
 read-before-write

 https://issues.apache.org/jira/browse/CASSANDRA-1571 - (0.7) Secondary
 Indexes aren't updated when removing whole row

 https://issues.apache.org/jira/browse/CASSANDRA-1747 - (0.7) Truncate is
 not secondary index aware

 https://issues.apache.org/jira/browse/CASSANDRA-1813 - (0.7) return
 invalidrequest when client attempts to create secondary index on
 supercolumns

 https://issues.apache.org/jira/browse/CASSANDRA-2619 - (0.8) secondary
 index not dropped until restart

 https://issues.apache.org/jira/browse/CASSANDRA-2628 - (0.8) Empty Result
 with Secondary Index Queries with limit 1

 https://issues.apache.org/jira/browse/CASSANDRA-3057 - (0.8) secondary
 index on a column that has a value of size  64k will fail on flush

 https://issues.apache.org/jira/browse/CASSANDRA-3540 - (1.0) Wrong check
 of partitioner for secondary indexes

 https://issues.apache.org/jira/browse/CASSANDRA-3545 - (1.1) Fix very low
 Secondary Index performance

 https://issues.apache.org/jira/browse/CASSANDRA-4257 - (1.1) CQL3 range
 query with secondary index fails

Anyone using Astyanax in production besides Netflix itself?

2014-05-29 Thread user 01
What version of Astyanax (the thrift-based impl. or the beta Java Driver one?)
are you using?
With what Cassandra version?

Would you still recommend Astyanax at this point, when the DataStax Java
Driver is out?
My intentions are to use Astyanax over the thrift-based impl for now, & later
switch to Astyanax over the Java Driver (when it gets stable) rather than the
Java Driver directly, as I have now got more used to thrift, & Astyanax can
help me write my queries programmatically instead of as CQL statements, &
take care of preparing queries or sanitizing them, etc. So would you still
recommend Astyanax?


Re: Retrieve counter value after update

2014-05-29 Thread DuyHai Doan
Hello Ziju

 First, you can read this excellent blog post explaining how counters work
under the hood:
http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters

 Now, considering your request, you'd like Cassandra to return the current
counter value on update. It would require reading the local shard (already
done in 2.1) and remote shards upon update to return the current counter
value. It sounds like a distributed read-after-write and I'm not sure it's a
good idea performance-wise...

Regards


On Thu, May 29, 2014 at 9:47 PM, ziju feng pkdog...@gmail.com wrote:

 Hi All,

 I was wondering if there is a planned feature in Cassandra to return the
 current counter value after the update statement?

 Our project is using counter column to count and since counter column
 cannot reside in the same table with regular columns, we have to
 denormalize the counter value as integer into other tables that need to
 display the value.

 Our current way of denormalization is to read the current value and
 writetime from the counter table after the update and then batch update
 other tables with the value and timestamp (to resolve wrtie conflict).

 I don't know if this is a common requirement but I think if update to
 counter table can return the current value and timestamp (or counter column
 can reside in regular table in the first place), we can save this extra
 read, which can reduce cluster load and update latency.

 Thanks,

 Ziju



Re: binary protocol server side sockets

2014-05-29 Thread Eric Plowe
Michael,

The ask is for keep-alive to be configurable for the native transport, via
Socket.setKeepAlive. By default, SO_KEEPALIVE is false (
http://docs.oracle.com/javase/7/docs/api/java/net/StandardSocketOptions.html#SO_KEEPALIVE).
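
For illustration, the same default can be observed with a Python stdlib
socket. This is only a sketch of the SO_KEEPALIVE behaviour being discussed;
the actual change requested would live in Cassandra's native-transport server
code, not in client code like this.

```python
import socket

# SO_KEEPALIVE defaults to off, which is why idle server-side connections
# can linger when intermediate gear silently drops the peer.
def enable_keepalive(sock):
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))  # 0: disabled
enable_keepalive(s)
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)  # True
s.close()
```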


Regards,

Eric Plowe


On Wed, Apr 9, 2014 at 1:25 PM, Michael Shuler mich...@pbandjelly.orgwrote:

 On 04/09/2014 11:39 AM, graham sanderson wrote:

 Thanks, but I would think that just sets keep alive from the client end;
 I’m talking about the server end… this is one of those issues where
 there is something (e.g. switch, firewall, VPN in between the client and
 the server) and we get left with orphaned established connections to the
 server when the client is gone.


 There would be no server setting for any service, not just c*, that would
 correct mis-configured connection-assassinating network gear between the
 client and server. Fix the gear to allow persistent connections.

 Digging through the various timeouts in c*.yaml didn't lead me to a simple
 answer for something tunable, but I think this may be more basic networking
 related. I believe it's up to the client to keep the connection open as Duy
 indicated. I don't think c* will arbitrarily sever connections - something
 that disconnects the client may happen. In that case, the TCP connection on
 the server should drop to TIME_WAIT. Is this what you are seeing in
 `netstat -a` on the server - a bunch of TIME_WAIT connections hanging
 around? Those should eventually be recycled, but that's tunable in the
 network stack, if they are being generated at a high rate.

 --
 Michael



Re: Erase old sstables to make room for new sstables

2014-05-29 Thread Robert Coli
On Thu, May 15, 2014 at 10:17 AM, Redmumba redmu...@gmail.com wrote:

 Is this possible to do safely?  The data in the oldest sstable is always
 guaranteed to be the oldest data, so that is not my concern--my main
 concern is whether or not we can even do this, and also how we can notify
 Cassandra that an sstable has been removed underneath it.

 tl;dr: Can I routinely remove the oldest sstable to free up disk space,
 without causing stability drops in Cassandra?


tl;dr : no.

There is no mechanism by which to inform a running Cassandra process that
you consider a SSTable to no longer be live. It would probably be pretty
trivial to add a JMX call which did this, but I presume the project would
not merge it. Especially because it would be marked live again if you
restarted, until/unless CASSANDRA-6756 [1] is resolved in some way.

There are also likely cases where brute force removing data in the oldest
sstable file (tombstones, for example) will lead to unexpected results
while querying or during compaction.

Generally, Cassandra wants to manage SSTables in the data directory. It
does not want you to do so while the server is running. If you delete a
SSTable which Cassandra has an open file handle to, it will not be deleted
until Cassandra no longer has an open file handle to it, which will only
occur at node shutdown or post-compaction.

You could always stop the node, remove the SSTable, and restart the node.
But you are almost certainly better off using the 2.0/2.1 era stuff for
cases like this which relies on TTL to drop SSTables on the floor when they
are entirely full of expired data. There's another recent thread which
discusses some of these features, I am not personally clear on exactly what
cases like yours they cover.

=Rob
[1] https://issues.apache.org/jira/browse/CASSANDRA-6756


Re: How does cassandra page through low cardinality indexes?

2014-05-29 Thread Robert Coli
On Thu, May 29, 2014 at 1:08 PM, DuyHai Doan doanduy...@gmail.com wrote:

 Hello Robert

  There are some maths involved when considering the performance of
 secondary index in C*


Yes, these are the maths which are behind my FIXMEs in the original post. I
merely have not had time to explicitly describe them in the context of that
draft post.

Thank you for doing so! When I reference them in my eventual post, I will
be sure to credit you.


  Because of its distributed nature, finding a *good* use-case for 2nd
 index is quite tricky, partly because it  depends on the query pattern but
 also on the cluster size and data distribution.


Yep, and if you're doing this tricky thing, you probably want less opacity
and more explicit understanding of what is happening under the hood and you
want to be sure you won't run into a bug in the implementation, hence
manual secondary index CFs.


   Apart from the performance aspect, secondary index column families use
 SizeTiered compaction so for an use case with a lot of update you'll have
 plenty of tombstones... I'm not sure how end user can switch to Leveled
 Compaction for 2nd index...


Per Aleksey, secondary index column families actually use the compaction
strategy of the column family they index. I agree that this seems weird,
and is likely just another implementation detail you relinquish control of
for the convenience of 2i.

=Rob


Re: How does cassandra page through low cardinality indexes?

2014-05-29 Thread Paulo Ricardo Motta Gomes
Really informative thread, thank you!

We had a secondary index trauma a while ago, and since then we've known it
was not a good idea for most cases, but now it's even clearer why.


On Thu, May 29, 2014 at 5:31 PM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, May 29, 2014 at 1:08 PM, DuyHai Doan doanduy...@gmail.com wrote:

 Hello Robert

  There are some maths involved when considering the performance of
 secondary index in C*


 Yes, these are the maths which are behind my FIXMEs in the original post.
 I merely have not had time to explicitly describe them in the context of
 that draft post.

 Thank you for doing so! When I reference them in my eventual post, I will
 be sure to credit you.


  Because of its distributed nature, finding a *good* use-case for 2nd
 index is quite tricky, partly because it  depends on the query pattern but
 also on the cluster size and data distribution.


 Yep, and if you're doing this tricky thing, you probably want less opacity
 and more explicit understanding of what is happening under the hood and you
 want to be sure you won't run into a bug in the implementation, hence
 manual secondary index CFs.


   Apart from the performance aspect, secondary index column families use
 SizeTiered compaction so for an use case with a lot of update you'll have
 plenty of tombstones... I'm not sure how end user can switch to Leveled
 Compaction for 2nd index...


 Per Aleksey, secondary index column families actually use the compaction
 strategy of the column family they index. I agree that this seems weird,
 and is likely just another implementation detail you relinquish control of
 for the convenience of 2i.

 =Rob




-- 
*Paulo Motta*

Chaordic | *Platform*
*www.chaordic.com.br http://www.chaordic.com.br/*
+55 48 3232.3200


Re: Anyone using Astyanax in production besides Netflix itself?

2014-05-29 Thread Jacob Rhoden
Not long ago a vote was organised to get the developers to agree to stop work 
on the thrift API. New Cassandra features from this point are intended only for 
CQL. You probably want to make the effort to switch to CQL now rather than 
later.

__
Sent from iPhone

 On 30 May 2014, at 6:12 am, user 01 user...@gmail.com wrote:
 
 What version of Astyanax(thrift based impl. or beta Java driver one?) are you 
 using ?
 With what cassandra version ?
 
 Would you still recommend Astyanax at this point when the DS Java Driver is out?
 My intention is to use Astyanax over the thrift-based impl for now, and later
 switch to Astyanax over the Java driver (when it gets stable) rather than the
 Java driver directly, as I have now got more used to thrift, and Astyanax can
 help me write my queries programmatically instead of CQL statements, take care
 of preparing queries or sanitizing them, etc. So would you still recommend
 Astyanax?
 


Re: Anyone using Astyanax in production besides Netflix itself?

2014-05-29 Thread Tupshin Harper
While Astyanax 2.0 is still beta,  I think you will find it provides a very
good migration path from the 1.0 thrift based version to the 2.0 native
driver version.  Well worth considering if you like the Astyanax API and
functionality.  I know of multiple DataStax customers planning on using it.

-Tupshin
On May 29, 2014 4:12 PM, user 01 user...@gmail.com wrote:

 What version of Astyanax(thrift based impl. or beta Java driver one?) are
 you using ?
 With what cassandra version ?

  Would you still recommend Astyanax at this point when the DS Java Driver is
  out?
  My intention is to use Astyanax over the thrift-based impl for now, and later
  switch to Astyanax over the Java driver (when it gets stable) rather than the
  Java driver directly, as I have now got more used to thrift, and Astyanax can
  help me write my queries programmatically instead of CQL statements, take
  care of preparing queries or sanitizing them, etc. So would you still
  recommend Astyanax?




Multi-DC Environment Question

2014-05-29 Thread Vasileios Vlachos

Hello All,

We have plans to add a second DC to our live Cassandra environment. 
Currently RF=3 and we read and write at QUORUM. After adding DC2 we are 
going to be reading and writing at LOCAL_QUORUM.


If my understanding is correct, when a client sends a write request, if 
the consistency level is satisfied on DC1 (that is RF/2+1), success is 
returned to the client and DC2 will eventually get the data as well. The 
assumption behind this is that the client always connects to DC1 for 
reads and writes and that there is a site-to-site VPN between DC1 
and DC2. Therefore, DC1 will almost always return success before DC2 
(actually I don't know if it is possible for DC2 to be more up-to-date 
than DC1 with this setup...).


Now imagine DC1 loses connectivity and the client fails over to DC2. 
Everything should work fine after that, with the only difference that 
DC2 will now be handling the requests directly from the client. After 
some time, say after max_hint_window_in_ms, DC1 comes back up. My 
question is how do I bring DC1 up to speed with DC2, which is now more 
up-to-date? Will that require a nodetool repair on DC1 nodes? Also, what 
is the answer when the outage is > max_hint_window_in_ms instead?


Thanks in advance!

Vasilis

--
Kind Regards,

Vasileios Vlachos
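
For reference, a keyspace spanning two data centres as described above would
typically be defined with NetworkTopologyStrategy. The keyspace and DC names
below are placeholders and must match what your snitch reports:

```sql
-- Hypothetical keyspace with RF=3 in each data centre;
-- LOCAL_QUORUM then needs 2 replicas in the local DC.
CREATE KEYSPACE my_keyspace
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'DC1': 3,
    'DC2': 3
  };
```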



Re: Multi-DC Environment Question

2014-05-29 Thread Tupshin Harper
When one node or DC is down, coordinator nodes being written through will
notice this fact and store hints (hinted handoff is the mechanism),  and
those hints are used to send the data that was not able to be replicated
initially.

http://www.datastax.com/dev/blog/modern-hinted-handoff

-Tupshin
On May 29, 2014 6:22 PM, Vasileios Vlachos vasileiosvlac...@gmail.com
wrote:

 Hello All,

We have plans to add a second DC to our live Cassandra environment.
Currently RF=3 and we read and write at QUORUM. After adding DC2 we are
going to be reading and writing at LOCAL_QUORUM.

If my understanding is correct, when a client sends a write request, if the
consistency level is satisfied on DC1 (that is RF/2+1), success is returned
to the client and DC2 will eventually get the data as well. The assumption
behind this is that the client always connects to DC1 for reads and
writes and that there is a site-to-site VPN between DC1 and DC2.
Therefore, DC1 will almost always return success before DC2 (actually I
don't know if it is possible for DC2 to be more up-to-date than DC1 with
this setup...).

Now imagine DC1 loses connectivity and the client fails over to DC2.
Everything should work fine after that, with the only difference that DC2
will now be handling the requests directly from the client. After some
time, say after max_hint_window_in_ms, DC1 comes back up. My question is
how do I bring DC1 up to speed with DC2, which is now more up-to-date? Will
that require a nodetool repair on DC1 nodes? Also, what is the answer when
the outage is > max_hint_window_in_ms instead?

Thanks in advance!

Vasilis

-- 
Kind Regards,

Vasileios Vlachos


Re: How long are expired values actually returned?

2014-05-29 Thread Robert Coli
On Thu, May 15, 2014 at 8:26 AM, Sebastian Schmidt isib...@gmail.com
wrote:

 Thank you for your answer, I really appreciate that you want to help me.
 But already found out that I did something wrong in my implementation.


Could you be more specific about the nature of the mistake you made, so
people who might have a similar symptom might benefit in google searches of
this list?

=Rob


Re: Number of rows under one partition key

2014-05-29 Thread Robert Coli
On Thu, May 15, 2014 at 6:10 AM, Vegard Berget p...@fantasista.no wrote:

 I know this has been discussed before, and I know there are limitations to
 how many rows one partition key in practice can handle.  But I am not sure
 if number of rows or total data is the deciding factor.


Both. In terms of data size, partitions larger than a few hundred megabytes
begin to see diminishing returns in some cases.
Partitions over 64 megabytes are compacted on disk, which should give you a
rough sense of what Cassandra considers a large partition.


 Should we add another partition key to avoid 1 000 000 rows in the same
 thrift-row (which is how I understand it is actually stored)?  Or is 1 000
 000 rows okay?


Depending on row size and access patterns, 1Mn rows is not extremely large.
There are, however, some row sizes and operations where this order of
magnitude of columns might be slow.


 Other considerations, for example compaction strategy and if we should do
 an upgrade to 2.0 because of this (we will upgrade anyway, but if it is
 recommended we will continue to use 2.0 in development and upgrade the
 production environment sooner)


You should not upgrade to 2.0 in order to address this concern. You should
upgrade to 2.0 when it is stable enough to run in production, which IMO is
not yet. YMMV.


 I have done some testing, inserting a million rows and selecting them all,
 counting them and selecting individual rows (with both clientid and id) and
 it seems fine, but I want to ask to be sure that I am on the right track.


If the access patterns you are using perform the way you would like with
representative size data, sounds reasonable to me?

If you are able to select all million rows within a reasonable percentage
of the relevant timeout, I presume they cannot be too huge in terms of data
size! :D

=Rob
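
If a single partition does eventually grow too large, the usual remedy is to
add a bucket component to the partition key. The schema below is a
hypothetical sketch, not from the thread:

```sql
-- Spread one client's rows across 16 buckets; the application computes
-- bucket from the row id (e.g. hash(id) % 16) on both write and read.
CREATE TABLE items (
    clientid bigint,
    bucket   int,
    id       timeuuid,
    payload  text,
    PRIMARY KEY ((clientid, bucket), id)
);

-- A point read still touches exactly one partition:
SELECT payload FROM items WHERE clientid = ? AND bucket = ? AND id = ?;
-- Reading everything for a client means issuing one query per bucket.
```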


Re: conditional delete consistency level/timeout

2014-05-29 Thread Robert Coli
On Fri, May 16, 2014 at 7:06 AM, Mohica Jasha mohica.ja...@gmail.com
wrote:

 Earlier I reported the following bug against C* 2.0.5

...

 It seems to be fixed in C* 2.0.7, but we are still seeing similar
 suspicious timeouts.

...

 We noticed that DELETE queries against this table sometimes timeout:

   ...

 We set LOCAL_SERIAL and LOCAL_QUORUM as serial consistency level and
 consistency level in the query option passed to datastax Cluster.Builder.
 In my understanding the above query should be executed in LOCAL_SERIAL
 consistency level, I wonder why the exception says it failed to run the
 query in the LOCAL_QUORUM consistency level?


I swear I have seen a similar relevant bug in JIRA recently... did you
ultimately file one?

=Rob
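
For context, a conditional (LWT) delete of the kind under discussion looks
like the sketch below (table and values invented). One plausible explanation
for the reported level: the Paxos prepare/propose rounds run at the serial
consistency level (here LOCAL_SERIAL), but the final commit is written at the
statement's regular consistency level, so a timeout in the commit phase is
reported as LOCAL_QUORUM:

```sql
-- Hypothetical conditional delete; requires Cassandra 2.0+:
DELETE FROM locks WHERE name = 'job-42' IF owner = 'worker-1';
```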


Re: Number of rows under one partition key

2014-05-29 Thread Paulo Ricardo Motta Gomes
Hey,

We are considering upgrading from 1.2 to 2.0. Why do you not consider 2.0
ready for production yet, Robert? Have you written about this somewhere
already?

A bit off-topic in this discussion but it would be interesting to know,
your posts are generally very enlightening.

Cheers,


On Thu, May 29, 2014 at 8:51 PM, Robert Coli rc...@eventbrite.com wrote:

 On Thu, May 15, 2014 at 6:10 AM, Vegard Berget p...@fantasista.no wrote:

 I know this has been discussed before, and I know there are limitations
 to how many rows one partition key in practice can handle.  But I am not
 sure if number of rows or total data is the deciding factor.


 Both. In terms of data size, partitions larger than a few hundred megabytes
 begin to see diminishing returns in some cases.
 Partitions over 64 megabytes are compacted on disk, which should give you a
 rough sense of what Cassandra considers a large partition.


 Should we add another partition key to avoid 1 000 000 rows in the same
 thrift-row (which is how I understand it is actually stored)?  Or is 1 000
 000 rows okay?


 Depending on row size and access patterns, 1Mn rows is not extremely
 large. There are, however, some row sizes and operations where this order
 of magnitude of columns might be slow.


 Other considerations, for example compaction strategy and if we should do
 an upgrade to 2.0 because of this (we will upgrade anyway, but if it is
 recommended we will continue to use 2.0 in development and upgrade the
 production environment sooner)


 You should not upgrade to 2.0 in order to address this concern. You should
 upgrade to 2.0 when it is stable enough to run in production, which IMO is
 not yet. YMMV.


 I have done some testing, inserting a million rows and selecting them
 all, counting them and selecting individual rows (with both clientid and
 id) and it seems fine, but I want to ask to be sure that I am on the right
 track.


 If the access patterns you are using perform the way you would like with
 representative size data, sounds reasonable to me?

 If you are able to select all million rows within a reasonable percentage
 of the relevant timeout, I presume they cannot be too huge in terms of data
 size! :D

 =Rob






Re: Increased Cassandra connection latency

2014-05-29 Thread Alex Popescu
Also using the latest version of the driver (1.0.7) is always a good idea
just to make sure you are not hitting issues that have already been
addressed.


On Thu, May 29, 2014 at 12:33 AM, Aaron Morton aa...@thelastpickle.com
wrote:

 You’ll need to provide some more information such as:

 * Do you have monitoring on the cassandra cluster that shows the request
 latency? DataStax OpsCenter is a good starting point.

 * Is compaction keeping up ? Check with nodetool compactionstats

 * Is the GCInspector logging about long running ParNew ? (it only logs
 when it’s longer than 200ms)

 Cheers
 Aaron


 -
 Aaron Morton
 New Zealand
 @aaronmorton

 Co-Founder & Principal Consultant
 Apache Cassandra Consulting
 http://www.thelastpickle.com

 On 23/05/2014, at 10:35 pm, Alexey Sverdelov 
 alexey.sverde...@googlemail.com wrote:

 Hi all,

 I've noticed increased latency on our tomcat REST-service (average 30ms,
 max > 2 sec). We are using Cassandra 1.2.16 with the official DataStax Java
 driver v1.0.3.

 Our setup:

 * 2 DCs
 * each DC: 7 nodes
 * RF=5
 * Leveled compaction

 After a cassandra restart on all nodes, the latencies are alright again
 (average < 5ms, max 50ms).

 Any thoughts are greatly appreciated.

 Thanks,
 Alexey





-- 

:- a)


Alex Popescu
Sen. Product Manager @ DataStax
@al3xandru


Re: Multi-DC Environment Question

2014-05-29 Thread Ben Bromhead
Short answer:

If the time elapsed is > max_hint_window_in_ms then hints will stop being created. You 
will need to rely on your read consistency level, read repair and anti-entropy 
repair operations to restore consistency.

Long answer:

http://www.slideshare.net/jasedbrown/understanding-antientropy-in-cassandra

Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359

On 30 May 2014, at 8:40 am, Tupshin Harper tups...@tupshin.com wrote:

 When one node or DC is down, coordinator nodes being written through will 
 notice this fact and store hints (hinted handoff is the mechanism),  and 
 those hints are used to send the data that was not able to be replicated 
 initially.
 
 http://www.datastax.com/dev/blog/modern-hinted-handoff
 
 -Tupshin
 
 On May 29, 2014 6:22 PM, Vasileios Vlachos vasileiosvlac...@gmail.com 
 wrote:
 Hello All,
 
 We have plans to add a second DC to our live Cassandra environment. Currently 
 RF=3 and we read and write at QUORUM. After adding DC2 we are going to be 
 reading and writing at LOCAL_QUORUM.
 
 If my understanding is correct, when a client sends a write request, if the 
 consistency level is satisfied on DC1 (that is RF/2+1), success is returned 
 to the client and DC2 will eventually get the data as well. The assumption 
 behind this is that the client always connects to DC1 for reads and 
 writes and that there is a site-to-site VPN between DC1 and DC2. 
 Therefore, DC1 will almost always return success before DC2 (actually I don't 
 know if it is possible for DC2 to be more up-to-date than DC1 with this 
 setup...).
 
 Now imagine DC1 loses connectivity and the client fails over to DC2. 
 Everything should work fine after that, with the only difference that DC2 
 will now be handling the requests directly from the client. After some time, 
 say after max_hint_window_in_ms, DC1 comes back up. My question is how do I 
 bring DC1 up to speed with DC2, which is now more up-to-date? Will that 
 require a nodetool repair on DC1 nodes? Also, what is the answer when the 
 outage is > max_hint_window_in_ms instead?
 
 Thanks in advance!
 
 Vasilis
 -- 
 Kind Regards,
 
 Vasileios Vlachos



Re: vcdiff/bmdiff , cassandra , and the ordered partitioner…

2014-05-29 Thread Kevin Burton
The general idea is that for HTML content, you want content from the same
domain to be adjacent on disk.  This way duplicate HTML template runs get
compressed REALLY well.

I think in our situations we would see exceptional compression.

If we get closer to this I'll just implement snappy+bmdiff...


On Thu, May 29, 2014 at 12:34 PM, Robert Coli rc...@eventbrite.com wrote:

 On Sat, May 17, 2014 at 10:25 PM, Kevin Burton bur...@spinn3r.com wrote:

 compression … sure.. but bmdiff? Not that I can find.  BMDiff is an
 algorithm that in some situations could result in 10x compression due
 to the way it's able to find long common runs.  This is a pathological
 case though.  But if you were to copy the US constitution into itself
 … 10x… bmdiff could ideally get a 10x compression rate.

 not all compression algorithms are identical.


 The compression classes are pluggable. Exploratory patches are always
 welcome! :D

 Not sure I understand why you consider Byte Ordered Partitioner relevant,
 isn't what matters for compressibility generally the uniformity of data
 within rows in the SSTable, not the uniformity of their row keys?

 =Rob




-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
Skype: *burtonator*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
https://plus.google.com/102718274791889610666/posts
http://spinn3r.com
War is peace. Freedom is slavery. Ignorance is strength. Corporations are
people.


I don't understand paging through a table by primary key.

2014-05-29 Thread Kevin Burton
I'm trying to grok this but I can't figure it out in CQL world.

I'd like to efficiently page through a table via primary key.

This way I only involve one node at a time and the reads on disk are
contiguous.

I would have assumed it was a combination of the pk and ORDER BY, but that
doesn't seem to work.

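
For the archive, the standard answer: with the random/Murmur3 partitioner,
rows are stored in token order, not key order, so you page on token(pk)
rather than on the key itself. ORDER BY in CQL only orders clustering columns
within a partition, which is why pk + ORDER BY does not work across
partitions. A sketch with a hypothetical table:

```sql
CREATE TABLE docs (k text PRIMARY KEY, v text);

-- First page:
SELECT k, v FROM docs LIMIT 1000;

-- Next page: resume after the last key seen on the previous page.
-- Each page is a contiguous token range, served by one replica set.
SELECT k, v FROM docs WHERE token(k) > token('last_k_seen') LIMIT 1000;
```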