Re: unusual GC log

2015-10-20 Thread Graham Sanderson
What version of C* are you running? Any special settings in cassandra.yaml? Are
you running with stock GC settings in cassandra-env.sh? What JDK/OS?

> On Oct 19, 2015, at 11:40 PM, 曹志富  wrote:
> 
> INFO  [Service Thread] 2015-10-20 10:42:47,854 GCInspector.java:252 - ParNew 
> GC in 476ms.  CMS Old Gen: 4288526240 -> 4725514832; Par Eden Space: 
> 671088640 -> 0; 
> INFO  [Service Thread] 2015-10-20 10:42:50,870 GCInspector.java:252 - ParNew 
> GC in 423ms.  CMS Old Gen: 4725514832 -> 5114687560; Par Eden Space: 
> 671088640 -> 0; 
> INFO  [Service Thread] 2015-10-20 10:42:53,847 GCInspector.java:252 - ParNew 
> GC in 406ms.  CMS Old Gen: 5114688368 -> 5513119264; Par Eden Space: 
> 671088640 -> 0; 
> INFO  [Service Thread] 2015-10-20 10:42:57,118 GCInspector.java:252 - ParNew 
> GC in 421ms.  CMS Old Gen: 5513119264 -> 5926324736; Par Eden Space: 
> 671088640 -> 0; 
> INFO  [Service Thread] 2015-10-20 10:43:00,041 GCInspector.java:252 - ParNew 
> GC in 437ms.  CMS Old Gen: 5926324736 -> 6324793584; Par Eden Space: 
> 671088640 -> 0; 
> INFO  [Service Thread] 2015-10-20 10:43:03,029 GCInspector.java:252 - ParNew 
> GC in 429ms.  CMS Old Gen: 6324793584 -> 6693672608; Par Eden Space: 
> 671088640 -> 0; 
> INFO  [Service Thread] 2015-10-20 10:43:05,566 GCInspector.java:252 - ParNew 
> GC in 339ms.  CMS Old Gen: 6693672608 -> 6989128592; Par Eden Space: 
> 671088640 -> 0; 
> INFO  [Service Thread] 2015-10-20 10:43:08,431 GCInspector.java:252 - ParNew 
> GC in 421ms.  CMS Old Gen: 6266493464 -> 6662041272; Par Eden Space: 
> 671088640 -> 0; 
> INFO  [Service Thread] 2015-10-20 10:43:11,131 GCInspector.java:252 - 
> ConcurrentMarkSweep GC in 215ms.  CMS Old Gen: 5926324736 -> 4574418480; CMS 
> Perm Gen: 33751256 -> 33751192; Par Eden Space: 7192 -> 611360336; 
> INFO  [Service Thread] 2015-10-20 10:43:11,848 GCInspector.java:252 - ParNew 
> GC in 511ms.  CMS Old Gen: 4574418480 -> 4996166672; Par Eden Space: 
> 671088640 -> 0; 
> INFO  [Service Thread] 2015-10-20 10:43:14,915 GCInspector.java:252 - ParNew 
> GC in 395ms.  CMS Old Gen: 4996167912 -> 5380926744; Par Eden Space: 
> 671088640 -> 0; 
> INFO  [Service Thread] 2015-10-20 10:43:18,335 GCInspector.java:252 - ParNew 
> GC in 432ms.  CMS Old Gen: 5380926744 -> 5811659120; Par Eden Space: 
> 671088640 -> 0; 
> INFO  [Service Thread] 2015-10-20 10:43:21,492 GCInspector.java:252 - ParNew 
> GC in 439ms.  CMS Old Gen: 5811659120 -> 6270861936; Par Eden Space: 
> 671088640 -> 0; 
> INFO  [Service Thread] 2015-10-20 10:43:24,698 GCInspector.java:252 - ParNew 
> GC in 490ms.  CMS Old Gen: 6270861936 -> 6668734208; Par Eden Space: 
> 671088640 -> 0; Par Survivor Space: 83886080 -> 83886072
> INFO  [Service Thread] 2015-10-20 10:43:27,963 GCInspector.java:252 - ParNew 
> GC in 457ms.  CMS Old Gen: 6668734208 -> 7072885208; Par Eden Space: 
> 671088640 -> 0; Par Survivor Space: 83886072 -> 83886080
> 
> After a few seconds the node was marked down.
> 
> My node config: 8GB heap, NEW_HEAP size 800MB
> 
> Node hardware: 4 cores, 32GB RAM
> 
> --
> Ranger Tsao
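One way to put numbers on the log above, offered as a sketch rather than a diagnosis: each ParNew line empties a 640 MiB eden (671088640 bytes), the last two lines show an 80 MiB survivor space that is full (83886080 bytes), and 640 + 2 x 80 MiB matches the 800MB new-generation size mentioned. The old generation meanwhile grows by several hundred MiB per young collection. The Java snippet below (class and method names are made up for illustration) parses the "CMS Old Gen: before -> after" fragment of GCInspector lines and reports how much each ParNew collection added to the old generation. It assumes each log entry has first been joined onto a single line; the sample lines are copied from the log.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PromotionRate {
    // Captures the "CMS Old Gen: <before> -> <after>" fragment of a GCInspector line.
    private static final Pattern OLD_GEN = Pattern.compile("CMS Old Gen: (\\d+) -> (\\d+)");

    /** Returns the old-generation growth in bytes for every ParNew (young) collection line. */
    public static List<Long> promotedPerParNew(List<String> lines) {
        List<Long> promoted = new ArrayList<>();
        for (String line : lines) {
            // CMS cycles free old-generation space; only young collections promote into it.
            if (!line.contains("ParNew")) {
                continue;
            }
            Matcher m = OLD_GEN.matcher(line);
            if (m.find()) {
                long before = Long.parseLong(m.group(1));
                long after = Long.parseLong(m.group(2));
                promoted.add(after - before);
            }
        }
        return promoted;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "ParNew GC in 476ms.  CMS Old Gen: 4288526240 -> 4725514832; Par Eden Space: 671088640 -> 0;",
            "ParNew GC in 423ms.  CMS Old Gen: 4725514832 -> 5114687560; Par Eden Space: 671088640 -> 0;",
            "ConcurrentMarkSweep GC in 215ms.  CMS Old Gen: 5926324736 -> 4574418480;");
        for (long bytes : promotedPerParNew(sample)) {
            System.out.printf("promoted %d MiB%n", bytes / (1024 * 1024));
        }
    }
}
```

Promotion of that size on every young collection, with the survivor space already full, would be consistent with objects being promoted prematurely and the old generation filling up faster than CMS can reclaim it; that reading is an interpretation, not something confirmed in the thread.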





Write timeout on other nodes when joining a new node (in new DC)

2015-10-20 Thread Jiri Horky
Hi all,

we are experiencing strange behavior when we try to bootstrap a
new node. The problem is that the Recent Write Latency goes to 2s on all
the other Cassandra nodes (which are receiving user traffic), which
corresponds to our setting of "write_request_timeout_in_ms: 2000".

We use Cassandra 2.0.10 and are trying to convert to vnodes and increase the
replication factor. So we are adding a new node in a new DC (marked as
DCXA), as the only node in that DC, with replication factor 3. The reason
for the higher RF is that we will be converting another 2 existing servers
to the new DC (vnodes) and we want them to get all the data.

The replication settings look like this:
ALTER KEYSPACE slw WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'DC4': '1',
  'DC5': '1',
  'DC2': '1',
  'DC3': '1',
  'DC0': '1',
  'DC1': '1',
  'DC0A': '3',
  'DC1A': '3',
  'DC2A': '3',
  'DC3A': '3',
  'DC4A': '3',
  'DC5A': '3'
};

We added the nodes to DC0A->DC4A without any effect on the existing
nodes (DCX without A). When we try to add DC5A, the above-mentioned
problem happens, 100% reproducibly.

I tried increasing concurrent_writes from 32 to 128 on the old nodes,
and also increasing the number of flush writers, both with no effect.
The strange thing is that the load, CPU usage, GC and network
throughput are all fine on the old nodes that are reporting 2s
of write latency. Nodetool tpstats does not show any blocked/pending
operations.

I think I must be hitting some limit (because of the overall number of
replicas?) somewhere.

Any input would be greatly appreciated.

Thanks
Jirka H.
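The question about the overall number of replicas can at least be put in numbers. Below is a minimal Java sketch (names are illustrative) that sums the per-DC replication factors from the ALTER KEYSPACE statement above: six data centres at RF 1 plus six at RF 3 gives 24 configured replicas per partition. This is only the configured figure; NetworkTopologyStrategy presumably cannot place more replicas in a DC than that DC has nodes, so the single-node DCXA data centres may hold fewer copies in practice. Whether this count has anything to do with the timeouts is not established here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReplicaCount {
    /** Sums the per-data-centre replication factors of a NetworkTopologyStrategy keyspace. */
    public static int configuredReplicas(Map<String, Integer> rfPerDc) {
        int total = 0;
        for (int rf : rfPerDc.values()) {
            total += rf;
        }
        return total;
    }

    public static void main(String[] args) {
        // Mirrors the 'slw' keyspace: DC0..DC5 at RF 1, DC0A..DC5A at RF 3.
        Map<String, Integer> slw = new LinkedHashMap<>();
        for (int i = 0; i <= 5; i++) {
            slw.put("DC" + i, 1);
            slw.put("DC" + i + "A", 3);
        }
        System.out.println("configured replicas per partition: " + configuredReplicas(slw));
    }
}
```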



Hyper-V snapshot and Cassandra

2015-10-20 Thread Raul D'Opazo
Hi,
I am really new to Cassandra and I have some questions regarding backing up
Cassandra with terabytes of data, so please forgive me if I ask a noob question.
I only have one node, on one server (Windows 2012), and the data will grow to
approximately 4TB. It is a Hyper-V virtual machine with enough resources.
I have taken snapshots and that is OK, because the size does not double with
each snapshot, but I need another solution in case of disk problems.
Copying these snapshots with other backup systems is crazy: at approx. 500MB/s
it would take days.
I am wondering whether Hyper-V virtual machine snapshots can be used to recover
Cassandra in a consistent way. Is it possible?
That would save me from copying snapshots to another network location or backup system.
Thanks,
Raul



Re: Write timeout on other nodes when joining a new node (in new DC)

2015-10-20 Thread Jiri Horky
Hi all,

so after a deep investigation, we found out that the cause is this problem:

https://issues.apache.org/jira/browse/CASSANDRA-8058

Jiri Horky




Cassandra Object Mapper - Dynamically pass keyspace value

2015-10-20 Thread Ashish Soni
Hi all,

Is there any way I can specify the keyspace value at compile time, for example
using the Maven build? Hard-coding the keyspace name inside the Java class is
uncomfortable: if it changes and there are thousands of files, it becomes a big
maintenance issue.

import com.datastax.driver.mapping.annotations.UDT;

@UDT(keyspace = "complex", name = "address")
public class Address {
    private String street;
    private String city;
    private int zipCode;
}
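One common workaround, sketched here under assumptions: Java annotation values must be compile-time constants, so the keyspace cannot be looked up at runtime inside the annotation, but it can come from a single `static final String` that every entity references. That constant class could itself be generated by the Maven build (for example by a source-templating or resource-filtering step), so a rename touches one file instead of thousands. To stay self-contained, the example declares its own stand-in `@UDT` annotation instead of importing the driver's; `Keyspaces.COMPLEX` is a hypothetical name.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class KeyspaceConstantDemo {
    // Stand-in for the mapper's @UDT annotation so the example compiles without the driver.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface UDT {
        String keyspace();
        String name();
    }

    // Hypothetical single source of truth; a Maven build step could generate this class.
    public static final class Keyspaces {
        public static final String COMPLEX = "complex";

        private Keyspaces() {}
    }

    // Annotation values accept constant expressions, so a static final String works here.
    @UDT(keyspace = Keyspaces.COMPLEX, name = "address")
    public static class Address {
        private String street;
        private String city;
        private int zipCode;
    }

    /** Reads the annotation back to show which keyspace the type is bound to. */
    public static String qualifiedName(Class<?> type) {
        UDT udt = type.getAnnotation(UDT.class);
        return udt.keyspace() + "." + udt.name();
    }

    public static void main(String[] args) {
        System.out.println(qualifiedName(Address.class)); // prints complex.address
    }
}
```

It may also be worth checking whether the driver version in use lets the `keyspace` attribute be left out so that the session's keyspace is used instead.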


Re: Data visualization tools for Cassandra

2015-10-20 Thread Sebastian Estevez
For zeppelin check Duy Hai's branch:

https://github.com/doanduyhai/incubator-zeppelin/blob/Spark_Cassandra_Demo/README.md

All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

DataStax is the fastest, most scalable distributed database technology,
delivering Apache Cassandra to the world’s most innovative enterprises.
Datastax is built to be agile, always-on, and predictably scalable to any
size. With more than 500 customers in 45 countries, DataStax is the
database technology and transactional backbone of choice for the worlds
most innovative companies such as Netflix, Adobe, Intuit, and eBay.



Re: Read query taking a long time

2015-10-20 Thread Carlos Alonso
I think also having the output of cfhistograms could help. I'd like to know
how many sstables are being hit during reads.

Also, which CL are you reading with?

cfstats is a local command, so maybe that node you've printed is working
fine but there's another that is causing the latency. Can you check that
command in all nodes?

Regards

Carlos Alonso | Software Engineer | @calonso 

On 20 October 2015 at 13:59, Brice Figureau <
brice+cassan...@daysofwonder.com> wrote:

> Hi,
>
> Thanks for your answer. Unfortunately since I wrote my e-mail, things
> are a bit better.
>
> This might be because I moved from OpenJDK 7 to Oracle JDK 8 after
> having seen a warning in the C* log about OpenJDK, and I also added a
> node (for other reasons).
>
> Now the query itself takes only 1.5s~2s instead of the 5s~6s it was
> taking before.
>
> On Mon, 2015-10-19 at 14:38 +0100, Carlos Alonso wrote:
> > Could you send cfhistograms and cfstats relevant to the read column
> > family?
>
> Here is the requested information:
> % nodetool proxyhistograms
> proxy histograms
> Percentile  Read Latency  Write Latency  Range Latency
>               (micros)      (micros)       (micros)
> 50%            1109.00        372.00        1916.00
> 75%           14237.00        535.00        2759.00
> 95%          105778.00        642.00        4768.00
> 98%          545791.00        770.00       11864.00
> 99%          785939.00        924.00       14237.00
> Min              73.00          0.00         373.00
> Max         5839588.00      88148.00       73457.00
>
> % nodetool cfstats akka.messages
> Keyspace: akka
> Read Count: 3334784
> Read Latency: 9.98472696792356 ms.
> Write Count: 7124
> Write Latency: 0.572256457046603 ms.
> Pending Flushes: 0
> Table: messages
> SSTable count: 1
> Space used (live): 4680841
> Space used (total): 4680841
> Space used by snapshots (total): 23615746
> Off heap memory used (total): 4051
> SSTable Compression Ratio: 0.17318784636027024
> Number of keys (estimate): 478
> Memtable cell count: 317
> Memtable data size: 42293
> Memtable off heap memory used: 0
> Memtable switch count: 10
> Local read count: 3334784
> Local read latency: 9.985 ms
> Local write count: 7124
> Local write latency: 0.573 ms
> Pending flushes: 0
> Bloom filter false positives: 0
> Bloom filter false ratio: 0.0
> Bloom filter space used: 592
> Bloom filter off heap memory used: 584
> Index summary off heap memory used: 203
> Compression metadata off heap memory used: 3264
> Compacted partition minimum bytes: 73
> Compacted partition maximum bytes: 17436917
> Compacted partition mean bytes: 63810
> Average live cells per slice (last five minutes):
> 0.6693421039216356
> Maximum live cells per slice (last five minutes): 1033.0
> Average tombstones per slice (last five minutes): 0.0
> Maximum tombstones per slice (last five minutes): 0.0
>
> 
>
> If I read this correctly, there's a huge read latency at the coordinator
> (proxy) level, but the local read latency, and even the overall latency
> on this table, looks fine.
>
> Would that mean this is a network issue?
> --
> Brice Figureau 
>
>
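Brice's reading can be made concrete with unit arithmetic: proxyhistograms reports microseconds, so the 99th-percentile coordinator read latency (785939 micros) is about 786 ms, against a local read latency of about 10 ms from cfstats. A small Java sketch of that comparison follows (class and method names are illustrative). The cfstats figure is a mean while the histogram values are percentiles, so the ratios are only indicative, and, as Carlos points out, cfstats only covers the node it was run on.

```java
public class ProxyVsLocal {
    /** Converts a latency reported in microseconds to milliseconds. */
    public static double microsToMillis(double micros) {
        return micros / 1000.0;
    }

    public static void main(String[] args) {
        // Read Latency column from the quoted "nodetool proxyhistograms" output.
        String[] percentiles = {"50%", "75%", "95%", "98%", "99%"};
        double[] readMicros = {1109.00, 14237.00, 105778.00, 545791.00, 785939.00};
        // "Local read latency" from the quoted cfstats output (a mean, not a percentile).
        double localMeanMs = 9.985;

        for (int i = 0; i < percentiles.length; i++) {
            double ms = microsToMillis(readMicros[i]);
            System.out.printf("%s coordinator read: %.1f ms (%.1fx the local mean)%n",
                    percentiles[i], ms, ms / localMeanMs);
        }
    }
}
```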


Re: Data visualization tools for Cassandra

2015-10-20 Thread Mathieu Delsaut
Try Apache Zeppelin. It's a pretty young project but very useful.
https://zeppelin.incubator.apache.org/

It includes Cassandra and Spark connectors, among many others.


Mathieu Delsaut
*Research Engineer at LE²P*
+262 (0)262 93 86 08



Re: Data visualization tools for Cassandra

2015-10-20 Thread DuyHai Doan
For more info about Zeppelin, look at my recent presentation slides at
Apache Big Data Europe:
http://events.linuxfoundation.org/sites/events/files/slides/Apache%20Zeppelin%20-%20The%20missing%20component%20for%20the%20BigData%20ecosystem.pdf

The most up-to-date branch to play with Zeppelin/Spark/Cassandra
integration is
https://github.com/doanduyhai/incubator-zeppelin/blob/MetroDay



Re: Cassandra users survey

2015-10-20 Thread Jonathan Ellis
Thanks for all the responses!

The results (minus suggestions and emails) are available here:
https://docs.google.com/spreadsheets/d/1FegCArZgj2DNAjNkcXi1n2Y1Kfvf6cdZedkMPYQdvC0/edit?usp=sharing

I've included charts on separate sheets for each question, but
unfortunately I couldn't figure out how to help Google make sense of any of
the data where the form allowed multiple or free-form responses.

Some things that jump out at me:

- 3/4 of responses use only CQL.
- 3% have more than 1000 tables in the schema. On an absolute scale this is
low but still more than I expected.
- 60% are deployed across more than one datacenter
- I should have broken down the node count responses into more detail;
roughly 50% each in 1-10 and 10-100.  I should also include an "are you in
production?" question next time.
- More responses of both "less than 32 GB ram/node" and "128 GB or more"
than I expected.
- Including the "both" responses, a majority of users are deploying SSD now.

On Wed, Sep 30, 2015 at 1:18 PM, Jonathan Ellis  wrote:

> With 3.0 approaching, the Apache Cassandra team would appreciate your
> feedback as we work on the project roadmap for future releases.
>
> I've put together a brief survey here:
> https://docs.google.com/forms/d/1TEG0umQAmiH3RXjNYdzNrKoBCl1x7zurMroMzAFeG2Y/viewform?usp=send_form
>
> Please take a few minutes to fill it out!
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder, http://www.datastax.com
> @spyced
>
>


-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced


Re: Data visualization tools for Cassandra

2015-10-20 Thread Vikram Kone
Does OpsCenter have the capability to create custom dashboards based on the
data in Cassandra tables?
I'll look into Zeppelin and its fork by Duy Hai.

thanks



Re: Data visualization tools for Cassandra

2015-10-20 Thread Jon Haddad
PySpark (dataframes) + Pandas + Seaborn/Matplotlib




RE: Data visualization tools for Cassandra

2015-10-20 Thread Charles Rich
Take a look at jKool, a DataStax partner at jKoolCloud.com.  It provides 
visualization for data in DSE.

Regards,

Charley

From: Gene [mailto:gh5...@gmail.com]
Sent: Tuesday, October 20, 2015 1:17 PM
To: user@cassandra.apache.org
Subject: Re: Data visualization tools for Cassandra

Have you looked at OpsCenter?

On Tue, Oct 20, 2015 at 9:59 AM, Vikram Kone wrote:
Hi,
We are looking for data visualization tools to chart some graphs over data
present in our Cassandra cluster. Are there any open source visualization tools
that people are using to quickly draw some charts over data in their Cassandra
tables? We are using the DataStax version of Cassandra, in case that is relevant.


thanks

The information contained in this e-mail and in any attachment is confidential
and is intended solely for the use of the individual or entity to which it is
addressed. Access, copying, disclosure or use of such information by anyone else
is unauthorized. If you are not the intended recipient, please delete the e-mail
and refrain from use of such information.


Re: Hyper-V snapshot and Cassandra

2015-10-20 Thread Jeff Jirsa
As long as your Hyper-V/VSS snapshots include both the data directory and the
commit log directory, restoring from them is equivalent to recovering from a
single power outage: you should be able to load the sstables, replay the commit
log, and be fine.

Assuming you’re moving the Hyper-V/VSS snapshot to another host (using DPM or 
similar), it’s probably going to work the way you expect.
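The condition above (the VM snapshot must include both the data directory and the commit log directory) is easy to miss when those directories sit on different virtual disks. Below is a hypothetical pre-flight check in Java: given the volumes captured by the Hyper-V snapshot and the directories configured in cassandra.yaml (`data_file_directories`, `commitlog_directory`), it reports whether every directory is covered. The paths are made up, and the comparison is a simple case-insensitive prefix match on Windows-style paths, not full path resolution.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class SnapshotCoverage {
    /** Normalizes separators and case, and appends a trailing slash so "D:/cass" never matches "D:/cassandra". */
    private static String norm(String path) {
        String s = path.replace('\\', '/').toLowerCase(Locale.ROOT);
        return s.endsWith("/") ? s : s + "/";
    }

    /** True if every Cassandra directory lives under one of the volumes captured by the VM snapshot. */
    public static boolean covered(List<String> snapshotVolumes, List<String> cassandraDirs) {
        for (String dir : cassandraDirs) {
            boolean found = false;
            for (String volume : snapshotVolumes) {
                if (norm(dir).startsWith(norm(volume))) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical layout: data on D:, commit log on E:, but only D: is in the snapshot set.
        List<String> volumes = Arrays.asList("D:\\");
        List<String> dirs = Arrays.asList("D:\\cassandra\\data", "E:\\cassandra\\commitlog");
        System.out.println(covered(volumes, dirs)); // prints false: the commit log volume is missing
    }
}
```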

You’ll note, however, that Cassandra was designed to do the opposite of what
you’re doing: rather than having one monolithic database that’s scaled up, the
canonical use case for Cassandra is to have a number of smaller nodes, so you
still get the same capacity and throughput, but you also get high availability
and fault tolerance. It may be worth noting (as Mr. Coli suggested) that you’re
using Cassandra in an atypical fashion; if you add more, smaller nodes, you’ll
gain performance, HA and capacity, and moving snapshots will be faster because
there’s less data per system.




Re: "invalid global counter shard detected" warning on 2.1.3 and 2.1.10

2015-10-20 Thread Branton Davis
Howdy Cassandra folks.

Crickets here and it's sort of unsettling that we're alone with this
issue.  Is it appropriate to create a JIRA issue for this or is there maybe
another way to deal with it?

Thanks!

On Sun, Oct 18, 2015 at 1:55 PM, Branton Davis 
wrote:

> Hey all.
>
> We've been seeing this warning on one of our clusters:
>
> 2015-10-18 14:28:52,898 WARN  [ValidationExecutor:14]
> org.apache.cassandra.db.context.CounterContext invalid global counter shard
> detected; (4aa69016-4cf8-4585-8f23-e59af050d174, 1, 67158) and
> (4aa69016-4cf8-4585-8f23-e59af050d174, 1, 21486) differ only in count; will
> pick highest to self-heal on compaction
>
>
> From what I've read and heard in the IRC channel, this warning could be
> related to not running upgradesstables after upgrading from 2.0.x to
> 2.1.x.  I don't think we ran that then, but we've been at 2.1 since last
> November.  Looking back, the warnings started appearing around June, when no
> maintenance had been performed on the cluster.  At that time, we had been
> on 2.1.3 for a couple of months.  We've been on 2.1.10 for the last week
> (the upgrade was when we noticed this warning for the first time).
>
> From a suggestion in IRC, I went ahead and ran upgradesstables on all the
> nodes.  Our weekly repair also ran this morning.  But the warnings still
> show up throughout the day.
>
> So, we have many questions:
>
>- How much should we be freaking out?
>- Why is this recurring?  If I understand what's happening, this is a
>self-healing process.  So, why would it keep happening?  Are we possibly
>using counters incorrectly?
>- What does it even mean that there were multiple shards for the same
>counter?  How does that situation even occur?
>
> We're pretty lost here, so any help would be greatly appreciated.
>
> Thanks!
>
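The warning itself spells out the conflict it found: two global shards for counter id 4aa69016-4cf8-4585-8f23-e59af050d174 with the same clock (1) that "differ only in count" (67158 vs 21486), resolved by picking the highest. A simplified Java model of that rule is below. It is not Cassandra's CounterContext code: the branch where a higher clock wins is an assumption about the normal case, and only the equal-clock branch is taken from the log message.

```java
public class ShardMerge {
    /** Simplified stand-in for a pre-2.1 global counter shard: (counterId, clock, count). */
    public static final class Shard {
        public final String counterId;
        public final long clock;
        public final long count;

        public Shard(String counterId, long clock, long count) {
            this.counterId = counterId;
            this.clock = clock;
            this.count = count;
        }

        @Override
        public String toString() {
            return "(" + counterId + ", " + clock + ", " + count + ")";
        }
    }

    /** Reconciles two shards that carry the same counter id. */
    public static Shard merge(Shard a, Shard b) {
        if (!a.counterId.equals(b.counterId)) {
            throw new IllegalArgumentException("shards belong to different counter ids");
        }
        if (a.clock != b.clock) {
            // Assumption: normally the shard with the higher clock is newer and wins.
            return a.clock > b.clock ? a : b;
        }
        // Same id and clock but different counts is the case the WARN line reports;
        // per the log message, the highest count is kept.
        return a.count >= b.count ? a : b;
    }

    public static void main(String[] args) {
        String id = "4aa69016-4cf8-4585-8f23-e59af050d174";
        Shard kept = merge(new Shard(id, 1, 67158), new Shard(id, 1, 21486));
        System.out.println("kept " + kept);
    }
}
```

Keeping the higher count is why the warning implies possible overcounting: whichever replica saw the smaller value is simply overruled.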


Re: "invalid global counter shard detected" warning on 2.1.3 and 2.1.10

2015-10-20 Thread Sebastian Estevez
Hi Branton,


>- How much should we be freaking out?

The impact of this is possible counter inaccuracy (overcounting or
undercounting). If you are expecting counters to be exactly accurate, you are
already in trouble, because they are not: they are non-idempotent operations
in a distributed system (you've probably read Aleksey's post by now).

>
>- Why is this recurring?  If I understand what's happening, this is a
>self-healing process.  So, why would it keep happening?  Are we possibly
>using counters incorrectly?

Even after running sstableupgrade, your counter cells will not be upgraded
until each of them has been incremented. You may still be seeing the warning
on pre-2.1 counter cells that have not been incremented yet.

>
>- What does it even mean that there were multiple shards for the same
>counter?  How does that situation even occur?

We used to maintain "counter shards" at the sstable level in pre-2.1
counters. This means that on compaction or reads we would essentially add
the shards together when getting the value or merging the cells. This
caused a series of problems, including the warning you are still seeing.
TL;DR: in post-2.1 counters we store the final value of the counter (not the
increment/shard) at the commitlog level and beyond, so this is no longer an
issue. Again, read Aleksey's post.

Many users created fresh tables after upgrading to 2.1, updated only the new
tables, and added application logic to decide which table to read from.
Something like monthly tables works well if you're doing time-series
counters, and would ensure that you stop seeing the warnings on the
new/active tables and get the benefits of 2.1 counters quickly.
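The monthly-table approach amounts to a small routing function in the application. A hypothetical Java sketch: increments always go to the table for the day's month, and reads for months before the first new table fall back to the legacy pre-2.1 table. The table names (`page_views`, `page_views_YYYY_MM`) and the cutover month are invented for illustration, and the monthly tables would still need to be created ahead of time.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.Locale;

public class MonthlyCounterTables {
    // Hypothetical names: the legacy table still holds pre-2.1 counter cells.
    static final String LEGACY_TABLE = "page_views";
    static final String MONTHLY_PREFIX = "page_views_";

    /** Table that receives increments for the given day, e.g. page_views_2015_10. */
    public static String writeTableFor(LocalDate day) {
        YearMonth ym = YearMonth.from(day);
        return String.format(Locale.ROOT, "%s%04d_%02d", MONTHLY_PREFIX, ym.getYear(), ym.getMonthValue());
    }

    /** Table to read for the given day, given the first month that has its own table. */
    public static String readTableFor(LocalDate day, YearMonth firstMonthlyTable) {
        return YearMonth.from(day).isBefore(firstMonthlyTable) ? LEGACY_TABLE : writeTableFor(day);
    }

    public static void main(String[] args) {
        YearMonth cutover = YearMonth.of(2015, 11); // hypothetical first monthly table
        System.out.println(writeTableFor(LocalDate.of(2015, 11, 3)));          // page_views_2015_11
        System.out.println(readTableFor(LocalDate.of(2015, 10, 20), cutover)); // page_views
    }
}
```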




All the best,


Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

On Tue, Oct 20, 2015 at 12:21 PM, Branton Davis 
wrote:

> Howdy Cassandra folks.
>
> Crickets here and it's sort of unsettling that we're alone with this
> issue.  Is it appropriate to create a JIRA issue for this or is there maybe
> another way to deal with it?
>
> Thanks!
>
> On Sun, Oct 18, 2015 at 1:55 PM, Branton Davis wrote:
>
>> Hey all.
>>
>> We've been seeing this warning on one of our clusters:
>>
>> 2015-10-18 14:28:52,898 WARN  [ValidationExecutor:14]
>> org.apache.cassandra.db.context.CounterContext invalid global counter shard
>> detected; (4aa69016-4cf8-4585-8f23-e59af050d174, 1, 67158) and
>> (4aa69016-4cf8-4585-8f23-e59af050d174, 1, 21486) differ only in count; will
>> pick highest to self-heal on compaction
>>
>>
>> From what I've read and heard in the IRC channel, this warning could be
>> related to not running upgradesstables after upgrading from 2.0.x to
>> 2.1.x.  I don't think we ran that then, but we've been at 2.1 since last
>> November.  Looking back, the warnings started appearing around June, when no
>> maintenance had been performed on the cluster.  At that time, we had been
>> on 2.1.3 for a couple of months.  We've been on 2.1.10 for the last week
>> (the upgrade was when we noticed this warning for the first time).
>>
>> From a suggestion in IRC, I went ahead and ran upgradesstables on all the
>> nodes.  Our weekly repair also ran this morning.  But the warnings still
>> show up throughout the day.
>>
>> So, we have many questions:
>>
>>- How much should we be freaking out?
>>- Why is this recurring?  If I understand what's happening, this is a
>>self-healing process.  So, why would it keep happening?  Are we possibly
>>using counters incorrectly?
>>- What does it even mean that there were multiple shards for the same
>>counter?  How does that situation even occur?
>>
>> We're pretty lost here, so any help would be greatly appreciated.
>>
>> Thanks!
>>
>
>


Re: Would we have data corruption if we bootstrapped 10 nodes at once?

2015-10-20 Thread Branton Davis
On Mon, Oct 19, 2015 at 5:42 PM, Robert Coli  wrote:

> On Mon, Oct 19, 2015 at 9:20 AM, Branton Davis wrote:
>
>> Is that also true if you're standing up multiple nodes from backups that
>> already have data?  Could you not stand up more than one at a time since
>> they already have the data?
>>
>
> An operator probably almost never wants to add multiple
> not-previously-joined nodes to an active cluster via auto_bootstrap:false.
>
> The one case I can imagine is when you are starting a cluster which is not
> receiving any write traffic and does contain snapshots.
>
> =Rob
>

Just to clarify, I was thinking about a scenario/disaster where we lost the
entire cluster and had to rebuild from backups.  I assumed we would start
each node with the backed up data and commit log directories already there
and with auto_bootstrap=false, and I also hoped that we could do all nodes
at once, since they each already had their data.  Is that wrong?  If so,
how would you handle such a situation?


Data visualization tools for Cassandra

2015-10-20 Thread Vikram Kone
Hi,
We are looking for data visualization tools to chart some graphs over data
present in our Cassandra cluster. Are there any open source visualization
tools that people are using to quickly draw some charts over data in their
Cassandra tables? We are using the DataStax version of Cassandra, in case that
is relevant.


thanks


Re: Data visualization tools for Cassandra

2015-10-20 Thread Gene
Have you looked at OpsCenter?

On Tue, Oct 20, 2015 at 9:59 AM, Vikram Kone  wrote:

> Hi,
> We are looking for data visualization tools to chart some graphs over data
> present in our Cassandra cluster. Are there any open source visualization
> tools that people are using to quickly draw some charts over data in their
> Cassandra tables? We are using the DataStax version of Cassandra, in case that
> is relevant.
>
>
> thanks
>


Re: Would we have data corruption if we bootstrapped 10 nodes at once?

2015-10-20 Thread Robert Coli
On Tue, Oct 20, 2015 at 9:13 AM, Branton Davis 
wrote:

>
>> Just to clarify, I was thinking about a scenario/disaster where we lost
> the entire cluster and had to rebuild from backups.  I assumed we would
> start each node with the backed up data and commit log directories already
> there and with auto_bootstrap=false, and I also hoped that we could do all
> nodes at once, since they each already had their data.  Is that wrong?  If
> so, how would you handle such a situation?
>

"The one case I can imagine is when you are starting a cluster which is not
receiving any write traffic and does contain snapshots. "

The case you describe is in that class of cases.

=Rob





Re: Would we have data corruption if we bootstrapped 10 nodes at once?

2015-10-20 Thread Branton Davis
On Tue, Oct 20, 2015 at 3:31 PM, Robert Coli  wrote:

> On Tue, Oct 20, 2015 at 9:13 AM, Branton Davis wrote:
>
>>
>>> Just to clarify, I was thinking about a scenario/disaster where we lost
>> the entire cluster and had to rebuild from backups.  I assumed we would
>> start each node with the backed up data and commit log directories already
>> there and with auto_bootstrap=false, and I also hoped that we could do all
>> nodes at once, since they each already had their data.  Is that wrong?  If
>> so, how would you handle such a situation?
>>
>
> "The one case I can imagine is when you are starting a cluster which is
> not receiving any write traffic and does contain snapshots. "
>
> The case you describe is in that class of cases.
>
> =Rob
>
Thanks for confirming!


Re: Changing nodes ips

2015-10-20 Thread Cyril Scetbon
No ideas on this subject? Is this a current limitation of Cassandra 2.1?
> On Oct 18, 2015, at 21:58, Cyril Scetbon  wrote:
> 
> Hi,
> 
> I want to change the IP addresses (IPv4) that nodes use to communicate (gossip).
> I'm trying to migrate from IPv4 to IPv6 for those communications. I tried to
> follow a procedure similar to the one used in CASSANDRA-8382.
> However, it doesn't work as expected. When I make the change on the first node,
> the nodes no longer seem to see each other, according to nodetool:
> 
> on first node:
>
> Datacenter: s1
> ==============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address                     Load       Tokens  Owns  Host ID                               Rack
> DN  10.10.12.19                 ?          256     ?     dab24e23-4b42-438e-9070-7994e329e868  i10
> UN  2a01:c940:a5:2005:0:0:0:18  244.35 MB  256     ?     03c558ec-add9-4dcd-bf2b-a1b28575e06b  c10
>
> on second node:
>
> Datacenter: s1
> ==============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
> DN  10.10.12.18  244.24 MB  256     100.0%            03c558ec-add9-4dcd-bf2b-a1b28575e06b  c10
> UN  10.10.12.19  244.11 MB  256     100.0%            dab24e23-4b42-438e-9070-7994e329e868  i10
> 
> I can see in the first node's logs that it tries to handshake with node 2;
> however, I see neither an error in node 1's logs nor any information in
> node 2's logs.
>
> Of course, I'm trying to find a procedure that does not cause any downtime of
> the whole cluster.
>
> Any ideas?
>  -- 
> Cyril SCETBON
> 



[ANNOUNCEMENT] Support for Cassandra added in DB Solo

2015-10-20 Thread Marko Hantula
Hi Everyone,

We are pleased to announce support for Cassandra in DB Solo. Some of the
supported features include:

- Browse database structures with a couple of mouse clicks
- Write ad-hoc queries in the Query Editor
- Multiple simultaneous database connections
- Schema scripting tool
- Data export/import

Feedback and comments are welcome; you can find the tool at www.dbsolo.com

Cheers,
Marko



Re: Is there any configuration so that local program on C* node can connect using localhost and remote program using IP/name?

2015-10-20 Thread Tyler Hobbs
On Mon, Oct 19, 2015 at 7:35 PM, Ravi  wrote:

>
> I am using apache-cassandra-2.2.0.


You should upgrade to 2.2.3.  There were some bugs that you probably want
to avoid in 2.2.0.


>
> Is there any configuration so that a local program on a C* node can connect
> using localhost as the connection URL, and remote programs using IP/name in
> the connection URL?


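Set rpc_address to 0.0.0.0 to bind all interfaces. Note that when rpc_address
is 0.0.0.0, you must also set broadcast_rpc_address to an address other than
0.0.0.0.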


-- 
Tyler Hobbs
DataStax 


Re: Data visualization tools for Cassandra

2015-10-20 Thread DuyHai Doan
"From the readme documentation on GitHub, it looks like I need to install a
Spark cluster locally on the Zeppelin node. Is that true?"

--> It depends on your needs. For production, I strongly advise installing the
Zeppelin node on a separate server to avoid impacting Spark performance.

For a demo or a POC, you can have everything on the same machine (Spark,
Cassandra, Zeppelin). For my talk demos, I don't even bother running a Spark
process; I use Spark in local mode for testing.

"Can I install Zeppelin on a different cluster and connect it to Spark and
Cassandra on the remote cluster (via JMX auth)?"

--> I suppose you meant "DC" rather than "cluster", right? You can install
Zeppelin on a separate machine and configure it to connect to the Spark
master node IP. Of course, if the Zeppelin server is installed in DC1 and
you connect it to the Spark master in DC2, you'll suffer from latency. The
recommendation is to install Zeppelin on a server as close as possible
(network-wise) to where your Spark master is.




On Tue, Oct 20, 2015 at 9:38 PM, Vikram Kone  wrote:

> Thanks for the link, DuyHai.
> From the readme documentation on GitHub, it looks like I need to install a
> Spark cluster locally on the Zeppelin node. Is that true?
> We already have a DSE Cassandra cluster deployed in 2 DCs with Spark enabled
> on all of the Cassandra nodes (via DSE configuration).
> Can I install Zeppelin on a different cluster and connect it to Spark and
> Cassandra on the remote cluster (via JMX auth)?
>

Re: Changing nodes ips

2015-10-20 Thread Robert Coli
On Tue, Oct 20, 2015 at 1:13 PM, Cyril Scetbon 
wrote:

> No ideas on this subject? Is this a current limitation of Cassandra 2.1?
>

Probably a relatively unusual activity to attempt; not sure if it's
"supposed" to work or not.

I would probably :

1) file a ticket at issues.apache.org
2) let the list know the URL of the issue you created

=Rob


Re: Data visualization tools for Cassandra

2015-10-20 Thread Vikram Kone
Thanks for the link, DuyHai.
From the readme documentation on GitHub, it looks like I need to install a
Spark cluster locally on the Zeppelin node. Is that true?
We already have a DSE Cassandra cluster deployed in 2 DCs with Spark enabled
on all of the Cassandra nodes (via DSE configuration).
Can I install Zeppelin on a different cluster and connect it to Spark and
Cassandra on the remote cluster (via JMX auth)?

On Tue, Oct 20, 2015 at 12:10 PM, DuyHai Doan  wrote:

> For more info about Zeppelin, look at my recent presentation slides at
> Apache Big Data Europe:
> http://events.linuxfoundation.org/sites/events/files/slides/Apache%20Zeppelin%20-%20The%20missing%20component%20for%20the%20BigData%20ecosystem.pdf
>
> The most up-to-date branch to play with Zeppelin/Spark/Cassandra
> integration is
> https://github.com/doanduyhai/incubator-zeppelin/blob/MetroDay
>
> On Tue, Oct 20, 2015 at 8:54 PM, Vikram Kone  wrote:
>
>> Does OpsCenter have the capability to create custom dashboards based on
>> the data in Cassandra tables?
>> I'll look into Zeppelin and its fork by Duy Hai.
>>
>> thanks
>>
>> On Tue, Oct 20, 2015 at 10:31 AM, Sebastian Estevez <
>> sebastian.este...@datastax.com> wrote:
>>
>>> For zeppelin check Duy Hai's branch:
>>>
>>>
>>> https://github.com/doanduyhai/incubator-zeppelin/blob/Spark_Cassandra_Demo/README.md
>>>
>>> All the best,
>>>
>>> Sebastián Estévez
>>>
>>> On Tue, Oct 20, 2015 at 1:28 PM, Mathieu Delsaut <
>>> mathieu.dels...@univ-reunion.fr> wrote:
>>>
 Try Apache Zeppelin. It's a pretty young project but very useful.
 https://zeppelin.incubator.apache.org/

 It includes Cassandra and Spark connectors, among many others.


 Mathieu  Delsaut 
 *Research Engineer at LE²P*
 +262 (0)262 93 86 08
  
 
 

 2015-10-20 21:24 GMT+04:00 Jon Haddad :

> PySpark (dataframes) + Pandas + Seaborn/Matplotlib
>
> On Oct 20, 2015, at 11:22 AM, Charles Rich  wrote:
>
> Take a look at jKool, a DataStax partner at jKoolCloud.com.
> It provides visualization for data in DSE.
>
> Regards,
>
> Charley
>
> *From:* Gene [mailto:gh5...@gmail.com ]
> *Sent:* Tuesday, October 20, 2015 1:17 PM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Data visualization tools for Cassandra
>
> Have you looked at OpsCenter?
>
> On Tue, Oct 20, 2015 at 9:59 AM, Vikram Kone 
> wrote:
> Hi,
> We are looking for data visualization tools to chart some graphs over
> data present in our Cassandra cluster. Are there any open source
> visualization tools that people are using to quickly draw some charts over
> data in their Cassandra tables? We are using the DataStax version of
> Cassandra, in case that is relevant.
>
>
> thanks
>
>
>
>
>
> *The information contained in this e-mail and in any attachment is
> confidential and is intended solely for the use of the individual or entity
> to which it is addressed. Access, copying, disclosure or use of such
> information by anyone else is unauthorized. If you are not the intended
> recipient, please delete the e-mail and refrain from use of such
> information.*
>
>
>

>>>
>>
>


Re: Changing nodes ips

2015-10-20 Thread Michael Shuler

On 10/20/2015 03:13 PM, Cyril Scetbon wrote:

> I'm trying to migrate from IPv4 to IPv6 for those
> communications. I tried to follow a procedure similar to the one used
> in CASSANDRA-8382.
> When I make the change on the first
> node, the nodes no longer seem to see each other, according to nodetool:


As mentioned in the linked JIRA, I'm quite certain you will need to do 
all IPv4 -> IPv6 address changes on all nodes at the same time for the 
entire cluster - not one at a time.



> on first node:
>
> Datacenter: s1
> ==============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address                     Load       Tokens  Owns  Host ID                               Rack
> DN  10.10.12.19                 ?          256     ?     dab24e23-4b42-438e-9070-7994e329e868  i10
> UN  2a01:c940:a5:2005:0:0:0:18  244.35 MB  256     ?     03c558ec-add9-4dcd-bf2b-a1b28575e06b  c10


This is likely because an IPv6 address cannot route directly to an IPv4
address.



> on second node:
>
> Datacenter: s1
> ==============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
> DN  10.10.12.18  244.24 MB  256     100.0%            03c558ec-add9-4dcd-bf2b-a1b28575e06b  c10
> UN  10.10.12.19  244.11 MB  256     100.0%            dab24e23-4b42-438e-9070-7994e329e868  i10


That is because the second node is not aware of the first: no
communication has been established.



> I can see in the first node's logs that it tries to handshake with node
> 2; however, I see neither an error in node 1's logs nor any information in
> node 2's logs.
>
> Of course, I'm trying to find a procedure that does not cause any
> downtime of the whole cluster.


Doubtful that this is fully possible. It is very similar to the IPv4 
example in the linked JIRA - the 10. and 192.168. networks cannot speak 
to one another, so at least some downtime is to be expected here. Are 
the connecting clients already dual-homed?


That said, it *may* be possible to use dual-homed IPv4/v6 interfaces
with listen_interface and listen_interface_prefer_ipv6: true (?).
From conf/cassandra.yaml:


# If you choose to specify the interface by name and the interface has an ipv4 and an ipv6 address
# you can specify which should be chosen using listen_interface_prefer_ipv6. If false the first ipv4
# address will be used. If true the first ipv6 address will be used. Defaults to false preferring
# ipv4. If there is only one address it will be selected regardless of ipv4/ipv6.

listen_address: localhost
# listen_interface: eth0
# listen_interface_prefer_ipv6: false
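To make the rule in that comment concrete, here is a small model of it. It assumes the interface reports its addresses in a fixed order, and `pick_listen_address` and the dual-homed example addresses are hypothetical. Under this reading, a dual-homed node keeps gossiping on IPv4 with the default (false), and setting listen_interface_prefer_ipv6 to true is what moves it to IPv6.

```python
import ipaddress
from typing import List


def pick_listen_address(interface_addrs: List[str], prefer_ipv6: bool = False) -> str:
    """Model the selection rule described in the cassandra.yaml comment.

    `interface_addrs` is the interface's addresses in enumeration order
    (hypothetical input; Cassandra looks these up itself).
    """
    if not interface_addrs:
        raise ValueError("interface has no addresses")
    if len(interface_addrs) == 1:
        return interface_addrs[0]  # only one address: used regardless of family
    wanted = 6 if prefer_ipv6 else 4
    for addr in interface_addrs:
        if ipaddress.ip_address(addr).version == wanted:
            return addr  # first address of the preferred family
    # Assumption: the comment does not say what happens when no address of the
    # preferred family exists; this sketch falls back to the first address.
    return interface_addrs[0]


dual_homed = ["10.10.12.18", "2a01:c940:a5:2005::18"]
print(pick_listen_address(dual_homed))                    # 10.10.12.18
print(pick_listen_address(dual_homed, prefer_ipv6=True))  # 2a01:c940:a5:2005::18
```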

I'd try out some different methods in a dev env - full listen_address: 
edits and cluster restart; test out listen_interface: and a rolling 
restart, perhaps. Practice your changes in dev, and worst case, you can 
minimize downtime, if it's really necessary. You could be working on a 
network migration no one else has really tried yet :)


--
Kind regards,
Michael


RE: Write timeout on other nodes when joing a new node (in new DC)

2015-10-20 Thread Chris Allen
UNSUBSCRIBE



Re: Hiper-V snapshot and Cassandra

2015-10-20 Thread Robert Coli
On Tue, Oct 20, 2015 at 4:22 AM, Raul D'Opazo  wrote:

> I only have one node, on one server (Windows 2012), and Cassandra will
> grow to approximately 4 TB. It is a Hyper-V virtual machine, with enough
> resources.
>
>
This is an extremely unusual and probably degenerate use of Cassandra.


> I have taken snapshots and that is OK, because we don’t double the size with
> each snapshot, but I need another solution in case of disk problems.
>

I have no idea how snapshots work on Windows; if they work as on Linux, each
snapshot consists of hard links to the actual data files.
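Here is a small illustration of the hard-link behaviour described above. It assumes a filesystem that supports hard links, and the file names are hypothetical. It shows the same filesystem mechanism, not how `nodetool snapshot` is implemented. The snapshot entry shares the data file's inode, so taking it copies no data. Because SSTables are immutable, a snapshot only starts costing space once compaction deletes the original file and the link keeps those bytes alive.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as data_dir:
    # Hypothetical stand-in for an SSTable data file (1 MiB of filler).
    sstable = os.path.join(data_dir, "hypothetical-Data.db")
    with open(sstable, "wb") as f:
        f.write(b"x" * 1024 * 1024)

    # "Take a snapshot": hard-link the file into a snapshot directory.
    snap_dir = os.path.join(data_dir, "snapshots", "snap1")
    os.makedirs(snap_dir)
    snap_entry = os.path.join(snap_dir, "hypothetical-Data.db")
    os.link(sstable, snap_entry)  # no data is copied

    original, linked = os.stat(sstable), os.stat(snap_entry)
    same_inode = original.st_ino == linked.st_ino  # both names point at one file
    link_count = original.st_nlink                 # the file now has two names

print(same_inode, link_count)  # True 2
```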


> I am wondering whether Hyper-V virtual machine snapshots can be used to recover
> Cassandra in a consistent way. Is it possible?
>
>
Sure? If you quiesce writes to the system, or if you don't care about the
delta in the commit log between the Cassandra snapshot and the Hyper-V
snapshot, your snapshot will contain all the immutable data files you need
to restore.

Finally, I reiterate my confusion as to why you wish to do this unusual thing.

=Rob


Re: unusual GC log

2015-10-20 Thread 曹志富
C* version is 2.1.6.
CentOS release 6.5 (Final)
Sun JDK 1.7.0_71 64bit.

Attached are my config settings.

Thank you very much!!!

--
Ranger Tsao

2015-10-20 15:43 GMT+08:00 Graham Sanderson :

> What version of C* are you running? any special settings in
> cassandra.yaml; are you running with stock GC settings in cassandra-env.sh?
> what JDK/OS?
>
> On Oct 19, 2015, at 11:40 PM, 曹志富  wrote:
>
> INFO  [Service Thread] 2015-10-20 10:42:47,854 GCInspector.java:252 - ParNew GC in 476ms.  CMS Old Gen: 4288526240 -> 4725514832; Par Eden Space: 671088640 -> 0;
> INFO  [Service Thread] 2015-10-20 10:42:50,870 GCInspector.java:252 - ParNew GC in 423ms.  CMS Old Gen: 4725514832 -> 5114687560; Par Eden Space: 671088640 -> 0;
> INFO  [Service Thread] 2015-10-20 10:42:53,847 GCInspector.java:252 - ParNew GC in 406ms.  CMS Old Gen: 5114688368 -> 5513119264; Par Eden Space: 671088640 -> 0;
> INFO  [Service Thread] 2015-10-20 10:42:57,118 GCInspector.java:252 - ParNew GC in 421ms.  CMS Old Gen: 5513119264 -> 5926324736; Par Eden Space: 671088640 -> 0;
> INFO  [Service Thread] 2015-10-20 10:43:00,041 GCInspector.java:252 - ParNew GC in 437ms.  CMS Old Gen: 5926324736 -> 6324793584; Par Eden Space: 671088640 -> 0;
> INFO  [Service Thread] 2015-10-20 10:43:03,029 GCInspector.java:252 - ParNew GC in 429ms.  CMS Old Gen: 6324793584 -> 6693672608; Par Eden Space: 671088640 -> 0;
> INFO  [Service Thread] 2015-10-20 10:43:05,566 GCInspector.java:252 - ParNew GC in 339ms.  CMS Old Gen: 6693672608 -> 6989128592; Par Eden Space: 671088640 -> 0;
> INFO  [Service Thread] 2015-10-20 10:43:08,431 GCInspector.java:252 - ParNew GC in 421ms.  CMS Old Gen: 6266493464 -> 6662041272; Par Eden Space: 671088640 -> 0;
> INFO  [Service Thread] 2015-10-20 10:43:11,131 GCInspector.java:252 - ConcurrentMarkSweep GC in 215ms.  CMS Old Gen: 5926324736 -> 4574418480; CMS Perm Gen: 33751256 -> 33751192; Par Eden Space: 7192 -> 611360336;
> INFO  [Service Thread] 2015-10-20 10:43:11,848 GCInspector.java:252 - ParNew GC in 511ms.  CMS Old Gen: 4574418480 -> 4996166672; Par Eden Space: 671088640 -> 0;
> INFO  [Service Thread] 2015-10-20 10:43:14,915 GCInspector.java:252 - ParNew GC in 395ms.  CMS Old Gen: 4996167912 -> 5380926744; Par Eden Space: 671088640 -> 0;
> INFO  [Service Thread] 2015-10-20 10:43:18,335 GCInspector.java:252 - ParNew GC in 432ms.  CMS Old Gen: 5380926744 -> 5811659120; Par Eden Space: 671088640 -> 0;
> INFO  [Service Thread] 2015-10-20 10:43:21,492 GCInspector.java:252 - ParNew GC in 439ms.  CMS Old Gen: 5811659120 -> 6270861936; Par Eden Space: 671088640 -> 0;
> INFO  [Service Thread] 2015-10-20 10:43:24,698 GCInspector.java:252 - ParNew GC in 490ms.  CMS Old Gen: 6270861936 -> 6668734208; Par Eden Space: 671088640 -> 0; Par Survivor Space: 83886080 -> 83886072
> INFO  [Service Thread] 2015-10-20 10:43:27,963 GCInspector.java:252 - ParNew GC in 457ms.  CMS Old Gen: 6668734208 -> 7072885208; Par Eden Space: 671088640 -> 0; Par Survivor Space: 83886072 -> 83886080
>
> A few seconds later, the node was marked down.
>
> My node config: 8 GB heap, new heap size 800 MB.
>
> Node hardware: 4 cores, 32 GB RAM.
>
> --
> Ranger Tsao
>
>
>
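What follows is a rough reading of the log, offered as a sketch rather than a diagnosis. Par Eden Space is 671088640 bytes (640 MiB) and Par Survivor Space is 83886080 bytes (80 MiB), so 640 + 2 × 80 = 800 MiB of young generation, which matches the 800 MB new size. Each ParNew empties Eden, yet CMS Old Gen grows by roughly 280 to 440 MiB per collection. The survivor space is also already full in the last two entries. Both suggest that most young objects are being promoted rather than dying young. The hypothetical helper below extracts that per-collection promotion from GCInspector lines. Its regular expression is written only against the single-line format shown above.

```python
import re
from typing import List, Tuple

# Matches single-line ParNew entries in the GCInspector format shown above.
PARNEW = re.compile(
    r"(\d{2}:\d{2}:\d{2},\d{3}) GCInspector\.java:\d+ - ParNew GC in (\d+)ms\."
    r"\s+CMS Old Gen: (\d+) -> (\d+);"
)


def promoted_per_parnew(log: str) -> List[Tuple[str, int, float]]:
    """Return (time, pause in ms, MiB added to CMS Old Gen) for each ParNew line."""
    rows = []
    for m in PARNEW.finditer(log):
        before, after = int(m.group(3)), int(m.group(4))
        rows.append((m.group(1), int(m.group(2)), (after - before) / 2**20))
    return rows


SAMPLE = (
    "INFO  [Service Thread] 2015-10-20 10:42:47,854 GCInspector.java:252 - ParNew GC in 476ms.  "
    "CMS Old Gen: 4288526240 -> 4725514832; Par Eden Space: 671088640 -> 0;\n"
    "INFO  [Service Thread] 2015-10-20 10:42:50,870 GCInspector.java:252 - ParNew GC in 423ms.  "
    "CMS Old Gen: 4725514832 -> 5114687560; Par Eden Space: 671088640 -> 0;\n"
)

for ts, pause_ms, mib in promoted_per_parnew(SAMPLE):
    print(f"{ts}  pause={pause_ms}ms  promoted={mib:.0f} MiB")
# 10:42:47,854  pause=476ms  promoted=417 MiB
# 10:42:50,870  pause=423ms  promoted=371 MiB
```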


cassandra.yaml
Description: Binary data


cassandra-env.sh
Description: Bourne shell script


Re: "invalid global counter shard detected" warning on 2.1.3 and 2.1.10

2015-10-20 Thread Branton Davis
Sebastián, thanks so much for the info!

On Tue, Oct 20, 2015 at 11:34 AM, Sebastian Estevez <
sebastian.este...@datastax.com> wrote:

> Hi Branton,
>
>
>>- How much should we be freaking out?
>>
> The impact of this is possible counter inaccuracy (overcounting or
> undercounting). If you are expecting counters to be exactly accurate, you
> are already in trouble, because they are not: counter updates are not
> idempotent operations, and they run in a distributed system (you've
> probably read Aleksey's post by now).
>
>>
>>- Why is this recurring?  If I understand what's happening, this is a
>>self-healing process.  So, why would it keep happening?  Are we possibly
>>using counters incorrectly?
>>
> Even after running sstableupgrade, your counter cells will not be
> upgraded until each of them has been incremented again. You may still be
> seeing the warning on pre-2.1 counter cells that have not been incremented
> yet.
>
>>
>>- What does it even mean that there were multiple shards for the same
>>counter?  How does that situation even occur?
>>
> We used to maintain "counter shards" at the sstable level in pre-2.1
> counters. This means that on compaction or reads we would essentially add
> the shards together when getting the value or merging the cells. This
> caused a series of problems, including the warning you are still seeing.
> TL;DR: in post-2.1 counters, we store the final value of the counter (not
> the increment/shard) at the commitlog level and beyond, so this is no
> longer an issue. Again, read Aleksey's post.
>
> Many users started fresh tables after upgrading to 2.1, updated only the
> new tables, and added application logic to decide which table to read from.
> Something like monthly tables works well if you're doing time-series
> counters, and would ensure that you stop seeing the warnings on the
> new/active tables and get the benefits of 2.1 counters quickly.
>
> All the best,
>
> Sebastián Estévez

Unsubscribe

2015-10-20 Thread Safdar Kureishy
UNSUBSCRIBE


Re: Read query taking a long time

2015-10-20 Thread Brice Figureau
Hi,

Thanks for your answer. Unfortunately, since I wrote my e-mail, things
have gotten a bit better.

This might be because I moved from OpenJDK 7 to Oracle JDK 8 after
seeing a warning in the C* log about OpenJDK, and I also added a
node (for other reasons).

Now the query itself takes only 1.5s~2s instead of the 5s~6s it was
taking before.

On Mon, 2015-10-19 at 14:38 +0100, Carlos Alonso wrote:
> Could you send cfhistograms and cfstats relevant to the read column
> family?

Here is the requested information:
% nodetool proxyhistograms
proxy histograms
Percentile      Read Latency     Write Latency     Range Latency
                    (micros)          (micros)          (micros)
50%                  1109.00            372.00           1916.00
75%                 14237.00            535.00           2759.00
95%                105778.00            642.00           4768.00
98%                545791.00            770.00          11864.00
99%                785939.00            924.00          14237.00
Min                    73.00              0.00            373.00
Max               5839588.00          88148.00          73457.00

% nodetool cfstats akka.messages
Keyspace: akka
    Read Count: 3334784
    Read Latency: 9.98472696792356 ms.
    Write Count: 7124
    Write Latency: 0.572256457046603 ms.
    Pending Flushes: 0
        Table: messages
        SSTable count: 1
        Space used (live): 4680841
        Space used (total): 4680841
        Space used by snapshots (total): 23615746
        Off heap memory used (total): 4051
        SSTable Compression Ratio: 0.17318784636027024
        Number of keys (estimate): 478
        Memtable cell count: 317
        Memtable data size: 42293
        Memtable off heap memory used: 0
        Memtable switch count: 10
        Local read count: 3334784
        Local read latency: 9.985 ms
        Local write count: 7124
        Local write latency: 0.573 ms
        Pending flushes: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.0
        Bloom filter space used: 592
        Bloom filter off heap memory used: 584
        Index summary off heap memory used: 203
        Compression metadata off heap memory used: 3264
        Compacted partition minimum bytes: 73
        Compacted partition maximum bytes: 17436917
        Compacted partition mean bytes: 63810
        Average live cells per slice (last five minutes): 0.6693421039216356
        Maximum live cells per slice (last five minutes): 1033.0
        Average tombstones per slice (last five minutes): 0.0
        Maximum tombstones per slice (last five minutes): 0.0



If I read this correctly, there's a huge read latency while proxying,
but the local read latency, and even the node-wide read latency for this
keyspace, look normal.

Would that mean this is a network issue?
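One way to make the comparison explicit is to put the proxy read percentiles next to the local mean. The values are copied from the output above, and the helper name is hypothetical. The comparison is rough, though. cfstats reports a mean for this node only, while proxyhistograms measures the coordinator waiting on every replica it contacts. The tail could therefore come from a slow replica elsewhere, from the network, or from the occasional very large partition (a compacted partition maximum of 17436917 bytes against a mean of 63810). The numbers alone do not single out the network.

```python
# Coordinator-side ("proxy") read latency percentiles from the output above, in microseconds.
PROXY_READ_MICROS = {
    "50%": 1109.00,
    "75%": 14237.00,
    "95%": 105778.00,
    "98%": 545791.00,
    "99%": 785939.00,
    "Max": 5839588.00,
}
LOCAL_READ_MEAN_MS = 9.985  # "Local read latency" for akka.messages (a mean, this node only)


def times_local_mean(percentile: str) -> float:
    """Coordinator read latency at `percentile`, as a multiple of the local mean."""
    return (PROXY_READ_MICROS[percentile] / 1000.0) / LOCAL_READ_MEAN_MS


for p in PROXY_READ_MICROS:
    print(f"{p:>4}: {PROXY_READ_MICROS[p] / 1000:9.1f} ms  ({times_local_mean(p):5.1f}x local mean)")
```

With these numbers, the median coordinator read is faster than the local mean. The 99th percentile (about 786 ms) is roughly 79 times slower.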
-- 
Brice Figureau