Re: Cassandra Client Recommendation

2013-04-17 Thread Everton Lima
Hi Techy,

We are using Astyanax with Cassandra 1.2.4.

Benefits:
 * Easy to configure and use (a minimal setup sketch is below).
 * Good wiki.
 * Maintained by Netflix.
 * Built-in support for storing large files (more than 15 MB).
 * Built-in support for reading all rows efficiently.

Problems:
 * It consumes more memory.
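
To give a feel for the setup, here is a minimal sketch of how we bootstrap it
(the cluster, keyspace and pool names are placeholders, and the builder calls
are from memory, so double-check them against the Astyanax wiki):

import com.netflix.astyanax.AstyanaxContext;
import com.netflix.astyanax.Keyspace;
import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
import com.netflix.astyanax.thrift.ThriftFamilyFactory;

// Minimal Astyanax bootstrap: build a context, start it, grab the Keyspace.
AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
    .forCluster("TestCluster")                        // placeholder name
    .forKeyspace("MyKeyspace")                        // placeholder name
    .withAstyanaxConfiguration(new AstyanaxConfigurationImpl())
    .withConnectionPoolConfiguration(
        new ConnectionPoolConfigurationImpl("MyPool") // placeholder name
            .setSeeds("127.0.0.1:9160")
            .setPort(9160))
    .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
    .buildKeyspace(ThriftFamilyFactory.getInstance());
context.start();
Keyspace keyspace = context.getClient();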


2013/4/16 Techy Teck comptechge...@gmail.com

 Hello,
 I have recently started working with Cassandra Database. Now I am in the
 process of evaluating which Cassandra client I should go forward with.

 I am mainly interested in these three:

 1) Astyanax client

 2) New DataStax client that uses the binary protocol.

 3) Pelops client


 Can anyone provide some thoughts on this? Some advantages and
 disadvantages of these three would be a great start for me.


 Keep in mind that we are running Cassandra 1.2.2 in a production environment.



 Thanks for the help.




-- 
Everton Lima Aleixo
Bachelor in Computer Science from UFG
Master's student in Computer Science at UFG
Programmer at LUPA


Re: Cassandra Client Recommendation

2013-04-17 Thread Techy Teck
Thanks, Everton, for the suggestion. A couple of questions:

1) Does the Astyanax client have any problems with earlier versions of Cassandra?
2) You said one problem is that it consumes more memory. Can you
elaborate on that slightly? What do you mean by that?
3) Does Astyanax support async capabilities?


On Tue, Apr 16, 2013 at 11:05 PM, Everton Lima peitin.inu...@gmail.com wrote:



Re: Cassandra Client Recommendation

2013-04-17 Thread Everton Lima
1) Does the Astyanax client have any problems with earlier versions of Cassandra?
We have used it with 1.1.8, but for that version we do not use the latest
version of Astyanax. For Cassandra 1.2.*, though, I think the latest version
of Astyanax will work.

2) You said one problem is that it consumes more memory. Can you
elaborate on that slightly? What do you mean by that?
In our tests, process memory was higher with Astyanax than with the raw
TBinaryProtocol (cassandra-all.jar), so you need to give your process
more memory.

3) Does Astyanax support async capabilities?
What would be an example of an async capability?
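
The closest thing I know of is that Astyanax queries expose an executeAsync()
alongside execute(), returning a Guava ListenableFuture. I have not used it
myself, so treat this sketch as an assumption (method names from memory;
CF_USERS and the row key are placeholders, and 'keyspace' is an
already-started Astyanax Keyspace):

import com.google.common.util.concurrent.ListenableFuture;
import com.netflix.astyanax.connectionpool.OperationResult;
import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.model.ColumnList;
import com.netflix.astyanax.serializers.StringSerializer;

// Placeholder column family definition for the sketch.
ColumnFamily<String, String> CF_USERS = ColumnFamily.newColumnFamily(
    "users", StringSerializer.get(), StringSerializer.get());

// Fire the read without blocking the calling thread.
ListenableFuture<OperationResult<ColumnList<String>>> future =
    keyspace.prepareQuery(CF_USERS)
            .getKey("some-row-key")
            .executeAsync();

// ... do other work while the request is in flight, then block for the result:
ColumnList<String> columns = future.get().getResult();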



2013/4/17 Techy Teck comptechge...@gmail.com






-- 
Everton Lima Aleixo
Bachelor in Computer Science from UFG
Master's student in Computer Science at UFG
Programmer at LUPA


RE: Cassandra Client Recommendation

2013-04-17 Thread Francisco Trujillo
Hi

We are using Cassandra 1.6 at this moment. We started to work with Hector
because it is the first recommendation you find in a simple Google
search for Java Cassandra clients.

We started with Hector, but when we began to have non-dynamic column
families, which can be managed using CQL, we started to use Astyanax because:

-  The code is easy to understand, even for people who have never
worked with Cassandra.

-  The CQL implementation offers more capabilities.

-  Astyanax is ready for CQL 3, while with Hector we experienced
some problems (probably our fault, but with Astyanax everything worked from
the beginning).

-  Astyanax allows compound primary keys.

In the coming months we are going to replace Hector with Astyanax entirely,
but at this moment we are using both:


-  Astyanax for CQL (a rough sketch of this path is below).

-  Hector for dynamic column families.
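
For reference, the CQL path looks roughly like this (a hedged sketch: the
table, columns and bind values are made up; the prepared-statement method
names are from memory; and CQL 3 also has to be enabled on the
AstyanaxConfigurationImpl, via setCqlVersion("3.0.0") if I remember
correctly):

import com.netflix.astyanax.model.ColumnFamily;
import com.netflix.astyanax.serializers.StringSerializer;

// A CQL 3 query through Astyanax against a table with a compound primary
// key (device_id, hour). Illustrative only; 'keyspace' is an already-started
// Astyanax Keyspace.
ColumnFamily<String, String> CF_EVENTS = ColumnFamily.newColumnFamily(
    "events", StringSerializer.get(), StringSerializer.get());

keyspace.prepareQuery(CF_EVENTS)
        .withCql("SELECT * FROM events WHERE device_id = ? AND hour = ?")
        .asPreparedStatement()
        .withStringValue("dev-42")
        .withStringValue("2013041708")
        .execute();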


From: Techy Teck [mailto:comptechge...@gmail.com]
Sent: Wednesday, 17 April 2013 8:14
To: user
Subject: Re: Cassandra Client Recommendation





Re: Reduce Cassandra GC

2013-04-17 Thread Joel Samuelsson
You're right, it's probably hard. I should have provided more data.

I'm running Ubuntu 10.04 LTS with JNA installed. I believe this line in the
log indicates that JNA is working, please correct me if I'm wrong:
CLibrary.java (line 111) JNA mlockall successful

Total amount of RAM is 4GB.

My description of data size was very bad. Sorry about that. Data set size
is 12.3 GB per node, compressed.

Heap size is 998.44 MB according to nodetool info.
Key cache is 49 MB according to nodetool info.
Row cache size is 0 bytes according to nodetool info.
Max new heap is 205 MB according to Memory Pool Par Eden Space max
in jconsole.
The memtable size is left at the default, which should give it 333 MB according
to the documentation (I am uncertain where I can verify this).

Our production cluster seems similar to your dev cluster so possibly
increasing the heap to 2GB might help our issues.

I am still interested in getting rough estimates of how much heap will be
needed as data grows. Other than empirical studies, how would I go about
getting such estimates?


2013/4/16 Viktor Jevdokimov viktor.jevdoki...@adform.com

 How could one provide any help without any knowledge of your cluster,
 node and environment settings?

 40GB was calculated from 2 nodes with RF=2 (each has 100% of the data range):
 2.4-2.5M rows * 6 cols * 3kB as a minimum, without compression or any
 overhead (sstables, bloom filters and indexes).

 With ParNew GC times such as yours, even if it is a swapping issue, I could
 say only that the heap size is too small.

 Check the heap, new heap, memtable and cache sizes. Are you on Linux? Is
 JNA installed and used? What is the total amount of RAM?

 Just for a DEV environment we use 3 virtual machines with 4GB RAM and a
 2GB heap, without any GC issue, with data from 0 to 16GB compressed
 on each node. Memtable space is sized to 100MB, New Heap 400MB.

 Best regards / Pagarbiai
 Viktor Jevdokimov
 Senior Developer

 Email: viktor.jevdoki...@adform.com
 Phone: +370 5 212 3063, Fax +370 5 261 0453
 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania

 From: Joel Samuelsson [mailto:samuelsson.j...@gmail.com]
 Sent: Tuesday, April 16, 2013 12:52
 To: user@cassandra.apache.org
 Subject: Re: Reduce Cassandra GC

 How do you calculate the heap / data size ratio? Is this a linear ratio?

 Each node has slightly more than 12 GB right now though.

 2013/4/16 Viktor Jevdokimov viktor.jevdoki...@adform.com

 For 40GB of data, 1GB of heap is too low.

 Best regards / Pagarbiai
 Viktor Jevdokimov
 Senior Developer

 Email: viktor.jevdoki...@adform.com
 Phone: +370 5 212 3063, Fax +370 5 261 0453
 J. Jasinskio 16C, LT-01112 Vilnius, Lithuania

 From: Joel Samuelsson [mailto:samuelsson.j...@gmail.com]
 Sent: Tuesday, April 16, 2013 10:47
 To: user@cassandra.apache.org
 Subject: Reduce Cassandra GC

 Hi,

 We have a small production cluster with two nodes. The load on the nodes
 is very small, around 20 reads / sec and about the same for writes. There
 are around 2.5 million keys in the cluster and an RF of 2.

 About 2.4 million of the rows are skinny (6 columns) and around 3kb 

Key-Token mapping in cassandra

2013-04-17 Thread Ravikumar Govindarajan
We would like to map multiple keys to a single token in cassandra. I
believe this should be possible now with CASSANDRA-1034

Ex:

Key1 -> 123/IMAGE
Key2 -> 123/DOCUMENTS
Key3 -> 123/MULTIMEDIA

I would like all keys with 123 as prefix to be mapped to a single token.

Is this possible? Which Partitioner should I most likely extend, writing my
own, to achieve the desired result?
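
Something like the following is what I had in mind: a purely illustrative,
untested sketch against the 1.2 partitioner API (class and method names from
memory), hashing only the part of the key before the first '/':

import java.nio.ByteBuffer;
import org.apache.cassandra.dht.LongToken;
import org.apache.cassandra.dht.Murmur3Partitioner;

// Illustrative only: route all keys sharing a prefix to one token by
// hashing just the prefix. Everything after the first '/' is ignored.
public class PrefixPartitioner extends Murmur3Partitioner
{
    @Override
    public LongToken getToken(ByteBuffer key)
    {
        return super.getToken(prefixOf(key));
    }

    // Return a view of the key truncated at the first '/' byte, if any.
    private static ByteBuffer prefixOf(ByteBuffer key)
    {
        ByteBuffer dup = key.duplicate();
        for (int i = dup.position(); i < dup.limit(); i++)
        {
            if (dup.get(i) == (byte) '/')
            {
                dup.limit(i);
                break;
            }
        }
        return dup;
    }
}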

--
Ravi


InvalidRequestException: Start key's token sorts after end token

2013-04-17 Thread Andre Tavares
Hi,

I am getting an exception when I run Hadoop with Cassandra that follows:

WARN org.apache.hadoop.mapred.Child (main): Error running child
java.lang.RuntimeException: InvalidRequestException(why:Start key's token
sorts after end token)
at
org.apache.cassandra.hadoop.ColumnFamilyRecordReader$WideRowIterator.maybeInit(ColumnFamilyRecordReader.java:453)

I don't know what exactly this message means or how to solve the problem
... I am using Priam to manage my cluster in Cassandra over Elastic
Map/Reduce on Amazon ...

Any hint helps ...

Thanks,

Andre


Re: InvalidRequestException: Start key's token sorts after end token

2013-04-17 Thread Hiller, Dean
I literally just replied to your stackoverflow comment and then saw this email.  I 
need the whole stack trace.  My guess is the ColFamily is configured for one 
sort method while map/reduce is using another when querying, but 
that's just a guess.

Dean

From: Andre Tavares andre...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Wednesday, April 17, 2013 6:47 AM
To: user@cassandra.apache.org
Subject: InvalidRequestException: Start key's token sorts after end token

know what exactly this message means a


Re: InvalidRequestException: Start key's token sorts after end token

2013-04-17 Thread Andre Tavares
Dean,

sorry, but I only saw your comments on Stackoverflow (
http://stackoverflow.com/questions/16041727/operationtimeoutexception-cassandra-cluster-aws-emr)
after I sent this message ...

and I think you may be right about the sort method, but Priam sets the
Cassandra partitioner to RandomPartitioner, and maybe the correct one
would be Murmur3Partitioner when we use Hadoop (I am not sure either) ... if
that is true I have a problem, because I can't change the partitioner with
Priam (I think it only works with RandomPartitioner) ...
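
One thing I still plan to check (an assumption on my side, not a confirmed
fix) is whether the Hadoop job itself is being told the cluster's actual
partitioner, e.g.:

import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.hadoop.conf.Configuration;

// If the job and the cluster disagree on the partitioner, the computed
// token ranges can come out looking inverted. 'job' is the Hadoop Job.
Configuration conf = job.getConfiguration();
ConfigHelper.setInputPartitioner(conf, "org.apache.cassandra.dht.RandomPartitioner");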

Andre

2013/4/17 Hiller, Dean dean.hil...@nrel.gov




Getting error while inserting data in cassandra table using Java with JDBC

2013-04-17 Thread himanshu.joshi

Hi,


When I am trying to insert data into a table using Java with JDBC, I 
am getting the error:


InvalidRequestException(why:cannot parse 'Jo' as hex bytes)

My insert query is:
insert into temp(id,name,value,url_id) VALUES(108, 'Aa','Jo',10);

This insert query runs successfully from the CQLSH command prompt but 
not from the code.


The query I used to create the table in CQLSH is:

CREATE TABLE temp (
  id bigint PRIMARY KEY,
  dt_stamp timestamp,
  name text,
  url_id bigint,
  value text
) WITH
  bloom_filter_fp_chance=0.01 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=864000 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'SizeTieredCompactionStrategy'} AND
  compression={'sstable_compression': 'SnappyCompressor'};



I guess the problem may be because of an undefined 
key_validation_class, default_validation_class, comparator, etc.

Is there any way to define these attributes using CQLSH?
I have already tried the ASSUME command but it has not resolved the 
problem.


I am a beginner in cassandra and need your guidance.

--
Thanks & Regards,
Himanshu Joshi



Re: InvalidRequestException: Start key's token sorts after end token

2013-04-17 Thread Hiller, Dean
What's the stack trace you see?  At the time, I was thinking column scan, not 
row scan, as perhaps your code or Priam's code was doing a column slice within a 
row set where the columns are sorted by Integer while Priam is passing in UTF8, or 
vice-versa.  I.e., do we know if this is a column sorting issue or a row one?

Dean

From: Andre Tavares andre...@gmail.com
Reply-To: user@cassandra.apache.org
Date: Wednesday, April 17, 2013 7:09 AM
To: user@cassandra.apache.org
Subject: Re: InvalidRequestException: Start key's token sorts after end token




looking at making astyanax asynchronous but cassandra-thrift-1.1.1 doesn't look right

2013-04-17 Thread Hiller, Dean
Is cassandra-thrift-1.1.1.jar the generated code?  I see a send() and recv() 
but I don't see a send(Callback cb) that is typical of truly asynchronous 
platforms.  I.e., I obviously don't know when to call recv myself if I am trying 
to make astyanax truly asynchronous.

The reason I ask is we have a 100k row upload that with 20 synchronous threads 
takes around 30 seconds; by simulation, we predict this would be done in 3 
seconds with an async api, as our threads would not get held up like they do 
now.  I guess we can try to crank it up to 100 threads to get it running a bit 
faster for now :( :(.

Thanks,
Dean


Re: How to stop Cassandra and then restart it in windows?

2013-04-17 Thread Raihan Jamal
Hello,

Can anyone provide any help on this?

Thanks in advance.






*Raihan Jamal*


On Tue, Apr 16, 2013 at 6:50 PM, Raihan Jamal jamalrai...@gmail.com wrote:

 Hello,

 I installed a single node cluster on my local dev box, which runs
 Windows 7, and it was working fine. For some reason I had to restart my
 desktop, and since then, whenever I do the following at the command
 prompt, it always gives me the exception below-

 S:\Apache Cassandra\apache-cassandra-1.2.3\bin>cassandra -f
 Starting Cassandra Server
 Error: Exception thrown by the agent : java.rmi.server.ExportException:
 Port already in use: 7199; nested exception is:
 java.net.BindException: Address already in use: JVM_Bind


 Meaning the port is being used somewhere. I have made some changes in the
 cassandra.yaml file, so I need to shut down the Cassandra server and then
 restart it again.

 Can anybody help me with this?

 Thanks for the help.





Re: Added extra column as composite key while creation counter column family

2013-04-17 Thread Robert Coli
On Tue, Apr 16, 2013 at 10:29 PM, Kuldeep Mishra
kuld.cs.mis...@gmail.comwrote:

 cassandra 1.2.0

 Is it a bug in  1.2.0 ?


While I can't speak to this specific issue, 1.2.0 has meaningful known
issues. I suggest upgrading to 1.2.3(/4) ASAP.

=Rob


Re: Thrift message length exceeded

2013-04-17 Thread Lanny Ripple
That was our first thought.  Using maven's dependency tree info we verified
that we're using the expected (cass 1.2.3) jars

$ mvn dependency:tree | grep thrift
[INFO] |  +- org.apache.thrift:libthrift:jar:0.7.0:compile
[INFO] |  \- org.apache.cassandra:cassandra-thrift:jar:1.2.3:compile

I've also dumped the final command run by the hadoop we use (CDH3u5) and
verified it's not sneaking thrift in on us.


On Tue, Apr 16, 2013 at 4:36 PM, aaron morton aa...@thelastpickle.com wrote:

 Can you confirm that you are using the same thrift version that ships with
 1.2.3?

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 16/04/2013, at 10:17 AM, Lanny Ripple la...@spotright.com wrote:

 A bump to say I found this


 http://stackoverflow.com/questions/15487540/pig-cassandra-message-length-exceeded

 so others are seeing similar behavior.

 From what I can see of org.apache.cassandra.hadoop nothing has changed
 since 1.1.5 when we didn't see such things but sure looks like there's a
 bug that's slipped in (or been uncovered) somewhere.  I'll try to narrow
 down to a dataset and code that can reproduce.

 On Apr 10, 2013, at 6:29 PM, Lanny Ripple la...@spotright.com wrote:

 We are using Astyanax in production but I cut back to just Hadoop and
 Cassandra to confirm it's a Cassandra (or our use of Cassandra) problem.

 We do have some extremely large rows but we went from everything working
 with 1.1.5 to almost everything carping with 1.2.3.  Something has changed.
  Perhaps we were doing something wrong earlier that 1.2.3 exposed but
 surprises are never welcome in production.

 On Apr 10, 2013, at 8:10 AM, moshe.kr...@barclays.com wrote:

 I also saw this when upgrading from C* 1.0 to 1.2.2, and from hector 0.6
 to 0.8
 Turns out the Thrift message really was too long.
 The mystery to me: Why no complaints in previous versions? Were some
 checks added in Thrift or Hector?

 -Original Message-
 From: Lanny Ripple [mailto:la...@spotright.com]
 Sent: Tuesday, April 09, 2013 6:17 PM
 To: user@cassandra.apache.org
 Subject: Thrift message length exceeded

 Hello,

 We have recently upgraded to Cass 1.2.3 from Cass 1.1.5.  We ran
 sstableupgrades and got the ring on its feet and we are now seeing a new
 issue.

 When we run MapReduce jobs against practically any table we find the
 following errors:

 2013-04-09 09:58:47,746 INFO org.apache.hadoop.util.NativeCodeLoader:
 Loaded the native-hadoop library
 2013-04-09 09:58:47,899 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:
 Initializing JVM Metrics with processName=MAP, sessionId=
 2013-04-09 09:58:48,021 INFO org.apache.hadoop.util.ProcessTree: setsid
 exited with exit code 0
 2013-04-09 09:58:48,024 INFO org.apache.hadoop.mapred.Task:  Using
 ResourceCalculatorPlugin :
 org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4a48edb5
 2013-04-09 09:58:50,475 INFO org.apache.hadoop.mapred.TaskLogsTruncater:
 Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
 2013-04-09 09:58:50,477 WARN org.apache.hadoop.mapred.Child: Error running
 child
 java.lang.RuntimeException: org.apache.thrift.TException: Message length
 exceeded: 106
 at
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
 at
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
 at
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
 at
 com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
 at
 com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
 at
 org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
 at
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:444)
 at
 org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:460)
 at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
 at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:396)
 at
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1278)
 at org.apache.hadoop.mapred.Child.main(Child.java:260)
 Caused by: org.apache.thrift.TException: Message length exceeded: 106
 at
 org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:393)
 at
 org.apache.thrift.protocol.TBinaryProtocol.readBinary(TBinaryProtocol.java:363)
 at org.apache.cassandra.thrift.Column.read(Column.java:528)
 at
 

Re: MySQL Cluster performing faster than Cassandra cluster on single table

2013-04-17 Thread aaron morton
How many threads / processes do you have performing the writes? 
How big are the mutations ? 
Where are you measuring the latency ? 

Look at the nodetool cfhistograms to see the time it takes for a single node to 
perform a write. 
Look at the nodetool proxyhistograms to see the end to end request latency from 
the coordinator. 
^ the number on the left is microseconds for both. 
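
For example, with placeholder keyspace and column family names (1.2 nodetool 
syntax, from memory): 

nodetool -h 127.0.0.1 cfhistograms MyKeyspace MyColumnFamily
nodetool -h 127.0.0.1 proxyhistograms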

Generally cassandra does well with more clients. 

Cheers
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/04/2013, at 2:56 PM, Jabbar Azam aja...@gmail.com wrote:

 MySQL cluster also keeps the index in RAM, so with lots of rows the RAM 
 becomes a limiting factor.
 
 That's what my colleague found, and hence why we're sticking with Cassandra.
 
 On 16 Apr 2013 21:05, horschi hors...@gmail.com wrote:
 
 
 Ah, I see, that makes sense. Have you got a source for the storing of 
 hundreds of gigabytes? And does Cassandra not store anything in memory?
 It stores bloom filters and index-samples in memory. But they are much 
 smaller than the actual data and they can be configured.
  
 
 Yeah, my dataset is small at the moment - perhaps I should have chosen 
 something larger for the work I'm doing (University dissertation), however, 
 it is far too late to change now!
 On paper mysql-cluster looks great. But in daily use it's not as nice as 
 Cassandra (where you have machines dying, networks splitting, etc.).
 
 cheers,
 Christian



Re: differences between DataStax Community Edition and Cassandra package

2013-04-17 Thread aaron morton
It's the same as the Apache version, but DSC comes with samples and the free 
version of Ops Centre. 

Cheers
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/04/2013, at 6:36 PM, Francisco Trujillo f.truji...@genetwister.nl wrote:

 Hi everyone
  
 Probably this question has been asked by someone in the past. We are 
 using apache Cassandra 1.6 now and we are planning to update the version. 
 Datastax provides their own Cassandra package called “Datastax Community 
 Edition”. I know that the Datastax package has some tools to manage the 
 cluster, like visual interfaces, but

 is there some important difference in the database itself if we compare it 
 with the apache Cassandra that we can download from 
 http://cassandra.apache.org/?

 Thanks for your help in advance



Re: Cassandra Client Recommendation

2013-04-17 Thread aaron morton
One node on the native binary protocol, AFAIK it's still considered beta in 1.2

Also +1 for Astyanax

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/04/2013, at 6:50 PM, Francisco Trujillo f.truji...@genetwister.nl wrote:




Re: Reduce Cassandra GC

2013-04-17 Thread aaron morton
 INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line 122) 
 GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is 1046937600
This does not say that the heap is full. 
ParNew is GC activity for the new heap, which is typically a smaller part of 
the overall heap. 

It sounds like you are running with the defaults for the memory config, which is 
generally a good idea. But 4GB total memory for a node is on the small side.

Try some changes: edit the cassandra-env.sh file and change

MAX_HEAP_SIZE=2G
HEAP_NEWSIZE=400M

You may also want to try:

MAX_HEAP_SIZE=2G
HEAP_NEWSIZE=800M
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=2"

The size of the new heap generally depends on the number of cores available; 
see the comments in the -env file. 

An older discussion about memory use; note that in 1.2 the bloom filters (and 
compression data) are off heap now:
http://www.mail-archive.com/user@cassandra.apache.org/msg25762.html  

Hope that helps. 

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/04/2013, at 11:06 PM, Joel Samuelsson samuelsson.j...@gmail.com wrote:


Re: Key-Token mapping in cassandra

2013-04-17 Thread aaron morton
 CASSANDRA-1034
That ticket is about removing an assumption which was not correct. 

 I would like all keys with 123 as prefix to be mapped to a single token.
Why? 
It's not possible nor desirable, IMHO. Tokens are used to identify a single row 
internally. 
 
Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/04/2013, at 11:25 PM, Ravikumar Govindarajan 
ravikumar.govindara...@gmail.com wrote:




Re: Getting error while inserting data in cassandra table using Java with JDBC

2013-04-17 Thread aaron morton
What version are you using ?
And what JDBC driver ? 

Sounds like the driver is not converting the value to bytes for you. 
 
 I guess the problem may because of undefined 
 key_validation_class,default_validation_class and comparator etc.
If you are using CQL these are not relevant. 
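
One guess: if the connection is speaking CQL 2 against a CQL 3 table, text
columns can end up treated as blobs, which would produce exactly this
hex-bytes parse error. A minimal sketch with the cassandra-jdbc driver,
requesting CQL 3 in the URL and binding values through a prepared statement
(the driver class and URL parameter are from memory, so verify them against
the driver docs; the keyspace name is a placeholder):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
// version=3.0.0 asks the driver to use CQL 3.
Connection con = DriverManager.getConnection(
    "jdbc:cassandra://localhost:9160/mykeyspace?version=3.0.0");
PreparedStatement ps = con.prepareStatement(
    "INSERT INTO temp (id, name, value, url_id) VALUES (?, ?, ?, ?)");
ps.setLong(1, 108L);
ps.setString(2, "Aa");
ps.setString(3, "Jo");   // bound as text, so no hex-bytes parsing
ps.setLong(4, 10L);
ps.execute();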

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/04/2013, at 1:31 AM, himanshu.joshi himanshu.jo...@orkash.com wrote:




Multi datacenter setup question

2013-04-17 Thread More, Sandeep R
Hello,
My test setup consist of two datacenters DC1 and DC2.
DC2 has a offset of 10 as you can see in the following ring command.

I have two questions:

1)  Let's say I insert a key at DC2 and its token is, let's say,
85070591730234615865843651857942052874. In this case will it be owned by
DC2 and then replicated on DC1? I.e., who owns it?

2)  Notice that the Owns distribution is not even; is this something I
should be worrying about?

I am using Cassandra 1.0.12.

Following is the ring command output:

Address DC  RackStatus State   LoadOwns
Token


85070591730234615865843651857942052874
10.0.0.1   DC1 RAC-1   Up Normal  101.73 KB   50.00%  0
10.0.0.2   DC2 RAC-1   Up Normal  92.55 KB0.00%   10
10.0.0.3   DC1 RAC-1   Up Normal  115.09 KB   50.00%  
85070591730234615865843651857942052864
10.0.0.4   DC2 RAC-1   Up Normal  101.62 KB   0.00%   
85070591730234615865843651857942052874




Using an EC2 cluster from the outside.

2013-04-17 Thread maillists0
I have a working 3 node cluster in a single ec2 region and I need to hit it
from our datacenter. As you'd expect, the client gets the internal
addresses of the nodes back.

Someone on irc mentioned using the public IP for rpc and binding that
address to the box. I see that mentioned in an old list mail but I don't
get exactly how this is supposed to work. I could really use either a link
to something with explicit directions or a detailed explanation.

Should cassandra use the public IPs for everything -- listen, b'cast, and
rpc? What should cassandra.yaml look like? Is the idea to use the public
addresses for cassandra but route the requests between nodes over the lan
using nat?

Any help or suggestion is appreciated.


Re: Cassandra Client Recommendation

2013-04-17 Thread Techy Teck
Thanks, Aaron, for the suggestion. I am not sure I was able to understand
the "one node" thing you mentioned about the native binary protocol. Can
you please elaborate on that?



On Wed, Apr 17, 2013 at 11:21 AM, aaron morton aa...@thelastpickle.com wrote:

 One node on the native binary protocol, AFAIK it's still considered beta
 in 1.2

 Also +1 for Astyanax

 Cheers

 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand

 @aaronmorton
 http://www.thelastpickle.com

 On 17/04/2013, at 6:50 PM, Francisco Trujillo f.truji...@genetwister.nl
 wrote:

  Hi
 
  We are using Cassandra 1.6 at this moment. We start to work with Hector,
 because it is the first recommendation that you can find in a simple google
 search for java clients Cassandra.
 
  We start using Hector but when we start to have non dynamically column
 families, that can be managed using cql, we start to use astyanax because:
  -  It is easy to understand the code even for people who has
 never worked with Cassandra.
  -  The cql implementation offer more capabilities
  -  Astyanax is prepared to use Cql 3 and with hector we
 experienced some problems (probably our fault, but with Astyanax everything
 works from the beginning).
  -  Astyanax allow to use compound primary keys.
 
  In next months we are going to substitute Hector by Astyanax totally but
 at this moment we are using both:
 
  -  Astyanax for cql.
  -  Hector for dynamic column families.
 
 
  From: Techy Teck [mailto:comptechge...@gmail.com]
  Sent: woensdag 17 april 2013 8:14
  To: user
  Subject: Re: Cassandra Client Recommendation
 
  Thanks Everton for the suggestion. Couple of questions-
 
  1) Does Astyanax client have any problem with previous version of
 Cassandra?
  2) You said one problem, that it will consume more memory? Can you
 elaborate that slightly? What do you mean by that?
  3) Does Astyanax supports asynch capabilities?
 
 
  On Tue, Apr 16, 2013 at 11:05 PM, Everton Lima peitin.inu...@gmail.com
 wrote:
  Hi Techy,
 
  We are using Astyanax with cassandra 1.2.4.
 
  beneficits:
   * It is so easy to configure and use.
   * Good wiki
   * Mantained by Netflix
   * Solution to manage the store of big files (more than 15mb)
   * Solution to read all rows efficiently
 
  problems:
   * It consume more memory
 
 
  2013/4/16 Techy Teck comptechge...@gmail.com
  Hello,
 
  I have recently started working with Cassandra Database. Now I am in the
 process of evaluating which Cassandra client I should go forward with.
  I am mainly interested in these three-
 
  --1)  Astyanax client
 
  2--)  New Datastax client that uses Binary protocol.
 
  --3)  Pelops client
 
 
 
  Can anyone provide some thoughts on this? Some advantages and
 disadvantages for these three will be great start for me.
 
 
 
  Keeping in mind, we are running Cassandra 1.2.2 in production
 environment.
 
 
  Thanks for the help.
 
 
 
  --
  Everton Lima Aleixo
  Bacharel em Ciência da Computação pela UFG
  Mestrando em Ciência da Computação pela UFG
  Programador no LUPA
 




Re: InvalidRequestException: Start key's token sorts after end token

2013-04-17 Thread aaron morton
Is your Hadoop task supplying both a start and a finish key for the slice? You 
probably only want the start. 

Provide the full call stack and the code in your hadoop task. 

Cheers
 
-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/04/2013, at 1:34 AM, Hiller, Dean dean.hil...@nrel.gov wrote:




Re: looking at making astyanax asynchronous but cassandra-thrift-1.1.1 doesn't look right

2013-04-17 Thread aaron morton
Here's an example I did in python a long time ago 
http://www.mail-archive.com/user@cassandra.apache.org/msg04775.html

Call send(), then select on the file handle; when it's ready to read, call 
recv(). 

Or just add more threads on your side :)
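
In Java, the libthrift 0.7 jar also ships generated async clients, so there
should be a Cassandra.AsyncClient that does take callbacks, driven by a
TAsyncClientManager. A rough sketch (generated class and callback names from
memory, so treat them as assumptions; key, parent and predicate are
placeholders from your own code):

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.thrift.async.AsyncMethodCallback;
import org.apache.thrift.async.TAsyncClientManager;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TNonblockingSocket;

// One manager can multiplex many in-flight calls over its selector thread.
TAsyncClientManager manager = new TAsyncClientManager();
Cassandra.AsyncClient client = new Cassandra.AsyncClient(
    new TBinaryProtocol.Factory(), manager,
    new TNonblockingSocket("localhost", 9160));

client.get_slice(key, parent, predicate, ConsistencyLevel.ONE,
    new AsyncMethodCallback<Cassandra.AsyncClient.get_slice_call>() {
        public void onComplete(Cassandra.AsyncClient.get_slice_call call) {
            // call.getResult() returns the columns, or throws the
            // server-side exception if the call failed remotely.
        }
        public void onError(Exception e) {
            // transport-level failure
        }
    });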

Cheers


-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/04/2013, at 2:50 AM, Hiller, Dean dean.hil...@nrel.gov wrote:




Re: differences between DataStax Community Edition and Cassandra package

2013-04-17 Thread Robert Coli
On Wed, Apr 17, 2013 at 11:19 AM, aaron morton aa...@thelastpickle.comwrote:

 It's the same as the Apache version, but DSC comes with samples and the
 free version of Ops Centre.


DSE also comes with Solr special sauce and CDFS.

=Rob


Re: How to stop Cassandra and then restart it in windows?

2013-04-17 Thread aaron morton
 Error: Exception thrown by the agent : java.rmi.server.ExportException: Port 
 already in use: 7199; nested exception is:
 java.net.BindException: Address already in use: JVM_Bind
The process is already running. Is it installed as a service, and was it
automatically started when the system started?

Either shut it down using the service manager, or find the process (however 
you do that in Windows) and kill it. 

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/04/2013, at 4:26 AM, Raihan Jamal jamalrai...@gmail.com wrote:




Re: Using an EC2 cluster from the outside.

2013-04-17 Thread Robert Coli
On Wed, Apr 17, 2013 at 12:07 PM, maillis...@gmail.com wrote:



Google EC2MultiRegionSnitch.
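
With that snitch, the usual shape of cassandra.yaml is roughly this (from
memory of the 1.2 docs, so double-check before relying on it):

endpoint_snitch: Ec2MultiRegionSnitch
listen_address: the node's private EC2 IP
broadcast_address: the node's public EC2 IP (the snitch fills this from EC2
metadata)
rpc_address: 0.0.0.0 (or the public IP, so clients outside EC2 can reach it)

Gossip then advertises the public address, so the security group also needs
the storage port open between the nodes' public IPs.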

=Rob


Re: Thrift message length exceeded

2013-04-17 Thread aaron morton
Can you reproduce this in a simple way?
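
One thing worth ruling out (an assumption on my part, not a confirmed cause):
1.2's cassandra.yaml caps thrift message sizes, e.g.

thrift_framed_transport_size_in_mb: 15
thrift_max_message_length_in_mb: 16

With your extremely large rows, try raising those on the nodes and see whether
the error moves.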

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/04/2013, at 5:50 AM, Lanny Ripple la...@spotright.com wrote:


Re: Multi datacenter setup question

2013-04-17 Thread aaron morton
 1)  Let's say I insert a key at DC2 and its token is 
 85070591730234615865843651857942052874; will it be owned by DC2 and then 
 replicated to DC1? i.e., who owns it?
We don't think in terms of owning the token. 
The token range in the local DC that contains the token is used to find the 
first replica for the row; the same process is used to find the replicas in the 
remote DCs. 
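For example, using the ring you posted: in DC1 the ranges are 
(85070591730234615865843651857942052864, 0] and (0, 85070591730234615865843651857942052864], 
so token 85070591730234615865843651857942052874 falls in the first (wrapping) range 
and 10.0.0.1 (token 0) holds the first DC1 replica. In DC2 that token is exactly 
10.0.0.4's token, so 10.0.0.4 holds the first DC2 replica.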

 2)  Notice that the Owns distribution is not even; is this something I 
 should be worried about?
No. I think that's changed in the newer versions. 
 
 I am using Cassandra 1.0.12.

Please use version 1.1 or 1.2. 

Cheers


-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/04/2013, at 7:03 AM, More, Sandeep R sandeep.r.m...@intel.com wrote:

 Hello,
 My test setup consists of two datacenters, DC1 and DC2.
 DC2 has an offset of 10, as you can see in the following ring command output.
  
 I have two questions:
 1)  Let's say I insert a key at DC2 and its token is 
 85070591730234615865843651857942052874; will it be owned by DC2 and then 
 replicated to DC1? i.e., who owns it?
 2)  Notice that the Owns distribution is not even; is this something I 
 should be worried about?
  
 I am using Cassandra 1.0.12.
  
 Following is the ring command output:
  
 Address    DC   Rack   Status  State   Load       Owns     Token
                                                            85070591730234615865843651857942052874
 10.0.0.1   DC1  RAC-1  Up      Normal  101.73 KB  50.00%   0
 10.0.0.2   DC2  RAC-1  Up      Normal  92.55 KB   0.00%    10
 10.0.0.3   DC1  RAC-1  Up      Normal  115.09 KB  50.00%   85070591730234615865843651857942052864
 10.0.0.4   DC2  RAC-1  Up      Normal  101.62 KB  0.00%    85070591730234615865843651857942052874
  
  



Re: Cassandra Client Recommendation

2013-04-17 Thread aaron morton
Was a typo; it should have been "One note on".

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/04/2013, at 7:23 AM, Techy Teck comptechge...@gmail.com wrote:

 Thanks Aaron for the suggestion. I am not sure I was able to understand the 
 "one node" thing you mentioned about the native binary protocol. Can you 
 please elaborate on that?
 
 
 
 On Wed, Apr 17, 2013 at 11:21 AM, aaron morton aa...@thelastpickle.com 
 wrote:
 One node on the native binary protocol, AFAIK it's still considered beta in 
 1.2
 
 Also +1 for Astyanax
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 17/04/2013, at 6:50 PM, Francisco Trujillo f.truji...@genetwister.nl 
 wrote:
 
  Hi
 
  We are using Cassandra 1.6 at this moment. We started with Hector because 
  it is the first recommendation you find in a simple Google search for Java 
  Cassandra clients.
 
  When we began to have non-dynamic column families that can be managed 
  using CQL, we started to use Astyanax because:
  -  The code is easy to understand, even for people who have never 
  worked with Cassandra.
  -  Its CQL implementation offers more capabilities.
  -  Astyanax is ready for CQL 3; with Hector we experienced some problems 
  (probably our fault, but with Astyanax everything worked from the 
  beginning).
  -  Astyanax allows compound primary keys (see the sketch below).
 
  In the coming months we are going to replace Hector with Astyanax 
  entirely, but at this moment we are using both:
 
  -  Astyanax for CQL.
  -  Hector for dynamic column families.
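
  As an illustration of the compound-key point above (a generic CQL 3 
  sketch, not our actual schema):

  CREATE TABLE sensor_readings (
      sensor_id bigint,
      reading_time timestamp,
      value text,
      PRIMARY KEY (sensor_id, reading_time)
  );

  Rows with the same sensor_id live on one partition and are ordered by 
  reading_time.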
 
 
  From: Techy Teck [mailto:comptechge...@gmail.com]
  Sent: woensdag 17 april 2013 8:14
  To: user
  Subject: Re: Cassandra Client Recommendation
 
  Thanks Everton for the suggestion. A couple of questions:
 
  1) Does the Astyanax client have any problem with previous versions of Cassandra?
  2) You said one problem is that it will consume more memory? Can you 
  elaborate on that slightly? What do you mean by that?
  3) Does Astyanax support async capabilities?
 
 
  On Tue, Apr 16, 2013 at 11:05 PM, Everton Lima peitin.inu...@gmail.com 
  wrote:
  Hi Techy,
 
  We are using Astyanax with cassandra 1.2.4.
 
  benefits:
   * It is so easy to configure and use.
   * Good wiki
   * Maintained by Netflix
   * Solution to manage the storage of big files (more than 15 MB)
   * Solution to read all rows efficiently
 
  problems:
   * It consumes more memory
 
 
  2013/4/16 Techy Teck comptechge...@gmail.com
  Hello,
 
  I have recently started working with Cassandra Database. Now I am in the 
  process of evaluating which Cassandra client I should go forward with.
  I am mainly interested in these three-
 
  1)  Astyanax client
 
  2)  New Datastax client that uses the binary protocol.
 
  3)  Pelops client
 
 
 
  Can anyone provide some thoughts on this? Some advantages and disadvantages 
  for these three would be a great start for me.
 
 
 
  Keeping in mind, we are running Cassandra 1.2.2 in production environment.
 
 
  Thanks for the help.
 
 
 
  --
  Everton Lima Aleixo
  Bacharel em Ciência da Computação pela UFG
  Mestrando em Ciência da Computação pela UFG
  Programador no LUPA
 
 
 



How to make compaction run faster?

2013-04-17 Thread Jay Svc
Hi Team,



I have a high write traffic to my Cassandra cluster and experience a very
high number of pending compactions. As I expect higher writes, the pending
compactions keep increasing. Even when I stop my writes it takes several
hours to finish the pending compactions.


My CF is configured with LCS, with sstable_size_mb=20M. My CPU is below
20%, JVM memory usage is between 45%-55%. I am using Cassandra 1.1.9.


How can I increase the compaction rate so it will run a bit faster and match
my write speed?


Your inputs are appreciated.


Thanks,

Jay


Re: How to make compaction run faster?

2013-04-17 Thread Edward Capriolo
Three things:
1) compaction throughput is fairly low (yaml & nodetool)
2) concurrent compactions is fairly low (yaml)
3) multithreaded compaction might be off in your version

Try raising these things. Otherwise consider option 4.

4) $$$ RAID, RAM & CPU $$$
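
For example, with a 1.1-era cassandra.yaml (names from memory, so double-check
them against your version's yaml):

    compaction_throughput_mb_per_sec: 64   # default 16; 0 disables throttling
    concurrent_compactors: 4               # defaults to the number of cores
    multithreaded_compaction: true         # off by default

The throughput cap can also be raised at runtime without a restart:

    nodetool -h localhost setcompactionthroughput 64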


On Wed, Apr 17, 2013 at 4:01 PM, Jay Svc jaytechg...@gmail.com wrote:

 Hi Team,



 I have a high write traffic to my Cassandra cluster and experience a very
 high number of pending compactions. As I expect higher writes, the pending
 compactions keep increasing. Even when I stop my writes it takes several
 hours to finish the pending compactions.


 My CF is configured with LCS, with sstable_size_mb=20M. My CPU is below
 20%, JVM memory usage is between 45%-55%. I am using Cassandra 1.1.9.


 How can I increase the compaction rate so it will run a bit faster and match
 my write speed?


 Your inputs are appreciated.


 Thanks,

 Jay




Re: How to make compaction run faster?

2013-04-17 Thread Alexis Rodríguez
:D

Jay, check whether your disk(s) utilization allows you to change the
configuration the way Edward suggests. iostat -xkcd 1 will show you how busy
your disk(s) are.




On Wed, Apr 17, 2013 at 5:26 PM, Edward Capriolo edlinuxg...@gmail.comwrote:

 Three things:
 1) compaction throughput is fairly low (yaml & nodetool)
 2) concurrent compactions is fairly low (yaml)
 3) multithreaded compaction might be off in your version

 Try raising these things. Otherwise consider option 4.

 4) $$$ RAID, RAM & CPU $$$


 On Wed, Apr 17, 2013 at 4:01 PM, Jay Svc jaytechg...@gmail.com wrote:

 Hi Team,



 I have a high write traffic to my Cassandra cluster and experience a very
 high number of pending compactions. As I expect higher writes, the pending
 compactions keep increasing. Even when I stop my writes it takes several
 hours to finish the pending compactions.


 My CF is configured with LCS, with sstable_size_mb=20M. My CPU is below
 20%, JVM memory usage is between 45%-55%. I am using Cassandra 1.1.9.


 How can I increase the compaction rate so it will run a bit faster and match
 my write speed?


 Your inputs are appreciated.


 Thanks,

 Jay





Re: Thrift message length exceeded

2013-04-17 Thread Lanny Ripple
It's slow going finding the time to do so but I'm working on that.

We do have another table that has one or sometimes two columns per row.  We can 
run jobs on it without issue.  I looked through the org.apache.cassandra.hadoop 
code and don't see anything that's really changed since 1.1.5 (which was also 
using thrift-0.7), so it's something of a puzzler what's going on.
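
One thing still on my list to rule out (an assumption on my part, not something
I've confirmed): the server-side Thrift limits are configurable in the 1.2
cassandra.yaml, and the defaults are fairly small, e.g.

    thrift_framed_transport_size_in_mb: 15
    thrift_max_message_length_in_mb: 16

If our extremely large rows blow past those limits that could explain the
error, so bumping them on the nodes the jobs read from seems worth a try.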


On Apr 17, 2013, at 2:47 PM, aaron morton aa...@thelastpickle.com wrote:

 Can you reproduce this in a simple way ? 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 18/04/2013, at 5:50 AM, Lanny Ripple la...@spotright.com wrote:
 
 That was our first thought.  Using maven's dependency tree info we verified 
 that we're using the expected (cass 1.2.3) jars
 
 $ mvn dependency:tree | grep thrift
 [INFO] |  +- org.apache.thrift:libthrift:jar:0.7.0:compile
 [INFO] |  \- org.apache.cassandra:cassandra-thrift:jar:1.2.3:compile
 
 I've also dumped the final command run by the hadoop we use (CDH3u5) and 
 verified it's not sneaking thrift in on us.
 
 
 On Tue, Apr 16, 2013 at 4:36 PM, aaron morton aa...@thelastpickle.com 
 wrote:
 Can you confirm the you are using the same thrift version that ships 1.2.3 ? 
 
 Cheers
 
 -
 Aaron Morton
 Freelance Cassandra Consultant
 New Zealand
 
 @aaronmorton
 http://www.thelastpickle.com
 
 On 16/04/2013, at 10:17 AM, Lanny Ripple la...@spotright.com wrote:
 
 A bump to say I found this
 
  
 http://stackoverflow.com/questions/15487540/pig-cassandra-message-length-exceeded
 
 so others are seeing similar behavior.
 
 From what I can see of org.apache.cassandra.hadoop nothing has changed 
 since 1.1.5, when we didn't see such things, but it sure looks like a bug 
 has slipped in (or been uncovered) somewhere.  I'll try to narrow it down 
 to a dataset and code that can reproduce it.
 
 On Apr 10, 2013, at 6:29 PM, Lanny Ripple la...@spotright.com wrote:
 
 We are using Astyanax in production but I cut back to just Hadoop and 
 Cassandra to confirm it's a Cassandra (or our use of Cassandra) problem.
 
 We do have some extremely large rows but we went from everything working 
 with 1.1.5 to almost everything carping with 1.2.3.  Something has 
 changed.  Perhaps we were doing something wrong earlier that 1.2.3 exposed 
 but surprises are never welcome in production.
 
 On Apr 10, 2013, at 8:10 AM, moshe.kr...@barclays.com wrote:
 
 I also saw this when upgrading from C* 1.0 to 1.2.2, and from hector 0.6 
 to 0.8
 Turns out the Thrift message really was too long.
 The mystery to me: Why no complaints in previous versions? Were some 
 checks added in Thrift or Hector?
 
 -Original Message-
 From: Lanny Ripple [mailto:la...@spotright.com] 
 Sent: Tuesday, April 09, 2013 6:17 PM
 To: user@cassandra.apache.org
 Subject: Thrift message length exceeded
 
 Hello,
 
 We have recently upgraded to Cass 1.2.3 from Cass 1.1.5.  We ran 
 sstableupgrades and got the ring on its feet and we are now seeing a new 
 issue.
 
 When we run MapReduce jobs against practically any table we find the 
 following errors:
 
 2013-04-09 09:58:47,746 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
 2013-04-09 09:58:47,899 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=MAP, sessionId=
 2013-04-09 09:58:48,021 INFO org.apache.hadoop.util.ProcessTree: setsid exited with exit code 0
 2013-04-09 09:58:48,024 INFO org.apache.hadoop.mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4a48edb5
 2013-04-09 09:58:50,475 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
 2013-04-09 09:58:50,477 WARN org.apache.hadoop.mapred.Child: Error running child
 java.lang.RuntimeException: org.apache.thrift.TException: Message length exceeded: 106
     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
     at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
     at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
     at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:444)
     at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:460)
     at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
     at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
     at 

Re: Using an EC2 cluster from the outside.

2013-04-17 Thread Ben Bromhead
Depending on your client, disable automatic client discovery and just specify a 
list of all your nodes in your client configuration.

For more details check out 
http://xzheng.net/blogs/problem-when-connecting-to-cassandra-with-ruby/ ; it 
deals specifically with a Ruby client, but the approach is applicable to others.
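
With Astyanax, for example, it looks roughly like this (a sketch from memory
against the 1.56.x API, so treat the details as assumptions; the hostnames are
placeholders):

    import com.netflix.astyanax.AstyanaxContext;
    import com.netflix.astyanax.Keyspace;
    import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
    import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
    import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
    import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
    import com.netflix.astyanax.thrift.ThriftFamilyFactory;

    AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
        .forCluster("MyCluster")
        .forKeyspace("MyKeyspace")
        .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
            // NONE = no ring describe, so the client only ever uses the seeds
            .setDiscoveryType(NodeDiscoveryType.NONE))
        .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("MyPool")
            .setPort(9160)
            .setMaxConnsPerHost(3)
            // list every node here by its public address
            .setSeeds("ec2-node1:9160,ec2-node2:9160,ec2-node3:9160"))
        .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
        .buildKeyspace(ThriftFamilyFactory.getInstance());
    context.start();
    Keyspace keyspace = context.getEntity();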

Cheers

Ben
Instaclustr | www.instaclustr.com | @instaclustr



On 18/04/2013, at 5:43 AM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Apr 17, 2013 at 12:07 PM, maillis...@gmail.com wrote:
 I have a working 3 node cluster in a single ec2 region and I need to hit it 
 from our datacenter. As you'd expect, the client gets the internal addresses 
 of the nodes back. 
 
 Someone on irc mentioned using the public IP for rpc and binding that address 
 to the box. I see that mentioned in an old list mail but I don't get exactly 
 how this is supposed to work. I could really use either a link to something 
 with explicit directions or a detailed explanation. 
 
 Should cassandra use the public IPs for everything -- listen, b'cast, and 
 rpc? What should cassandra.yaml look like? Is the idea to use the public 
 addresses for cassandra but route the requests between nodes over the lan 
 using nat? 
 
 Any help or suggestion is appreciated. 
 
 Google EC2MultiRegionSnitch.
 
 =Rob



[no subject]

2013-04-17 Thread Ertio Lew
I run Cassandra on a single Win 8 machine for development needs. Everything
has been working fine for several months, but today I saw this error message
in the Cassandra logs and all host pools were marked down.



ERROR 08:40:42,684 Error occurred during processing of message.
java.lang.StringIndexOutOfBoundsException: String index out of range: -2147418111
    at java.lang.String.checkBounds(String.java:397)
    at java.lang.String.<init>(String.java:442)
    at org.apache.thrift.protocol.TBinaryProtocol.readString(TBinaryProtocol.java:339)
    at org.apache.cassandra.thrift.Cassandra$batch_mutate_args.read(Cassandra.java:18958)
    at org.apache.cassandra.thrift.Cassandra$Processor$batch_mutate.process(Cassandra.java:3441)
    at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)


After restarting the server everything worked fine again.
I am curious to know what this is related to. Could it be caused by my
application writing corrupted data?


Failed shuffle

2013-04-17 Thread David McNelis
I had a situation earlier where my shuffle failed after a hard disk drive
filled up.  I went through and disabled shuffle on the machines while
trying to get the situation resolved.  Now, while I can re-enable shuffle
on the machines, when trying to do an ls, I get a timeout.

Looking at the cassandra-shuffle code, it is trying to execute this query:

SELECT token_bytes,requested_at FROM system.range_xfers

which is throwing the following error in my logs:

java.lang.AssertionError: [min(-1),max(-219851097003960625)]
    at org.apache.cassandra.dht.Bounds.<init>(Bounds.java:41)
    at org.apache.cassandra.dht.Bounds.<init>(Bounds.java:34)
    at org.apache.cassandra.dht.Bounds.withNewRight(Bounds.java:121)
    at org.apache.cassandra.service.StorageProxy.getRangeSlice(StorageProxy.java:1172)
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:132)
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:62)
    at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:132)
    at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:143)
    at org.apache.cassandra.thrift.CassandraServer.execute_cql3_query(CassandraServer.java:1726)
    at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4074)
    at org.apache.cassandra.thrift.Cassandra$Processor$execute_cql3_query.getResult(Cassandra.java:4062)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:679)


This causes me two major issues. First, I can't restart my dead node
because it ends up with a concurrency exception while trying to find
relocating tokens during StorageService initialization. Second, I can't clear
the moves because nothing is able to read what is in that range_xfers table
(at least, I was also unable to read it through cqlsh).

I thought I could recreate the table, but system is a restricted keyspace
and it looks like I can't drop and recreate that table, and CQL requires a
key for a delete... and you can't get the key without getting an
error.

Is there something simple I can do that I'm just missing right now? Right
now I can't restart nodes because of this, nor successfully add new nodes to
my ring.


Re: Getting error while inserting data in cassandra table using Java with JDBC

2013-04-17 Thread himanshu.joshi


On 04/18/2013 12:06 AM, aaron morton wrote:

What version are you using ?
And what JDBC driver ?

Sounds like the driver is not converting the value to bytes for you.
I guess the problem may be because of undefined 
key_validation_class, default_validation_class, comparator, etc.

If you are using CQL these are not relevant.

Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 18/04/2013, at 1:31 AM, himanshu.joshi himanshu.jo...@orkash.com wrote:



Hi,


When I am trying to insert data into a table using Java with 
JDBC, I am getting the error:


InvalidRequestException(why:cannot parse 'Jo' as hex bytes)

My insert query is:
insert into temp(id,name,value,url_id) VALUES(108, 'Aa','Jo',10);

This insert query runs successfully from the CQLSH command prompt 
but not from the code.


The query I used to create the table in CQLSH is:

CREATE TABLE temp (
 id bigint PRIMARY KEY,
 dt_stamp timestamp,
 name text,
 url_id bigint,
 value text
) WITH
 bloom_filter_fp_chance=0.01 AND
 caching='KEYS_ONLY' AND
 comment='' AND
 dclocal_read_repair_chance=0.00 AND
 gc_grace_seconds=864000 AND
 read_repair_chance=0.10 AND
 replicate_on_write='true' AND
 populate_io_cache_on_flush='false' AND
 compaction={'class': 'SizeTieredCompactionStrategy'} AND
 compression={'sstable_compression': 'SnappyCompressor'};



I guess the problem may be because of undefined 
key_validation_class, default_validation_class, comparator, etc.

Is there any way to define these attributes using CQLSH?
I have already tried the ASSUME command but it has not resolved the 
problem.


I am a beginner in Cassandra and need your guidance.

--
Thanks & Regards,
Himanshu Joshi




Hi Aaron,

The problem is resolved now, as I upgraded the version of JDBC to 1.2.2.
Earlier I was using JDBC version 1.1.6 with Cassandra 1.2.2.
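
In case it helps anyone else, this is roughly the pattern that works for me
now (host, port and keyspace are placeholders for our real ones):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    // load the cassandra-jdbc driver and connect to the keyspace
    Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
    Connection con = DriverManager.getConnection("jdbc:cassandra://localhost:9160/mykeyspace");

    // a parameterized insert lets the driver do the type conversions,
    // instead of a text value being parsed as hex bytes
    PreparedStatement stmt = con.prepareStatement(
        "INSERT INTO temp (id, name, value, url_id) VALUES (?, ?, ?, ?)");
    stmt.setLong(1, 108L);
    stmt.setString(2, "Aa");
    stmt.setString(3, "Jo");
    stmt.setLong(4, 10L);
    stmt.executeUpdate();
    con.close();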

Thanks for your guidance.

--
Thanks & Regards,
Himanshu Joshi



Re: Repair hangs on 1.1.4

2013-04-17 Thread adeel . akbar

Hi Aaron,

Thank you for your feedback. I have also installed DataStax OpsCenter, 
and it shows no repair progress. Previously the progress of every repair 
was shown in OpsCenter, and once it reached 100% the repair was also 
complete on the nodes; now a repair is in progress on the node but 
OpsCenter shows nothing. Secondly, please find the netstats and 
compactionstats results below:


# /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost netstats
Mode: NORMAL
Not sending any streams.
Not receiving any streams.
Pool Name                    Active   Pending      Completed
Commands                        n/a         0        5327870
Responses                       n/a         0      163271943

# /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost compactionstats
pending tasks: 0
Active compaction remaining time :n/a
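
Should we also raise the rpc_timeout you mentioned? My assumption is that on
1.1 it is this cassandra.yaml setting (default 10 seconds):

    rpc_timeout_in_ms: 20000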

Regards,

Adeel Akbar

Quoting aaron morton aa...@thelastpickle.com:

The errors from Hints are not concerned with repair. Increasing the   
rpc_timeout may help with those. If it's logging about 0 hints you   
may be seeing this   
https://issues.apache.org/jira/browse/CASSANDRA-5068


How did repair hang ? Check for progress with nodetool   
compactionstats and nodetool netstats.


Cheers

-
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 13/04/2013, at 3:01 AM, Alexis Rodríguez arodrig...@inconcertcc.com wrote:



Adeel,

It may be a problem on the remote node; could you check the system.log?

Also you might want to check the rpc_timeout_in_ms on both nodes; 
maybe an increase in this parameter helps.






On Fri, Apr 12, 2013 at 9:17 AM, adeel.ak...@panasiangroup.com wrote:
Hi,

I have started a repair on a newly added node with -pr; this node is in 
another data center. I have a 5MB internet connection and have 
configured setstreamthroughput 1. After some time the repair hangs and 
the following message is found in the logs:


# /opt/apache-cassandra-1.1.4/bin/nodetool -h localhost ring
Address         DC   Rack   Status  State   Load       Effective-Ownership  Token
                                                                            169417178424467235000914166253263322299
10.0.0.3        DC1  RAC1   Up      Normal  93.26 GB   66.67%               0
10.0.0.4        DC1  RAC1   Up      Normal  89.1 GB    66.67%               56713727820156410577229101238628035242
10.0.0.15       DC1  RAC1   Up      Normal  72.87 GB   66.67%               113427455640312821154458202477256070484
10.40.1.103     DC2  RAC1   Up      Normal  48.59 GB   100.00%              169417178424467235000914166253263322299



 INFO [HintedHandoff:1] 2013-04-12 17:05:49,411 HintedHandOffManager.java (line 372) Timed out replaying hints to /10.40.1.103; aborting further deliveries
 INFO [HintedHandoff:1] 2013-04-12 17:05:49,411 HintedHandOffManager.java (line 390) Finished hinted handoff of 0 rows to endpoint /10.40.1.103


Why are we getting this message, and how do I prevent repair from hitting this error?

Regards,

Adeel Akbar