cassandra 1.0.8 memory usage

2012-10-11 Thread Daniel Woo
Hi guys,

I am running a mini cluster with 6 nodes, and recently we have seen very frequent
ParNew GC on two nodes. It takes 200 - 800 ms on average, and sometimes it takes
5 seconds. As you know, ParNew GC is a stop-the-world GC, and our client throws a
SocketTimeoutException every 3 minutes.

I checked the load and it seems well balanced, and the two nodes are running
on the same hardware: 2 * 4-core Xeon with 16 GB RAM. We give Cassandra a 4 GB
heap, including an 800 MB young generation. We did not see any swap usage
during the GC. Any idea about this?
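(For reference, the pauses can be measured precisely by enabling GC logging in
conf/cassandra-env.sh with standard HotSpot flags, e.g. adding
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime
-Xloggc:/var/log/cassandra/gc.log to JVM_OPTS; the log path here is just an
example.)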

Then I took a heap dump; it shows that 5 instances of JmxMBeanServer hold
500 MB of memory and most of the referenced objects are JMX MBean related. That
seems weird to me and looks like a memory leak.

-- 
Thanks & Regards,
Daniel


Re: unbalanced ring

2012-10-11 Thread Alain RODRIGUEZ
Tamar, be careful. Datastax doesn't recommend major compactions in
production environments.

If I got it right, performing a major compaction will merge all your
SSTables into one big SSTable, substantially improving your read performance,
at least for a while... The problem is that it will effectively disable minor
compactions too (because of the size difference between this big SSTable and
the new ones, if I remember correctly). So your read performance will decrease
until your other SSTables reach the size of the big one you've created, or
until you run another major compaction, which turns major compaction into a
regular maintenance process, like repair.
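(To be concrete, the major compaction in question is the one triggered with
nodetool compact <keyspace> <column_family>; its progress can be watched with
nodetool compactionstats.)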

But, knowing that, I still don't know if we both (Tamar and I) shouldn't
run it anyway (in my case it would greatly decrease the size of my data, from
133 GB to 35 GB, and maybe load the cluster evenly...)

Alain

2012/10/10 B. Todd Burruss bto...@gmail.com

 it should not have any other impact except increased usage of system
 resources.

 and i suppose, cleanup would not have an effect (over normal compaction)
 if all nodes contain the same data


 On Wed, Oct 10, 2012 at 12:12 PM, Tamar Fraenkel ta...@tok-media.com wrote:

 Hi!
 Apart from being a heavy load (the compaction), will it have other effects?
 Also, will cleanup help if I have replication factor = number of nodes?
 Thanks

  *Tamar Fraenkel *
 Senior Software Engineer, TOK Media


 ta...@tok-media.com
 Tel:   +972 2 6409736
 Mob:  +972 54 8356490
 Fax:   +972 2 5612956





 On Wed, Oct 10, 2012 at 6:12 PM, B. Todd Burruss bto...@gmail.com wrote:

 major compaction in production is fine, however it is a heavy operation
 on the node and will take I/O and some CPU.

 the only time i have seen this happen is when i have changed the tokens
 in the ring, like nodetool movetoken.  cassandra does not auto-delete
 data that it doesn't use anymore just in case you want to move the tokens
 again or otherwise undo.

 try nodetool cleanup


 On Wed, Oct 10, 2012 at 2:01 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 Hi,

 Same thing here:

 2 nodes, RF = 2. RCL = 1, WCL = 1.
 Like Tamar, I never ran a major compaction, and I run a repair once a week
 on each node.

 10.59.21.241   eu-west   1b   Up   Normal   133.02 GB   50.00%   0
 10.58.83.109   eu-west   1b   Up   Normal   98.12 GB    50.00%   85070591730234615865843651857942052864

 What phenomenon could explain the result above?

 By the way, I have copied the data and imported it into a one-node dev
 cluster. There I ran a major compaction and the size of my data was
 significantly reduced (to about 32 GB instead of 133 GB).

 How is that possible?
 Do you think that if I run a major compaction on both nodes it will
 balance the load evenly?
 Should I run a major compaction in production?

 2012/10/10 Tamar Fraenkel ta...@tok-media.com

 Hi!
 I am re-posting this, now that I have more data and still *unbalanced
 ring*:

 3 nodes,
 RF=3, RCL=WCL=QUORUM


 Address    DC       Rack   Status   State    Load       Owns     Token
                                                                   113427455640312821154458202477256070485
 x.x.x.x    us-east  1c     Up       Normal   24.02 GB   33.33%   0
 y.y.y.y    us-east  1c     Up       Normal   33.45 GB   33.33%   56713727820156410577229101238628035242
 z.z.z.z    us-east  1c     Up       Normal   29.85 GB   33.33%   113427455640312821154458202477256070485

 repair runs weekly.
 I don't run nodetool compact, as I read that it may cause the regular minor
 compactions not to run, and then I will have to run compact manually.
 Is that right?

 Any idea if this means something wrong, and if so, how to solve?


 Thanks,
 *Tamar Fraenkel *
 Senior Software Engineer, TOK Media


 ta...@tok-media.com
 Tel:   +972 2 6409736
 Mob:  +972 54 8356490
 Fax:   +972 2 5612956





 On Tue, Mar 27, 2012 at 9:12 AM, Tamar Fraenkel 
 ta...@tok-media.com wrote:

 Thanks, I will wait and see as data accumulates.
 Thanks,

 *Tamar Fraenkel *
 Senior Software Engineer, TOK Media


 ta...@tok-media.com
 Tel:   +972 2 6409736
 Mob:  +972 54 8356490
 Fax:   +972 2 5612956





 On Tue, Mar 27, 2012 at 9:00 AM, R. Verlangen ro...@us2.nl wrote:

 Cassandra is built to store tons and tons of data. In my opinion
 roughly ~ 6MB per node is not enough data to allow it to become a fully
 balanced cluster.


 2012/3/27 Tamar Fraenkel ta...@tok-media.com

 This morning I have
  nodetool ring -h localhost
 Address         DC       Rack   Status   State    Load      Owns     Token
                                                                      113427455640312821154458202477256070485
 10.34.158.33    us-east  1c     Up       Normal   5.78 MB   33.33%   0
 10.38.175.131   us-east  1c     Up       Normal   7.23 MB   33.33%   56713727820156410577229101238628035242
 10.116.83.10    us-east  1c     Up       Normal   5.02 MB   33.33%   113427455640312821154458202477256070485

 Version is 1.0.8.


  *Tamar Fraenkel *
 Senior Software Engineer, TOK Media

 

RE: unbalanced ring

2012-10-11 Thread Viktor Jevdokimov
To run, or not to run? It all depends on the use case. There are no problems
running major compactions in one case (we do it nightly), while there could be
problems in another. You just need to understand how everything works.


Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063, Mobile: +370 650 19588, Fax +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania



RE: Problem while streaming SSTables with BulkOutputFormat

2012-10-11 Thread Ralph Romanos

Hello again,
I noticed that this issue happens whenever a reduce task finishes (so the
SSTable is generated) while an already-generated SSTable is being streamed to
the cluster. I think that the error is therefore caused because Cassandra
cannot queue SSTables that are being streamed to the cluster. Does that make sense?
Cheers,
Ralph

From: matgan...@hotmail.com
To: user@cassandra.apache.org
Subject: RE: Problem while streaming SSTables with BulkOutputFormat
Date: Tue, 9 Oct 2012 22:29:41 +





Aaron,
Thank you for your answer. I tried to move to Cassandra 1.1.5, but the error
still occurs. When I set a single task or less per hadoop node, the error does
not happen. However, when I have more than one task on any of the nodes
(Hadoop-only node or Hadoop+Cassandra node), the error happens. When it happens,
the task fails and is sent after a while to another node and completed.
Ultimately, I get all my tasks done but it takes much more time.
Is it possible that streaming multiple SSTables, generated by two different
tasks on the same node, to the Cassandra cluster is the cause of this issue?
Cheers,
Ralph
 Subject: Re: Problem while streaming SSTables with BulkOutputFormat
 From: aa...@thelastpickle.com
 Date: Wed, 10 Oct 2012 10:05:13 +1300
 To: user@cassandra.apache.org
 
 Something, somewhere, at some point is breaking the connection. Sorry I 
 cannot be of more help :)
 
 Something caused the streaming to fail, which started a retry, which failed 
 because the pipe was broken. 
 
 Are there any earlier errors in the logs ? 
 Did this happen on one of the nodes that has both a task tracker and
 cassandra?
 
 Cheers
 
 
 On 9/10/2012, at 4:06 AM, Ralph Romanos matgan...@hotmail.com wrote:
 
  Hello,
  
  I am using BulkOutputFormat to load data from a .csv file into Cassandra. I 
  am using Cassandra 1.1.3 and Hadoop 0.20.2.
  I have 7 hadoop nodes: 1 namenode/jobtracker and 6 datanodes/tasktrackers. 
  Cassandra is installed on 4 of these 6 datanodes/tasktrackers.
  The issue happens when I have more than 1 reducer, SSTables are generated 
  in each node, however, I get the following error in the tasktracker's logs 
  when they 
  are streamed into the Cassandra cluster:
  
  Exception in thread Streaming to /172.16.110.79:1 
  java.lang.RuntimeException: java.io.EOFException
  at 
  org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:628)
  at 
  org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
  Source)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  at java.lang.Thread.run(Unknown Source)
  Caused by: java.io.EOFException
  at java.io.DataInputStream.readInt(Unknown Source)
  at 
  org.apache.cassandra.streaming.FileStreamTask.receiveReply(FileStreamTask.java:194)
  at 
  org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:181)
  at 
  org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:94)
  at 
  org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
  ... 3 more
  Exception in thread Streaming to /172.16.110.92:1 
  java.lang.RuntimeException: java.io.EOFException
  at 
  org.apache.cassandra.utils.FBUtilities.unchecked(FBUtilities.java:628)
  at 
  org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown 
  Source)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  at java.lang.Thread.run(Unknown Source)
  Caused by: java.io.EOFException
  at java.io.DataInputStream.readInt(Unknown Source)
  at 
  org.apache.cassandra.streaming.FileStreamTask.receiveReply(FileStreamTask.java:194)
  at 
  org.apache.cassandra.streaming.FileStreamTask.stream(FileStreamTask.java:181)
  at 
  org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:94)
  at 
  org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
  ... 3 more
  
   ...
  
  This is what I get in the logs of one of my Cassandra nodes:
  ERROR 16:47:34,904 Sending retry message failed, closing session.
  java.io.IOException: Broken pipe
  at sun.nio.ch.FileDispatcher.write0(Native Method)
  at sun.nio.ch.SocketDispatcher.write(Unknown Source)
  at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
  at sun.nio.ch.IOUtil.write(Unknown Source)
  at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
  at java.nio.channels.Channels.writeFullyImpl(Unknown Source)
  at java.nio.channels.Channels.writeFully(Unknown Source)
  at java.nio.channels.Channels.access$000(Unknown Source)
  at java.nio.channels.Channels$1.write(Unknown Source)
  at java.io.OutputStream.write(Unknown Source)
  at java.nio.channels.Channels$1.write(Unknown Source)
   

Create column family with Composite key column via thrift API

2012-10-11 Thread Vivek Mishra
Hi,

I know one way is to execute a CQL query via the Thrift client to create a
column family with a compound primary key / composite columns. But is that the
only way?
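For reference, the CQL-over-Thrift route I am referring to looks roughly like
the sketch below (the keyspace and table names are made up for illustration;
CQL 3, still beta in 1.1, is what understands compound primary keys):

import java.nio.ByteBuffer;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.Compression;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;

public class CreateCompositeCf {
    public static void main(String[] args) throws Exception {
        // plain Thrift connection to a local node
        TFramedTransport transport = new TFramedTransport(new TSocket("localhost", 9160));
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();
        client.set_keyspace("MyKeyspace");      // hypothetical keyspace
        client.set_cql_version("3.0.0");        // compound primary keys need CQL 3
        String cql = "CREATE TABLE timeline ("  // hypothetical table definition
                   + "  user_id text,"
                   + "  posted_at timestamp,"
                   + "  body text,"
                   + "  PRIMARY KEY (user_id, posted_at))";
        client.execute_cql_query(ByteBuffer.wrap(cql.getBytes("UTF-8")), Compression.NONE);
        transport.close();
    }
}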

It looks like I would end up creating my own CQLTranslator/wrapper to deal
with compound primary/composite columns (or maybe something else in the near
future). The Thrift way of dealing with this is really different, as the column
family metadata for such column families created via cqlsh looks quite
different through Thrift!

I know I have started a little late and these are basic questions about how
CQL/composite columns work. But is there anything I am missing or have
misunderstood on this part?

-Vivek


CQL Sets and Maps

2012-10-11 Thread Hiller, Dean
I was reading Brian's post

http://mail-archives.apache.org/mod_mbox/cassandra-dev/201210.mbox/%3ccajhhpg20rrcajqjdnf8sf7wnhblo6j+aofksgbxyxwcoocg...@mail.gmail.com%3E

In which he asks

 Any insight into why CQL puts that in column name?
 Where does it store the metadata related to compound key
 interpretation? Wouldn't that be a better place for that since it
 shouldn't change within a table?

I have those same questions and would like to understand how it stores stuff 
better.  For example, if PlayOrm has the following

User {
  @Embedded
  private List<Email> emails;
  @Embedded
  private List<SomethingElse> otherStuff;
  @OneToMany
  private List<Owner> owners;
}

It ends up storing

rowkey: userid
= column=emails:email1Id:title, value=some email title
= column=emails:email1Id:contents, value=some contents in email really really 
long
= column=emails:email2Id:title, value=some other email
= column=owners:ownerId29,  value=null
= column=owners:ownerId57,  value=null

Basically using emails as the prefix since User can have other embedded 
objects, and using emailId as the next prefix so you can have many unique 
emails and then having each email property.  How is it actually stored when 
doing Sets and Maps in CQL??  Ideally, I would like PlayOrm to overlay on top 
of that.

Thanks,
Dean


unsubscribe

2012-10-11 Thread Chris Favero
unsubscribe



cassandra + pig

2012-10-11 Thread William Oberman
I'm wondering how many people are using cassandra + pig out there?  I
recently went through the effort of validating things at a much higher
level than I previously did(*), and found a few issues:
https://issues.apache.org/jira/browse/CASSANDRA-4748
https://issues.apache.org/jira/browse/CASSANDRA-4749
https://issues.apache.org/jira/browse/CASSANDRA-4789

In general, it seems like the widerow implementation still has rough edges.
 I'm concerned I'm not understanding why other people aren't using the
feature, and thus finding these problems.  Is everyone else just setting a
high static limit?  E.g. LOAD 'cassandra://KEYSPACE/CF?limit=X' where X >=
the max size of any key?  Is everyone else using data models that result in
keys with # columns always less than 1024?  Do newer versions of hadoop
consume the cassandra API in a way that works around these issues?  I'm
using CDH3 == hadoop 0.20.2, pig 0.8.1.
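(By a high static limit I mean something along these lines in the pig script;
the keyspace/CF names and the limit value are made up for illustration:

rows = LOAD 'cassandra://MyKeyspace/MyCF?limit=16384' USING org.apache.cassandra.hadoop.pig.CassandraStorage();
)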

(*) I took a random subsample of 50,000 keys of my production data (approx
1M total key/value pairs, some keys having only a single value and some
having 1000's).  I then wrote both a pig script and simple procedural
version of the pig script.  Then I compared the results.  Obviously I
started with differences, though after locally patching my code to fix the
above 3 bugs (though, really only two issues), I now (finally) get the same
results.


Re: 1.1.1 is repair still needed ?

2012-10-11 Thread B. Todd Burruss
as of 1.0 (CASSANDRA-2034) hints are generated for nodes that timeout.

On Thu, Oct 11, 2012 at 3:55 AM, Watanabe Maki watanabe.m...@gmail.com wrote:
 Even if HH works fine, HH will not be created until the failure detector
 marks the node as dead.
 HH will also not be created for a partially timed-out mutation request (one
 that still meets the CL)... in my understanding...


 On 2012/10/11, at 5:55, Rob Coli rc...@palominodb.com wrote:

 On Tue, Oct 9, 2012 at 12:56 PM, Oleg Dulin oleg.du...@gmail.com wrote:
 My understanding is that the repair has to happen within gc_grace period.
 [ snip ]
 So the question is, is this still needed ? Do we even need to run nodetool
 repair ?

 If Hinted Handoff works in your version of Cassandra, and that version
 is > 1.0, you should not need to repair if no node has crashed or
 been down for longer than max_hint_window_in_ms. This is because after
 1.0, any failed write to a remote replica results in a hint, so any
 DELETE should eventually be fully replicated.

 However hinted handoff is meaningfully broken between 1.1.0 and 1.1.6
 (unreleased) so you cannot rely on the above heuristic for
 consistency. In these versions, you have to repair (or read repair
 100% of keys) once every GCGraceSeconds to prevent the possibility of
 zombie data. If it were possible to repair on a per-columnfamily
 basis, you could get a significant win by only repairing
 columnfamilies which take DELETE traffic.

 https://issues.apache.org/jira/browse/CASSANDRA-4772

 =Rob

 --
 =Robert Coli
 AIMGTALK - rc...@palominodb.com
 YAHOO - rcoli.palominob
 SKYPE - rcoli_palominodb


Re: cassandra + pig

2012-10-11 Thread Jeremy Hanna
The Dachis Group (where I just came from, now at DataStax) uses pig with 
cassandra for a lot of things.  However, we weren't using the widerow 
implementation yet since wide row support is new to 1.1.x and we were on 0.7, 
then 0.8, then 1.0.x.

I think since it's new to 1.1's hadoop support, it sounds like there are some 
rough edges like you say.  But issues that are reproducible on tickets for any 
problems are much appreciated and they will get addressed.

On Oct 11, 2012, at 10:43 AM, William Oberman ober...@civicscience.com wrote:

 I'm wondering how many people are using cassandra + pig out there?  I 
 recently went through the effort of validating things at a much higher level 
 than I previously did(*), and found a few issues:
 https://issues.apache.org/jira/browse/CASSANDRA-4748
 https://issues.apache.org/jira/browse/CASSANDRA-4749
 https://issues.apache.org/jira/browse/CASSANDRA-4789
 
 In general, it seems like the widerow implementation still has rough edges.  
 I'm concerned I'm not understanding why other people aren't using the 
 feature, and thus finding these problems.  Is everyone else just setting a 
 high static limit?  E.g.  LOAD 'cassandra://KEYSPACE/CF?limit=X where X = 
 the max size of any key?  Is everyone else using data models that result in 
 keys with # columns always less than 1024?  Do newer version of hadoop 
 consume the cassandra API in a way that work around these issues?  I'm using 
 CDH3 == hadoop 0.20.2, pig 0.8.1.
 
 (*) I took a random subsample of 50,000 keys of my production data (approx 1M 
 total key/value pairs, some keys having only a single value and some having 
 1000's).  I then wrote both a pig script and simple procedural version of the 
 pig script.  Then I compared the results.  Obviously I started with 
 differences, though after locally patching my code to fix the above 3 bugs 
 (though, really only two issues), I now (finally) get the same results.



Re: cassandra + pig

2012-10-11 Thread William Oberman
If you don't mind me asking, how are you handling the fact that pre-widerow
you are only getting a static number of columns per key (default 1024)?  Or
am I not understanding the limit concept?

On Thu, Oct 11, 2012 at 11:25 AM, Jeremy Hanna
jeremy.hanna1...@gmail.com wrote:

 The Dachis Group (where I just came from, now at DataStax) uses pig with
 cassandra for a lot of things.  However, we weren't using the widerow
 implementation yet since wide row support is new to 1.1.x and we were on
 0.7, then 0.8, then 1.0.x.

 I think since it's new to 1.1's hadoop support, it sounds like there are
 some rough edges like you say.  But issues that are reproducible on tickets
 for any problems are much appreciated and they will get addressed.

 On Oct 11, 2012, at 10:43 AM, William Oberman ober...@civicscience.com
 wrote:

  I'm wondering how many people are using cassandra + pig out there?  I
 recently went through the effort of validating things at a much higher
 level than I previously did(*), and found a few issues:
  https://issues.apache.org/jira/browse/CASSANDRA-4748
  https://issues.apache.org/jira/browse/CASSANDRA-4749
  https://issues.apache.org/jira/browse/CASSANDRA-4789
 
  In general, it seems like the widerow implementation still has rough
 edges.  I'm concerned I'm not understanding why other people aren't using
 the feature, and thus finding these problems.  Is everyone else just
 setting a high static limit?  E.g.  LOAD 'cassandra://KEYSPACE/CF?limit=X
 where X = the max size of any key?  Is everyone else using data models
 that result in keys with # columns always less than 1024?  Do newer version
 of hadoop consume the cassandra API in a way that work around these issues?
  I'm using CDH3 == hadoop 0.20.2, pig 0.8.1.
 
  (*) I took a random subsample of 50,000 keys of my production data
 (approx 1M total key/value pairs, some keys having only a single value and
 some having 1000's).  I then wrote both a pig script and simple procedural
 version of the pig script.  Then I compared the results.  Obviously I
 started with differences, though after locally patching my code to fix the
 above 3 bugs (though, really only two issues), I now (finally) get the same
 results.




unnecessary tombstone's transmission during repair process

2012-10-11 Thread Alexey Zotov
Hi Guys,

I have a question about Merkle tree construction and the repair process. When
the Merkle tree is constructed it calculates hashes. For a DeletedColumn it
calculates the hash using the value, and the value of a DeletedColumn is the
serialized local deletion time. We know that the local deletion time can be
different on different nodes for the same tombstone, so the hashes of the same
tombstone on different nodes will be different. Is that true? I think that the
local deletion time shouldn't be considered in the hash calculation.
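To illustrate the idea, here is only a rough sketch (not the actual Cassandra
code and not our patch) of a digest that skips a tombstone's value (the
serialized local deletion time), so that equal tombstones hash identically on
every replica:

import java.nio.ByteBuffer;
import java.security.MessageDigest;

public class TombstoneDigestSketch {
    // Sketch only: always hash the column name and timestamp, but skip the
    // value for tombstones, since (as described above) a DeletedColumn's value
    // is the local deletion time, which may differ between replicas.
    static void updateDigest(MessageDigest digest, ByteBuffer name,
                             ByteBuffer value, long timestamp,
                             boolean isTombstone) {
        digest.update(name.duplicate());
        if (!isTombstone)
            digest.update(value.duplicate());
        ByteBuffer ts = ByteBuffer.allocate(8);
        ts.putLong(0, timestamp);
        digest.update(ts);
    }
}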

We've run several tests:
// we have 3 nodes, RF=2, CL=QUORUM, so we have strong consistency.
1. Populate data to all nodes and run the repair process. No streams were
transmitted. That's the expected behaviour.
2. Then we removed some columns from some rows. No nodes were down and all
writes completed successfully. We ran repair, and there were some streams.
That's strange to me, because all the data should be consistent.

We've created a patch and applied it:
1. The result of the first test is the same.
2. The result of the second test: there were no unnecessary streams, as I
expected.


My question is:
Is the transmission of equal tombstones during the repair process a feature?
:) Or is it a bug?
If it's a bug, I'll create a ticket and attach the patch to it.


Re: can't get cqlsh running

2012-10-11 Thread Nick Bailey
It looks like easy_install is only recognizing python2.4 on your
system. It is installing the cql module for that version. The cqlsh
script explicitly looks for and runs with python2.6 since 2.4 isn't
supported.

I believe you can run 'python2.6 easy_install cql' to force it to use
that python install.

-Nick

On Thu, Oct 11, 2012 at 10:45 AM, Tim Dunphy bluethu...@gmail.com wrote:
 Hey guys,

  I'm on cassandra 1.1.5 on a centos 5.8 machine. I have the cassandra bin
 directory on my path so that i can simply type 'cassandra-cli' from anywhere
 on my path to get into the cassandra command line environment. It's great!

 But I'd like to start using the cql shell (cqlsh) but apparently I don't
 know enough about python to get this working. This is what happens when I
 try to run cqlsh:

 [root@beta:~] #cqlsh

 Python CQL driver not installed, or not on PYTHONPATH.
 You might try easy_install cql.

 Python: /usr/bin/python2.6
 Module load path: ['/usr/local/apache-cassandra-1.1.5/bin/../pylib',
 '/usr/local/apache-cassandra-1.1.5/bin',
 '/usr/local/apache-cassandra-1.1.5/bin', '/usr/lib64/python26.zip',
 '/usr/lib64/python2.6', '/usr/lib64/python2.6/plat-linux2',
 '/usr/lib64/python2.6/lib-tk', '/usr/lib64/python2.6/lib-old',
 '/usr/lib64/python2.6/lib-dynload', '/usr/lib64/python2.6/site-packages',
 '/usr/lib/python2.6/site-packages',
 '/usr/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg-info']

 Error: No module named cql

 But easy_install claims that it's already installed:

 [root@beta:~] #easy_install cql
 Searching for cql
 Best match: cql 1.0.10
 Processing cql-1.0.10-py2.4.egg
 cql 1.0.10 is already the active version in easy-install.pth

 Using /usr/lib/python2.4/site-packages/cql-1.0.10-py2.4.egg
 Processing dependencies for cql

 I'm thinking that I just don't know how to set the PYTHONPATH variable or
 where to point it to. Can someone give me a pointer here?

 Thanks
 Tim

 --
 GPG me!!

 gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B




Re: cassandra + pig

2012-10-11 Thread William Oberman
Thanks Jeremy!  Maybe figuring out how to do paging in pig would have been
easier, but I found the widerow setting first which led me where I am
today.  I don't mind helping to blaze trails, or contribute back when doing
so, but I usually try to follow rather than lead when it comes to
tools/software I choose to use.  I didn't realize how close to the edge I was
getting in this case :-)

On Thu, Oct 11, 2012 at 1:03 PM, Jeremy Hanna jeremy.hanna1...@gmail.com wrote:

 For our use case, we had a lot of narrow column families, and for the couple of
 column families that had wide rows, we did our own paging through them.  I
 don't recall if we did paging in pig or mapreduce but you should be able to
 do that in both since pig allows you to specify the slice start.

 On Oct 11, 2012, at 11:28 AM, William Oberman ober...@civicscience.com
 wrote:

  If you don't mind me asking, how are you handling the fact that
 pre-widerow you are only getting a static number of columns per key
 (default 1024)?  Or am I not understanding the limit concept?
 
  On Thu, Oct 11, 2012 at 11:25 AM, Jeremy Hanna 
 jeremy.hanna1...@gmail.com wrote:
  The Dachis Group (where I just came from, now at DataStax) uses pig with
 cassandra for a lot of things.  However, we weren't using the widerow
 implementation yet since wide row support is new to 1.1.x and we were on
 0.7, then 0.8, then 1.0.x.
 
  I think since it's new to 1.1's hadoop support, it sounds like there are
 some rough edges like you say.  But issues that are reproducible on tickets
 for any problems are much appreciated and they will get addressed.
 
  On Oct 11, 2012, at 10:43 AM, William Oberman ober...@civicscience.com
 wrote:
 
   I'm wondering how many people are using cassandra + pig out there?  I
 recently went through the effort of validating things at a much higher
 level than I previously did(*), and found a few issues:
   https://issues.apache.org/jira/browse/CASSANDRA-4748
   https://issues.apache.org/jira/browse/CASSANDRA-4749
   https://issues.apache.org/jira/browse/CASSANDRA-4789
  
   In general, it seems like the widerow implementation still has rough
 edges.  I'm concerned I'm not understanding why other people aren't using
 the feature, and thus finding these problems.  Is everyone else just
 setting a high static limit?  E.g.  LOAD 'cassandra://KEYSPACE/CF?limit=X
 where X = the max size of any key?  Is everyone else using data models
 that result in keys with # columns always less than 1024?  Do newer version
 of hadoop consume the cassandra API in a way that work around these issues?
  I'm using CDH3 == hadoop 0.20.2, pig 0.8.1.
  
   (*) I took a random subsample of 50,000 keys of my production data
 (approx 1M total key/value pairs, some keys having only a single value and
 some having 1000's).  I then wrote both a pig script and simple procedural
 version of the pig script.  Then I compared the results.  Obviously I
 started with differences, though after locally patching my code to fix the
 above 3 bugs (though, really only two issues), I now (finally) get the same
 results.
 
 
 




unsubscribe

2012-10-11 Thread Siddiqui, Akmal
unsubscribe 


Re: unnecessary tombstone's transmission during repair process

2012-10-11 Thread Rob Coli
On Thu, Oct 11, 2012 at 8:41 AM, Alexey Zotov azo...@griddynamics.com wrote:
 Value of DeletedColumn is a serialized local
 deletion time. We know that local deletion time can be different on
 different nodes for the same tombstone. So hashes of the same tombstone on
 different nodes will be different. Is it true?

Yes, this seems correct based on my understanding of the process of
writing tombstones.

 I think that local deletion time shouldn't be considered in hash's 
 calculation.

I think you are correct; the only thing that matters is whether the
tombstone exists or not. There may be something I am missing about why
the very-unlikely-to-be-identical value should be considered a merkle
tree failure.

https://issues.apache.org/jira/browse/CASSANDRA-2279

Seems related to this issue, fwiw.

 Is transmission of the equals tombstones during repair process a feature? :)
 or is it a bug?

I think it's a bug.

 If it's a bug, I'll create ticket and attach patch to it.

Yay!

=Rob

-- 
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: unnecessary tombstone's transmission during repair process

2012-10-11 Thread Sylvain Lebresne
 I have a question about merkle tree construction and repair process. When
 mercle tree is constructing it calculates hashes. For DeletedColumn it
 calculates hash using value. Value of DeletedColumn is a serialized local
 deletion time.

The deletion time is not local to each replica; it's computed
only once by the coordinator node that received the deletion
initially.

 We know that local deletion time can be different on
 different nodes for the same tombstone.

Given the above, no it cannot.

--
Sylvain


Re: cassandra 1.0.8 memory usage

2012-10-11 Thread Jason Wee
what jvm version?

On Thu, Oct 11, 2012 at 2:04 PM, Daniel Woo daniel.y@gmail.com wrote:

 Hi guys,

 I am running a mini cluster with 6 nodes, and recently we have seen very frequent
 ParNew GC on two nodes. It takes 200 - 800 ms on average, and sometimes it takes
 5 seconds. As you know, ParNew GC is a stop-the-world GC, and our client throws a
 SocketTimeoutException every 3 minutes.

 I checked the load and it seems well balanced, and the two nodes are running
 on the same hardware: 2 * 4-core Xeon with 16 GB RAM. We give Cassandra a 4 GB
 heap, including an 800 MB young generation. We did not see any swap usage
 during the GC. Any idea about this?

 Then I took a heap dump; it shows that 5 instances of JmxMBeanServer hold
 500 MB of memory and most of the referenced objects are JMX MBean related. That
 seems weird to me and looks like a memory leak.

 --
 Thanks & Regards,
 Daniel



Re: cassandra 1.0.8 memory usage

2012-10-11 Thread Rob Coli
On Wed, Oct 10, 2012 at 11:04 PM, Daniel Woo daniel.y@gmail.com wrote:
 I am running a mini cluster with 6 nodes, and recently we have seen very frequent
 ParNew GC on two nodes. It takes 200 - 800 ms on average, and sometimes it takes
 5 seconds. As you know, ParNew GC is a stop-the-world GC, and our client throws a
 SocketTimeoutException every 3 minutes.

What version of Cassandra? What JVM? Are JNA and Jamm working?

 I checked the load and it seems well balanced, and the two nodes are running on
 the same hardware: 2 * 4-core Xeon with 16 GB RAM. We give Cassandra a 4 GB
 heap, including an 800 MB young generation. We did not see any swap usage during
 the GC. Any idea about this?

It sounds like the two nodes that are pathological right now have
exhausted the perm gen with actual non-garbage, probably mostly the
Bloom filters and the JMX MBeans.

 Then I took a heap dump; it shows that 5 instances of JmxMBeanServer hold
 500 MB of memory and most of the referenced objects are JMX MBean related; it's
 kind of weird to me and looks like a memory leak.

Do you have a large number of ColumnFamilies? How large is the data
stored per node?

=Rob

-- 
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Re: cassandra 1.0.8 memory usage

2012-10-11 Thread Rob Coli
On Thu, Oct 11, 2012 at 11:02 AM, Rob Coli rc...@palominodb.com wrote:
 On Wed, Oct 10, 2012 at 11:04 PM, Daniel Woo daniel.y@gmail.com wrote:
  We did not see any swap usage during the GC, any idea about this?

As an aside.. you shouldn't have swap enabled on a Cassandra node,
generally. As a simple example, if you have swap enabled and use the
off-heap row cache, the kernel might swap your row cache.
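(On Linux that usually means running swapoff -a and removing the swap entries
from /etc/fstab so it stays off after a reboot.)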

=Rob

-- 
=Robert Coli
AIMGTALK - rc...@palominodb.com
YAHOO - rcoli.palominob
SKYPE - rcoli_palominodb


Perlcassa - Perl Cassandra 'Client'

2012-10-11 Thread Michael Kjellman
Hi - A few months back I wrote a Perl client for Cassandra and I realized I
never sent it out to this list. While Perl is not the language du jour,
hopefully this will help someone else out. :) Code is periodically thrown up on
CPAN, but look at http://github.com/mkjellman/perlcassa for the most current
version. It supports serialization/deserialization of validation classes and
composite columns, as well as connection pooling (without using ResourcePool,
which fails miserably when run under mod_perl).

Best,
michael





Re: [problem with OOM in nodes]

2012-10-11 Thread Hiller, Dean
Splitting one report to multiple rows is uncomfortable

WHY?  Reading from N disks is way faster than reading from 1 disk.

I think in terms of PlayOrm, so let me explain the model you can use; I
think in objects first.

Report {
  String uniqueId;
  String reportName; // may be indexable and queryable
  String description;
  CursorToMany<ReportRow> rows;
}

ReportRow {
  String uniqueId;
  String somedata;
  String someMoreData;
}

Each row in Report in PlayOrm is backed by two rows in the database in
this special case of using CursorToMany:

ReportRow -> reportName=somename, description=some desc
CursorToManyRow in index table -> reportRowKey56, reportRowKey78,
reportRowKey89 (there are NO values in this row and this row can have less
than 10 million values... if your report is beyond 10 million, let me know
and I have a different design).

Then each report row is basically the same structure as above.  You can
then 

1. Read in the report
2. As you read from CursorToMany, it does a BATCH slice into the
CursorToManyRow AND then does a MULTIGET in parallel to fetch report
rows (i.e. it is all in parallel, so you get lots of rows from many disks
really fast)
3. Print the rows out

If you have more than 10 million rows in a report, let me know.  You can
do what PlayOrm does yourself of course ;).
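Roughly, doing it by hand with Hector would look like the sketch below (the CF
names and keys are hypothetical; the index row stores the report row keys as
column names with empty values):

import java.util.ArrayList;
import java.util.List;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.beans.Rows;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.MultigetSliceQuery;
import me.prettyprint.hector.api.query.SliceQuery;

public class ReportReader {
    // "ReportIndex" row: the column NAMES are the keys of the rows in "ReportRows"
    static Rows<String, String, String> readReport(Keyspace keyspace, String reportKey) {
        StringSerializer s = StringSerializer.get();

        // 1. slice the index row to collect the report row keys
        SliceQuery<String, String, String> index = HFactory.createSliceQuery(keyspace, s, s, s);
        index.setColumnFamily("ReportIndex");
        index.setKey(reportKey);
        index.setRange(null, null, false, 10000); // page through this in real code

        List<String> rowKeys = new ArrayList<String>();
        for (HColumn<String, String> c : index.execute().get().getColumns())
            rowKeys.add(c.getName());

        // 2. multiget the actual report rows, hitting many disks in parallel
        MultigetSliceQuery<String, String, String> rows =
            HFactory.createMultigetSliceQuery(keyspace, s, s, s);
        rows.setColumnFamily("ReportRows");
        rows.setKeys(rowKeys.toArray(new String[rowKeys.size()]));
        rows.setRange(null, null, false, 100);
        return rows.execute().get();
    }
}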

Later,
Dean



On 9/23/12 11:14 PM, Denis Gabaydulin gaba...@gmail.com wrote:

On Sun, Sep 23, 2012 at 10:41 PM, aaron morton aa...@thelastpickle.com
wrote:
 /var/log/cassandra$ cat system.log | grep "Compacting large" | grep -E
 "[0-9]+ bytes" -o | cut -d " " -f 1 | awk '{ foo = $1 / 1024 / 1024 ;
 print foo "MB" }' | sort -nr | head -n 50


 Is it bad signal?

 Sorry, I do not know what this is outputting.


This is outputting the sizes of the big rows which Cassandra has compacted before.

 As I can see in cfstats, compacted row maximum size: 386857368 !

 Yes.
 Having rows in the 100's of MB will cause problems. Doubly so if they are
 large super columns.


What exactly is the problem with big rows?
And how should we place our data in this case (see the schema in
the previous replies)? Splitting one report into multiple rows is
uncomfortable :-(


 Cheers



 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 22/09/2012, at 5:07 AM, Denis Gabaydulin gaba...@gmail.com wrote:

 And some stuff from log:


 /var/log/cassandra$ cat system.log | grep "Compacting large" | grep -E
 "[0-9]+ bytes" -o | cut -d " " -f 1 | awk '{ foo = $1 / 1024 / 1024 ;
 print foo "MB" }' | sort -nr | head -n 50
 3821.55MB
 3337.85MB
 1221.64MB
 1128.67MB
 930.666MB
 916.4MB
 861.114MB
 843.325MB
 711.813MB
 706.992MB
 674.282MB
 673.861MB
 658.305MB
 557.756MB
 531.577MB
 493.112MB
 492.513MB
 492.291MB
 484.484MB
 479.908MB
 465.742MB
 464.015MB
 459.95MB
 454.472MB
 441.248MB
 428.763MB
 424.028MB
 416.663MB
 416.191MB
 409.341MB
 406.895MB
 397.314MB
 388.27MB
 376.714MB
 371.298MB
 368.819MB
 366.92MB
 361.371MB
 360.509MB
 356.168MB
 355.012MB
 354.897MB
 354.759MB
 347.986MB
 344.109MB
 335.546MB
 329.529MB
 326.857MB
 326.252MB
 326.237MB

 Is it bad signal?

 On Fri, Sep 21, 2012 at 8:22 PM, Denis Gabaydulin gaba...@gmail.com
wrote:

 Found one more interesting fact.
 As I can see in cfstats, compacted row maximum size: 386857368 !

 On Fri, Sep 21, 2012 at 12:50 PM, Denis Gabaydulin gaba...@gmail.com
 wrote:

 Reports - is a SuperColumnFamily

  Each report has a unique identifier (report_id). This is the key of the
  SuperColumnFamily, and each report is saved in a separate row.

  A report consists of report rows (the count may vary between 1 and 50,
  but most are small).

  Each report row is saved in a separate super column. Hector-based code:

 superCfMutator.addInsertion(
  report_id,
  "Reports",
  HFactory.createSuperColumn(
report_row_id,
mapper.convertObject(object),
columnDefinition.getTopSerializer(),
columnDefinition.getSubSerializer(),
inferringSerializer
  )
 );

  We have two frequent operations:

 1. count report rows by report_id (calculate number of super columns
 in the row).
 2. get report rows by report_id and range predicate (get super columns
 from the row with range predicate).

  I can't see any big super columns here :-(

 On Fri, Sep 21, 2012 at 3:10 AM, Tyler Hobbs ty...@datastax.com wrote:

  I'm not 100% sure that I understand your data model and read patterns
correctly,
 but it sounds like you have large supercolumns and are requesting some
of
 the subcolumns from individual super columns.  If that's the case, the
issue
 is that Cassandra must deserialize the entire supercolumn in memory
whenever
 you read *any* of the subcolumns.  This is one of the reasons why
composite
 columns are recommended over supercolumns.


 On Thu, Sep 20, 2012 at 6:45 AM, Denis Gabaydulin gaba...@gmail.com
wrote:


 p.s. Cassandra 1.1.4

 On Thu, Sep 20, 2012 at 3:27 PM, Denis Gabaydulin gaba...@gmail.com
 wrote:

 Hi, all!

  We have a cluster with 7 virtual nodes (disk storage is connected to
 nodes with iSCSI). 

CRM:0267190

2012-10-11 Thread Joseph Heinzen
unsubscribe

Joseph Heinzen
Senior VP, UCC Sales
Tel. 571-297-4162 | Mobile. 703-463-7145
Fax. 703-891-1073 | jhein...@microtech.net | www.MicroTech.net
The Fastest Growing Hispanic-Owned Business in the Nation (2009, 2010 & 2011)
8330 Boone Blvd, Suite 600, Vienna, VA 22182
A Service-Disabled Veteran-Owned Business

Re: can't get cqlsh running

2012-10-11 Thread Tim Dunphy


 I believe you can run 'python2.6 easy_install cql' to force it to use
 that python install.


Well initially I tried going:

[root@beta:~] #python2.6 easy_install
python2.6: can't open file 'easy_install': [Errno 2] No such file or
directory

But when I used the full paths of each:

/usr/bin/python2.6 /usr/bin/easy_install cql

It worked like a charm!

[root@beta:~] #cqlsh
Connected to Test Cluster at beta.jokefire.com:9160.
[cqlsh 2.0.0 | Cassandra unknown | CQL spec unknown | Thrift protocol
19.20.0]
Use HELP for help.
cqlsh

So, thanks for your advice! That really did the trick!

Tim

On Thu, Oct 11, 2012 at 11:56 AM, Nick Bailey n...@datastax.com wrote:

 It looks like easy_install is only recognizing python2.4 on your
 system. It is installing the cql module for that version. The cqlsh
 script explicitly looks for and runs with python2.6 since 2.4 isn't
 supported.

 I believe you can run 'python2.6 easy_install cql' to force it to use
 that python install.


 -Nick

 On Thu, Oct 11, 2012 at 10:45 AM, Tim Dunphy bluethu...@gmail.com wrote:
  Hey guys,
 
   I'm on cassandra 1.1.5 on a centos 5.8 machine. I have the cassandra bin
  directory on my path so that i can simply type 'cassandra-cli' from
 anywhere
  on my path to get into the cassandra command line environment. It's
 great!
 
  But I'd like to start using the cql shell (cqlsh) but apparently I don't
  know enough about python to get this working. This is what happens when I
  try to run cqlsh:
 
  [root@beta:~] #cqlsh
 
  Python CQL driver not installed, or not on PYTHONPATH.
  You might try easy_install cql.
 
  Python: /usr/bin/python2.6
  Module load path: ['/usr/local/apache-cassandra-1.1.5/bin/../pylib',
  '/usr/local/apache-cassandra-1.1.5/bin',
  '/usr/local/apache-cassandra-1.1.5/bin', '/usr/lib64/python26.zip',
  '/usr/lib64/python2.6', '/usr/lib64/python2.6/plat-linux2',
  '/usr/lib64/python2.6/lib-tk', '/usr/lib64/python2.6/lib-old',
  '/usr/lib64/python2.6/lib-dynload', '/usr/lib64/python2.6/site-packages',
  '/usr/lib/python2.6/site-packages',
  '/usr/lib/python2.6/site-packages/setuptools-0.6c11-py2.6.egg-info']
 
  Error: No module named cql
 
  But easy_install claims that it's already installed:
 
  [root@beta:~] #easy_install cql
  Searching for cql
  Best match: cql 1.0.10
  Processing cql-1.0.10-py2.4.egg
  cql 1.0.10 is already the active version in easy-install.pth
 
  Using /usr/lib/python2.4/site-packages/cql-1.0.10-py2.4.egg
  Processing dependencies for cql
 
  I'm thinking that I just don't know how to set the PYTHONPATH variable or
  where to point it to. Can someone give me a pointer here?
 
  Thanks
  Tim
 
  --
  GPG me!!
 
  gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
 
 




-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


Re: CRM:0267190

2012-10-11 Thread Michael Shuler
On 10/11/2012 02:37 PM, Joseph Heinzen wrote:
 unsubscribe

http://wiki.apache.org/cassandra/FAQ#unsubscribe

 *Joseph Heinzen*
 Senior VP, UCC Sales
 Tel. 571-297-4162 | Mobile. 703-463-7145 
 Fax. 703-891-1073 | _jhein...@microtech.net
 mailto:jhein...@microtech.net_ | _www.MicroTech.net
 http://www.MicroTech.net_
 *Description:
 C:\Users\joseph.heinzen\AppData\Roaming\Microsoft\Signatures\New Email
 Signature HQ v3 (Joseph Heinzen)-Image01.jpg**
 *The Fastest Growing Hispanic-Owned Business in the Nation (2009, 2010 
 2011)
 8330 Boone Blvd, Suite 600, Vienna, VA 22182
 A Service-Disabled Veteran-Owned Business

http://www.ietf.org/rfc/rfc1855.txt (re: subject; signature)

 /DISCLAIMER: The information in this email, and any attached
 document(s), is MicroTech proprietary data and intended only for
 recipient(s) addressed above. If you are not an intended recipient, you
 are requested to notify the sender above and delete any copies of this
 transmission. Thank you in advance for your cooperation. If you have any
 questions, please contact the sender at (703) 891-1073./

and finally, http://www.pbandjelly.org/2011/03/to-whom-it-may-concern/

-- 
Kind regards,
Michael



Re: unsubscribe

2012-10-11 Thread Tyler Hobbs
http://wiki.apache.org/cassandra/FAQ#unsubscribe

On Thu, Oct 11, 2012 at 12:41 PM, Siddiqui, Akmal 
akmal.siddi...@broadvision.com wrote:

 unsubscribe




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Option for ordering columns by timestamp in CF

2012-10-11 Thread Tyler Hobbs
Without thinking too deeply about it, this is basically equivalent to
disabling timestamps for a column family and using timestamps for column
names, though in a very indirect (and potentially confusing) manner.  So,
if you want to open a ticket, I would suggest framing it as "make column
timestamps optional".
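For reference, the workaround Aaron suggests below (using your own timestamp as
a LongType column name) looks roughly like this with Hector - the CF name here
is made up:

import me.prettyprint.cassandra.serializers.LongSerializer;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class TimeOrderedInsert {
    // Assumes a column family "Timeline" created with comparator=LongType,
    // so columns come back sorted by the time value used as the column name.
    static void record(Keyspace keyspace, String rowKey, String payload) {
        Mutator<String> m = HFactory.createMutator(keyspace, StringSerializer.get());
        m.insert(rowKey, "Timeline",
                 HFactory.createColumn(System.currentTimeMillis(), payload,
                                       LongSerializer.get(), StringSerializer.get()));
    }
}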

On Wed, Oct 10, 2012 at 4:44 AM, Ertio Lew ertio...@gmail.com wrote:

 I think Cassandra should provide a configurable option, on a per-column-family
 basis, to sort columns by timestamp rather than by column name.
 This would be really helpful for maintaining time-sorted columns without using
 up the column name as a timestamp, which might otherwise be used to store
 more relevant column names useful for retrievals. Very frequently we need
 to store data sorted in time order. Therefore I think this may be a very
 general requirement, not specific to just my use case alone.

 Does it make sense to create an issue for this?




 On Fri, Mar 25, 2011 at 2:38 AM, aaron morton aa...@thelastpickle.comwrote:

 If you mean ordering by the column timestamp (as passed by the client), that
 is not possible.

 Can you use your own timestamps as the column name and store them as long
 values?

 Aaron

 On 25 Mar 2011, at 09:30, Narendra Sharma wrote:

  Cassandra 0.7.4
  Column names in my CF are of type byte[] but I want to order columns by
 timestamp. What is the best way to achieve this? Does it make sense for
 Cassandra to support ordering of columns by timestamp as option for a
 column family irrespective of the column name type?
 
  Thanks,
  Naren





-- 
Tyler Hobbs
DataStax http://datastax.com/


RE: unsubscribe

2012-10-11 Thread Siddiqui, Akmal
http://wiki.apache.org/cassandra/FAQ#unsubscribe 

-Original Message-
From: Chris Favero [mailto:chris.fav...@tricast.com] 
Sent: Thursday, October 11, 2012 7:37 AM
To: user@cassandra.apache.org
Subject: unsubscribe

unsubscribe



RE: unsubscribe

2012-10-11 Thread Siddiqui, Akmal
thanks



From: Tyler Hobbs [mailto:ty...@datastax.com] 
Sent: Thursday, October 11, 2012 1:42 PM
To: user@cassandra.apache.org
Subject: Re: unsubscribe


http://wiki.apache.org/cassandra/FAQ#unsubscribe


On Thu, Oct 11, 2012 at 12:41 PM, Siddiqui, Akmal
akmal.siddi...@broadvision.com wrote:


unsubscribe





-- 
Tyler Hobbs
DataStax http://datastax.com/ 




Repair Failing due to bad network

2012-10-11 Thread David Koblas
I'm trying to bring up a new datacenter - while I probably could have
brought things up another way, I've now got a DC that has a ready
Cassandra with keys allocated.  The problem is that I cannot get a
repair to complete, since it appears that some part of my network
decides to restart all connections twice a day (6am and 2pm - OK, 5
minutes before those times).


So when I start a repair job, it usually gets a ways into things before
one of the nodes goes DOWN, then back up.  What I don't see is the
repair restarting; it just stops.


Is there a workaround for this case, or is there something else I could 
be doing?


--david


Re: 1.1.1 is repair still needed ?

2012-10-11 Thread Watanabe Maki
Oh sorry. It's pretty nice to know that.


On 2012/10/12, at 0:18, B. Todd Burruss bto...@gmail.com wrote:

 as of 1.0 (CASSANDRA-2034) hints are generated for nodes that timeout.
 
 On Thu, Oct 11, 2012 at 3:55 AM, Watanabe Maki watanabe.m...@gmail.com 
 wrote:
 Even if HH works fine, HH will not be created until the failure detector
 marks the node as dead.
 HH will also not be created for a partially timed-out mutation request (one
 that still meets the CL)... in my understanding...
 
 
 On 2012/10/11, at 5:55, Rob Coli rc...@palominodb.com wrote:
 
 On Tue, Oct 9, 2012 at 12:56 PM, Oleg Dulin oleg.du...@gmail.com wrote:
 My understanding is that the repair has to happen within gc_grace period.
 [ snip ]
 So the question is, is this still needed ? Do we even need to run nodetool
 repair ?
 
 If Hinted Handoff works in your version of Cassandra, and that version
 is > 1.0, you should not need to repair if no node has crashed or
 been down for longer than max_hint_window_in_ms. This is because after
 1.0, any failed write to a remote replica results in a hint, so any
 DELETE should eventually be fully replicated.
 
 However hinted handoff is meaningfully broken between 1.1.0 and 1.1.6
 (unreleased) so you cannot rely on the above heuristic for
 consistency. In these versions, you have to repair (or read repair
 100% of keys) once every GCGraceSeconds to prevent the possibility of
 zombie data. If it were possible to repair on a per-columnfamily
 basis, you could get a significant win by only repairing
 columnfamilies which take DELETE traffic.
 
 https://issues.apache.org/jira/browse/CASSANDRA-4772
 
 =Rob
 
 --
 =Robert Coli
 AIMGTALK - rc...@palominodb.com
 YAHOO - rcoli.palominob
 SKYPE - rcoli_palominodb