Re: Does Cassandra support IBM JDK?

2013-02-09 Thread Edward Capriolo
At this point I think the project should just officially state which major and minor versions they develop and test with. If we are using 'unsafe' all over the place it is not a surprise that c* is not working on most JVMs.

Re: question on incremental backup

2013-02-09 Thread Edward Capriolo
SSTables are write once. As soon as they appear on disk they ARE flushed. This means you can safely copy them away. You should run nodetool snapshot first and copy those hardlinks, because this will guarantee that the file will not be compacted away while you are copying it. On Sat, Feb 9, 2013
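
The reason copying snapshot hardlinks is safe can be sketched in a few lines of Python. This is only an illustration of hard-link behavior on a local filesystem, not of Cassandra's own code; the file name and the `snapshots` directory below are hypothetical stand-ins.

```python
import os
import tempfile


def hardlink_survives_delete():
    """Show that a hard link keeps data readable after the original name is removed."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical SSTable file name, used only for illustration.
        original = os.path.join(tmp, "data-1-Data.db")
        snapshot_dir = os.path.join(tmp, "snapshots")
        os.mkdir(snapshot_dir)
        linked = os.path.join(snapshot_dir, "data-1-Data.db")

        with open(original, "wb") as f:
            f.write(b"immutable sstable bytes")

        # A snapshot is conceptually a second name for the same on-disk data.
        os.link(original, linked)
        # Simulates compaction removing the live file while a copy is in progress.
        os.remove(original)

        with open(linked, "rb") as f:
            return f.read()
```

The data stays on disk until the last name pointing at it is removed, which is why the snapshot copy cannot be compacted away underneath you.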

Re: High read latency cluster

2013-02-08 Thread Edward Capriolo
300 GB is a lot of data for cloud machines (especially with their weaker performance in general). If you are unhappy with performance, why not scale the cluster out to more servers? With that much data you are usually contending with the physics of spinning disks. Three nodes + replication factor 3

Re: Upgrade from 0.6.x to 1.2.x

2013-02-08 Thread Edward Capriolo
We did this a long time ago. Besides the upgrade itself, the issue is that the Thrift clients are completely incompatible between 0.6.x and 0.7.x, so you will have to coordinate a software release with clients as well as the Cassandra update. On Fri, Feb 8, 2013 at 8:33 AM, Sergey Leschenko

Re: Cassandra libraries for Golang

2013-02-08 Thread Edward Capriolo
AFAIK there is currently only a single language that supports the native transport: Java. Go can link to C / C++ libraries, yes or no? If yes, then leveraging Thrift's generated C code or whatever C libraries exist might be an option. On Fri, Feb 8, 2013 at 11:40 AM, Boris Solovyov

Re: [Aurelius] Titan/Cassandra Keyspace simply disappears

2013-02-08 Thread Edward Capriolo
Clearly this illustrates that someone requested that the keyspace be dropped. Likely more of a Titan issue than a Cassandra one. On Fri, Feb 8, 2013 at 2:03 PM, Ron Siemens rsiem...@greatergood.com wrote: There is this curiosity in the cassandra log: can this happen as part of cassandra's

Re: DataModel Question

2013-02-07 Thread Edward Capriolo
Go day / phone instead of phone / day; this way you won't have a row growing forever under one row key. A compromise would be month / phone as the row key, and then use the date time as the first part of a composite column. On Thursday, February 7, 2013, Kanwar Sangha kan...@mavenir.com wrote: Thanks Aaron ! My
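
A minimal sketch of the month / phone layout suggested above, in Python. The key separator, the timestamp format, and the `message_id` second component are assumptions made for illustration; the original thread does not specify them.

```python
from datetime import datetime


def row_key(phone: str, when: datetime) -> str:
    """Row key of the form month:phone, so each row is bounded to one month of data."""
    return f"{when:%Y%m}:{phone}"


def column_name(when: datetime, message_id: str) -> tuple:
    """Composite column name with the date time first, so columns sort chronologically.

    message_id is a hypothetical second component that keeps names unique.
    """
    return (when.strftime("%Y%m%dT%H%M%S"), message_id)
```

Because a new row starts each month, no single row grows without bound, and a slice on the first composite component still returns one phone's data in time order.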

Re: Range Queries consistency in an inconsistent cluster.

2013-02-07 Thread Edward Capriolo
Range queries do not currently read repair, although there is a ticket on this. If you want them to be consistent, do them at QUORUM or ALL. But in a strange quirk, since get_range_slice does not repair, those operations are not eventually consistent. On Thu, Feb 7, 2013 at 10:20 AM, Sergey Olefir

Re: Why CQL returns data in byte format, while Hive de-serialize and return the data in readable format

2013-02-07 Thread Edward Capriolo
In CQL3 a column must be all the same type. Since CQL transposes columns, the only thing they can be is a byte array. CQL2 is better at compact tables in this regard. On Thursday, February 7, 2013, Dinusha Dilrukshi wrote: Hi, We are using same underlying column family and extract the data

Re: Why do Datastax docs recommend Java 6?

2013-02-06 Thread Edward Capriolo
Oracle already did this once. It was called JRockit :) http://www.oracle.com/technetwork/middleware/jrockit/overview/index.html Typically Oracle acquires the technology and then the bits are merged with the standard JVM. On Wed, Feb 6, 2013 at 2:13 AM, Viktor Jevdokimov

Re: where is the UTF8Comparator code for cassandra

2013-02-05 Thread Edward Capriolo
The comparator should be defined in the UTF8Type class. On Tue, Feb 5, 2013 at 10:46 AM, Hiller, Dean dean.hil...@nrel.gov wrote: Our in-memory version has a slight different we just found out about that we want to fix in the case where we are using UTF8 sorting and our column name Is

Re: Pycassa vs YCSB results.

2013-02-05 Thread Edward Capriolo
Without stating the obvious, if you are interested in scale, then why pick Python? I did want to point out that YCSB is not even the gold standard for benchmarks: using Cassandra's stress tool you can get more ops per second than with YCSB. On Tue, Feb 5, 2013 at 1:13 PM, Pradeep Kumar Mantha

Re: CPU hotspot at BloomFilterSerializer#deserialize

2013-02-03 Thread Edward Capriolo
It is interesting the press c* got about having 2 billion columns in a row. You *can* do it but it brings to light some realities of what that means. On Sun, Feb 3, 2013 at 8:09 AM, Takenori Sato ts...@cloudian.com wrote: Hi Aaron, Thanks for your answers. That helped me get a big picture.

Re: CQL : Request did not complete within rpc_timeout

2013-02-03 Thread Edward Capriolo
Without seeing your schema it is hard to say, but in some cases ALLOW FILTERING might be considered EXPECT THIS COULD BE SLOW. It could mean the query is not hitting an index and is going to page through large amounts of data. On Sun, Feb 3, 2013 at 9:42 AM, Paul van Hoven

Re: CQL : Request did not complete within rpc_timeout

2013-02-03 Thread Edward Capriolo
within rpc_timeout. ola = offerten_log_archiv (table name) hour = stunde (column name) date = datum (column name) I hope this information makes my problem more clear. 2013/2/3 Edward Capriolo edlinuxg...@gmail.com: Without seeing your schema it is hard to say, but in some cases ALLOW

Re: CQL : Request did not complete within rpc_timeout

2013-02-03 Thread Edward Capriolo
would be recommendable then? 3. How should the query look like such that it would scale? 2013/2/3 Edward Capriolo edlinuxg...@gmail.com: Secondary indexes need at least one equality. If you want to do this at scale you might need a different design. Using WITH FILTERING and LIMIT 10

Re: initial_token

2013-02-01 Thread Edward Capriolo
better machine) This is the setup of virtual nodes. Check current datastax docs for it. On Thu, Jan 31, 2013 at 8:43 PM, Edward Capriolo edlinuxg...@gmail.com wrote: This is the bad side of changing default. There are going to be a few groups unfortunates. The first group, who only can

Re: Not enough replicas???

2013-02-01 Thread Edward Capriolo
Please include the information on how your keyspace was created. This may indicate you set the replication factor to 3, when you only have 1 node, or some similar condition. On Fri, Feb 1, 2013 at 4:57 PM, stephen.m.thomp...@wellsfargo.com wrote: I need to offer my profound thanks to this

Re: Cassandra behavior on single node

2013-02-01 Thread Edward Capriolo
You are likely hitting the point where compaction is running all the time and consuming all the weak cloud IO. EBS is not recommended for performance; you should use the ephemeral drives. On Friday, February 1, 2013, Marcelo Elias Del Valle wrote: Hello, I am trying to figure out why the

Re: initial_token

2013-01-31 Thread Edward Capriolo
Now by default a new partitioner is chosen: Murmur3. The range of tokens used to be something like 0 to 2^127. Now the range of its tokens is -2^63 to 2^63-1. You can switch back to random partitioner and follow the old instructions or try to find a new doc with the new instructions. On Thu, Jan 31,
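
For reference, evenly spaced initial_token values for both partitioners can be computed with a short Python sketch. It assumes a single datacenter with equally sized nodes, a RandomPartitioner range starting at 0 with 2^127 positions, and a Murmur3Partitioner range of -2^63 to 2^63-1; the function names are made up for this example.

```python
def random_partitioner_tokens(node_count):
    """Evenly spaced tokens over 0 .. 2^127 (RandomPartitioner)."""
    return [i * (2**127 // node_count) for i in range(node_count)]


def murmur3_tokens(node_count):
    """Evenly spaced tokens over -2^63 .. 2^63 - 1 (Murmur3Partitioner)."""
    return [-(2**63) + i * (2**64 // node_count) for i in range(node_count)]
```

For a 4-node cluster, murmur3_tokens(4) yields -2^63, -2^62, 0 and 2^62, one value per node's initial_token setting.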

Re: initial_token

2013-01-31 Thread Edward Capriolo
by this. On Thu, Jan 31, 2013 at 4:52 PM, Rob Coli rc...@palominodb.com wrote: On Thu, Jan 31, 2013 at 12:17 PM, Edward Capriolo edlinuxg...@gmail.com wrote: Now by default a new partitioner is chosen: Murmur3. Now = as of 1.2, to be unambiguous. =Rob -- =Robert Coli AIMGTALK - rc

Re: Start token sorts after end token

2013-01-30 Thread Edward Capriolo
This was unexpected fallout from the change to the Murmur3 partitioner. A JIRA is open, but if you need MapReduce, Murmur3 is currently out of the question. On Wednesday, January 30, 2013, Tejas Patil tejas.patil...@gmail.com wrote: While reading data from Cassandra in map-reduce, I am getting

Re: Start token sorts after end token

2013-01-30 Thread Edward Capriolo
The fix is simply to switch to random partitioner. On Wednesday, January 30, 2013, Edward Capriolo edlinuxg...@gmail.com wrote: This was unexpected fallout from the change to the Murmur3 partitioner. A JIRA is open, but if you need MapReduce, Murmur3 is currently out of the question. On Wednesday, January

Re: JDBC, Select * Cql2 vs Cql3 problem ?

2013-01-30 Thread Edward Capriolo
You really can't mix CQL2 and CQL3. CQL2 does not understand CQL3's sparse tables. Technically it barfs all over the place. Cql2 is only good for contact tables. On Wednesday, January 30, 2013, Andy Cobley acob...@computing.dundee.ac.uk wrote: Well this is getting stranger, for me with this

Re: JDBC, Select * Cql2 vs Cql3 problem ?

2013-01-30 Thread Edward Capriolo
Darn autocorrect: CQL2 is only good for compact tables. Make sure you are setting your CQL version. Or frankly just switch to Hector / Thrift and use things that have been known to work for years now. On Wednesday, January 30, 2013, Edward Capriolo edlinuxg...@gmail.com wrote: You really can't mix

Re: Node selection when both partition key and secondary index field constrained?

2013-01-30 Thread Edward Capriolo
Any query is going to fail with QUORUM, RF=3, and 2 nodes down. One thing about secondary indexes (both user defined and built in) is that finding an answer using them requires more nodes to be up than just a single get or slice. On Monday, January 28, 2013, Mike Sample mike.sam...@gmail.com wrote: Thanks
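
The arithmetic behind that statement, as a small Python sketch. It assumes both down nodes are replicas for the key being queried; the helper names are made up for this example.

```python
def quorum(replication_factor):
    """Number of replicas that must respond at QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def can_satisfy_quorum(replication_factor, replicas_down):
    """True if enough replicas for a key are still alive to reach QUORUM."""
    return replication_factor - replicas_down >= quorum(replication_factor)
```

With RF=3 the quorum is 2, so one replica down is tolerated but two are not.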

Re: Poor key cache hit rate

2013-01-30 Thread Edward Capriolo
You should not use the row cache and the key cache on the same CF. If that is what you are doing it explains your numbers. Some docs suggest you can use them together, but in practice I have seen that when this is done the key cache hit rate drops to near 0. On Tuesday, January 29, 2013, Keith

Re: Node selection when both partition key and secondary index field constrained?

2013-01-30 Thread Edward Capriolo
into that….I know at some point, I plan to. Later, Dean From: Edward Capriolo edlinuxg...@gmail.com Reply-To: user@cassandra.apache.org Date: Wednesday, January 30, 2013 7:31 AM

Suggestion: Move some threads to the client-dev mailing list

2013-01-30 Thread Edward Capriolo
A good portion of the people and traffic on this list is questions about: 1) Astyanax 2) cassandra-jdbc 3) Cassandra native client 4) pythondra / whatever. With the exception of the native transport, which is only halfway part of Cassandra, none of these other client issues have much to do with

Re: why set replica placement strategy at keyspace level ?

2013-01-30 Thread Edward Capriolo
That should not bother you. For example, if you're doing an HBase scan that crosses two column families, that could end up being two (disk) seeks. Having an API that hides the seeks from you does not give you better performance, it only helps you when you're debating with people that do not

Re: ConfigHelper.setThriftContact() undefined in cassandra v1.2

2013-01-29 Thread Edward Capriolo
About as definitive as the word maybe. O'Reilly's SEO keeps it close to the top of search results, but it is probably not the thing you want. On Tuesday, January 29, 2013, aaron morton aa...@thelastpickle.com wrote: I am trying out the example given in Cassandra Definitive guide, Ch 12. That book may be

Re: problem with Cassandra map-reduce support

2013-01-29 Thread Edward Capriolo
http://archive.apache.org/dist/hadoop/core/ has older releases. On Tue, Jan 29, 2013 at 8:08 PM, Tejas Patil tejas.patil...@gmail.com wrote: I really really need this running. I cannot get hadoop-0.20.2 tarball from apache hadoop project website. Is there any place where I can get it ?

Re: Denormalization

2013-01-27 Thread Edward Capriolo
One technique is that on the client side you build a tool that takes the event and produces N mutations. In C* writes are cheap, so essentially, re-write everything on all changes. On Sun, Jan 27, 2013 at 4:03 PM, Fredrik Stigbäck fredrik.l.stigb...@sitevision.se wrote: Hi. Since denormalized data
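
A rough Python sketch of the "one event in, N mutations out" idea. The column family names and event fields are hypothetical; a real tool would hand each tuple to whatever client library is in use instead of returning a list.

```python
def mutations_for_event(event):
    """Turn one event into one write per denormalized view.

    Each mutation is (column_family, row_key, column_name, value).
    All column family names below are made up for illustration.
    """
    return [
        ("events_by_user", event["user"], event["id"], event),
        ("events_by_day", event["day"], event["id"], event),
        ("events_by_type", event["type"], event["id"], event),
    ]
```

Since every view is rewritten from the same event, re-running the function after a change simply overwrites all copies with the same data.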

Re: Denormalization

2013-01-27 Thread Edward Capriolo
LOVE the performance of our ACL checks. Ps. 30,000 writes in cassandra is not cheap when done from one server ;) but in general parallelized writes are very fast for like 500. Later, Dean From: Edward Capriolo edlinuxg...@gmail.com Reply-To: user

Re: Issue when deleting Cassandra rowKeys.

2013-01-26 Thread Edward Capriolo
Make sure the timestamp on your delete is greater than the timestamp of the data. On Sat, Jan 26, 2013 at 1:33 PM, Kasun Weranga kas...@wso2.com wrote: Hi all, When I delete some rowkeys programmatically I can see two rowkeys remains in the column family. I think it is due to tombstones. Is there a way

Re: Large commit log reasons

2013-01-23 Thread Edward Capriolo
By default Cassandra uses 1/3rd of the heap size for memtable storage. If you make your memtables smaller they should flush faster and your commit logs should not grow large. Large commit logs are not a problem; some use cases that write to some Column Families more than others can make the commit log

Re: Large commit log reasons

2013-01-23 Thread Edward Capriolo
1. The commit log is only read on startup. W: If writes are unflushed then the commit logs need to be replayed. 2. Shrink the memtable settings, but you don't want to do this. 3. Commit log size is not directly related to sstable size. E.g. if you write the same row a billion times the commit log

Re: Is this how to read the output of nodetool cfhistograms?

2013-01-22 Thread Edward Capriolo
This was described in good detail here: http://thelastpickle.com/2011/04/28/Forces-of-Write-and-Read/ On Tue, Jan 22, 2013 at 9:41 AM, Brian Tarbox tar...@cabotresearch.com wrote: Thank you! Since this is a very non-standard way to display data it might be worth a better explanation in the

Re: Key-hash based node selection

2013-01-19 Thread Edward Capriolo
You cannot be /mostly/ consistent on reads, just like you cannot be half-pregnant or half transactional. You either are or you are not. If you do not have enough nodes for a QUORUM, the read fails. Thus you never get stale reads, you only get failed reads. The dynamic snitch makes reads sticky at READ.ONE.

Re: Cassandra Consistency problem with NTP

2013-01-17 Thread Edward Capriolo
If you have 40ms NTP drift something is VERY VERY wrong. You should have a local NTP server on the same subnet, do not try to use one on the moon. On Thu, Jan 17, 2013 at 4:42 AM, Sylvain Lebresne sylv...@datastax.com wrote: So what I want is, Cassandra provide some information for client, to

Re: Cassandra Performance Benchmarking.

2013-01-17 Thread Edward Capriolo
Wow, you managed to do a load test through the cassandra-cli. There should be a merit badge for that. You should use the built-in stress tool or YCSB. The CLI has to do much more string conversion than a normal client would and it is not built for performance. You will definitely get better

Re: trying to use row_cache (b/c we have hot rows) but nodetool info says zero requests

2013-01-16 Thread Edward Capriolo
You have to change the column family cache setting from keys_only, otherwise the row cache will not be on for this CF. On Wednesday, January 16, 2013, Brian Tarbox tar...@cabotresearch.com wrote: We have quite wide rows and do a lot of concentrated processing on each row...so I thought I'd try the

Re: Starting Cassandra

2013-01-16 Thread Edward Capriolo
I think at this point the Cassandra startup scripts should reject JVM versions, since Cassandra won't even start with many JVMs at this point. On Tuesday, January 15, 2013, Michael Kjellman mkjell...@barracuda.com wrote: Do yourself a favor and get a copy of the Oracle 7 JDK (now with more security

Re: Starting Cassandra

2013-01-10 Thread Edward Capriolo
I think 1.6.0_24 is too low and 1.7.0 is too high. Try a more recent 1.6. I just had problems with 1.6.0_23 see here: https://issues.apache.org/jira/browse/CASSANDRA-4944 On Thu, Jan 10, 2013 at 10:29 AM, Sloot, Hans-Peter hans-peter.sl...@atos.net wrote: I have 4 vm's with 1024M memory. 1

Re: Wide rows in CQL 3

2013-01-09 Thread Edward Capriolo
I ask myself this every day. CQL3 is a new way to do things, including wide rows with collections. There is no upgrade path. You adopt CQL3's sparse tables as soon as you start creating column families from CQL. There is not much backwards compatibility. CQL3 can query compact tables, but you may

Re: Wide rows in CQL 3

2013-01-09 Thread Edward Capriolo
By no upgrade path I mean to say that if I have a table with compact storage I cannot upgrade it to sparse storage. If I have an existing COMPACT table and I want to add a Map to it, I cannot. This is what I mean by no upgrade path. Column families that mix static and dynamic columns are pretty

Re: Wide rows in CQL 3

2013-01-09 Thread Edward Capriolo
, that do not bother me anyway. 4 are these sparse columns also taking memtable space? These questions would give me serious pause about using sparse tables. On Wednesday, January 9, 2013, Edward Capriolo edlinuxg...@gmail.com wrote: By no upgrade path I mean to say if I have a table with compact storage

Re: about validity of recipe A node join using external data copy methods

2013-01-08 Thread Edward Capriolo
at 7:27 AM, DE VITO Dominique dominique.dev...@thalesgroup.com wrote: Hi, Edward Capriolo described in his Cassandra book a faster way [1] to start new nodes if the cluster size doubles, from N to 2 *N. It's about splitting in 2 parts each token range taken in charge, after the split

Re: about validity of recipe A node join using external data copy methods

2013-01-08 Thread Edward Capriolo
to do it this way anymore I guess it's true in v1.2. Is it true also in v1.1 ? Thanks. Dominique *From:* Edward Capriolo [mailto:edlinuxg...@gmail.com] *Sent:* Tuesday, January 8, 2013 16:01 *To:* user@cassandra.apache.org *Subject:* Re: about validity of recipe A node join using

Re: help turning compaction..hours of run to get 0% compaction....

2013-01-07 Thread Edward Capriolo
There is some point where you simply need more machines. On Mon, Jan 7, 2013 at 5:02 PM, Michael Kjellman mkjell...@barracuda.com wrote: Right, I guess I'm saying that you should try loading your data with leveled compaction and see how your compaction load is. Your work load sounds like

Re: Specifying initial token in 1.2 fails

2013-01-04 Thread Edward Capriolo
Yes. They were really just introduced, and if you are ready to hitch your wagon to every new feature you put yourself at considerable risk, with any piece of software, not just Cassandra. On Fri, Jan 4, 2013 at 11:59 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: But I don't really get the point

Re: RandomPartitioner to Murmur3Partitioner

2013-01-03 Thread Edward Capriolo
By the way 10% faster does not necessarily mean 10% more requests. https://issues.apache.org/jira/browse/CASSANDRA-2975 https://issues.apache.org/jira/browse/CASSANDRA-3772 Also if you follow the tickets My tests show that Murmur3Partitioner actually is worse than MD5 with high cardinality

Re: Error after 1.2.0 upgrade

2013-01-03 Thread Edward Capriolo
Just a shot in the dark, but I would try setting -Xss higher than the default. It's probably like 180, but I can't even start at that level, bumped it up to 256 for JDK 7. On Thu, Jan 3, 2013 at 12:02 PM, Michael Kjellman mkjell...@barracuda.com wrote: :) yes, I'm crazy The assertion appears to

Re: Error after 1.2.0 upgrade

2013-01-03 Thread Edward Capriolo
been fixed in 1.1.7 ?? From: Edward Capriolo edlinuxg...@gmail.com Reply-To: user@cassandra.apache.org user@cassandra.apache.org Date: Thursday, January 3, 2013 11:57 AM To: user@cassandra.apache.org user@cassandra.apache.org Subject: Re: Error after 1.2.0 upgrade There is a bug

Re: Force data to a specific node

2013-01-02 Thread Edward Capriolo
There is a crazy, very bad, don't do it way to do this. You can set RF=1 and hack the LocalPartitioner (because the local partitioner has been made not to do this). Then the node you connect to and write to is the node the data will get stored on. It's like memcache do-it-yourself style sharding.

Re: State of Cassandra and Java 7

2012-12-23 Thread Edward Capriolo
Which versions are supported is kind of up to you. For example, earlier versions of the JDK now have bugs. I have a version of Java, 1.6.0_23 I believe, that will not even start with the latest Cassandra releases. Likewise people suggest not running the newest ones (1.7.0) because they have not tested them.

Re: how to create a keyspace in CQL3

2012-12-23 Thread Edward Capriolo
Unfortunately one of the first commands everyone needs to use to work with Cassandra changes very often. You can use cqlsh help create_keyspace; But sometimes even the documentation is not in line. Using this permutation of goodness: cqlsh 2.3.0 | Cassandra 1.2.0-beta2-SNAPSHOT | CQL

Re: thrift client can't add a column back after it was deleted with cassandra-cli?

2012-12-21 Thread Edward Capriolo
The CLI uses microsecond precision; your client might be using something else, and inserts with lower timestamps are dropped. On Friday, December 21, 2012, Qiaobing Xie qiaobing@gmail.com wrote: Hi, I am developing a thrift client that inserts and removes columns from a column-family
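
A simplified Python model of why this happens. It assumes plain "highest timestamp wins" reconciliation and ignores tie-breaking rules; the timestamps are made-up values, with the delete issued in microseconds (as the CLI does) and the re-insert issued in milliseconds.

```python
def reconcile(existing, incoming):
    """Keep whichever (timestamp, value) pair has the higher timestamp.

    A value of None stands for a tombstone (a delete). This is a
    simplification for illustration, not Cassandra's full rule set.
    """
    return incoming if incoming[0] > existing[0] else existing


# Delete issued with a microsecond timestamp (hypothetical value).
tombstone = (1356100000000000, None)

# Re-insert one minute later, but with a millisecond timestamp.
insert_ms = (1356100060000, "new value")

# The same re-insert expressed in microseconds.
insert_us = (1356100060000 * 1000, "new value")
```

The millisecond timestamp is numerically far smaller than the tombstone's, so the column stays deleted even though the insert happened later in wall-clock time; once the client also writes in microseconds, the insert wins.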

Re: Correct way to design a cassandra database

2012-12-21 Thread Edward Capriolo
You could store the order as the first part of a composite string, say the first picture as A and the second as B. To insert one between, call it AA. If you shuffle a lot the strings could get really long. Might be better to store the order in a separate column. Neither solution mentioned deals with

Re: Monitoring the number of client connections

2012-12-19 Thread Edward Capriolo
In the TCP MIB for SNMP (Simple Network Management Protocol) this information is available http://www.simpleweb.org/ietf/mibs/mibSynHiLite.php?category=IETFmodule=TCP-MIB On Wed, Dec 19, 2012 at 12:22 AM, Michael Kjellman mkjell...@barracuda.com wrote: netstat + cron is your friend at this

Re: rpc_timeout exception while inserting

2012-12-18 Thread Edward Capriolo
CQL2 and CQL3 indexes are not compatible. I guess if CQL2 is able to detect that the table was defined in CQL3, it probably should not allow it. Backwards compatibility is something the storage engines and interfaces have to account for. At least they should prevent you from hurting yourself. But do not

Re: Read operations resulting in a write?

2012-12-17 Thread Edward Capriolo
Is there a way to turn this on and off through configuration? I am not necessarily sure I would want this feature. Also it is confusing if these writes show up in JMX and look like user generated write operations. On Mon, Dec 17, 2012 at 10:01 AM, Mike mthero...@yahoo.com wrote: Thank you

Re: Why Secondary indexes is so slowly by my test?

2012-12-13 Thread Edward Capriolo
Until the "secondary indexes do not read before write" change is in a release and stabilized, you should follow Ed Anuff's blog and do your indexing yourself with composites. On Thursday, December 13, 2012, aaron morton aa...@thelastpickle.com wrote: The IndexClause for the get_indexed_slices takes a

Re: Datastax C*ollege Credit Webinar Series : Create your first Java App w/ Cassandra

2012-12-13 Thread Edward Capriolo
It should be good stuff. Brian eats this stuff for lunch. On Wednesday, December 12, 2012, Brian O'Neill b...@alumni.brown.edu wrote: FWIW -- I'm presenting tomorrow for the Datastax C*ollege Credit Webinar Series:

Re: Help on MMap of SSTables

2012-12-13 Thread Edward Capriolo
This issue has to be looked at from a micro and a macro level. On the micro level the best way is workload specific. On the macro level this mostly boils down to data and memory size. Compactions are going to churn cache, this is unavoidable. IMHO solid state makes the micro optimization meaningless in the

Re: Why Secondary indexes is so slowly by my test?

2012-12-13 Thread Edward Capriolo
Here is a good start. http://www.anuff.com/2011/02/indexing-in-cassandra.html On Thu, Dec 13, 2012 at 11:35 AM, Alain RODRIGUEZ arodr...@gmail.com wrote: Hi Edward, can you share the link to this blog ? Alain 2012/12/13 Edward Capriolo edlinuxg...@gmail.com Ed Anuff's

Re: Virtual Nodes, lots of physical nodes and potentially increasing outage count?

2012-12-10 Thread Edward Capriolo
Assuming you need to work with quorum in a non-vnode scenario. That means that if 2 nodes in a row in the ring are down, some number of quorum operations will fail with UnavailableException (TimeoutException right after the failures). This is because for a given range of tokens quorum will be
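
The effect of two adjacent nodes being down can be checked with a small Python simulation. It assumes one token range per node and SimpleStrategy-like placement, where each range is stored on its owner plus the next RF-1 nodes around the ring; node numbering and function names are made up for this example.

```python
def replicas_for_range(range_index, node_count, rf):
    """Nodes holding a range: its owner plus the next rf - 1 nodes clockwise."""
    return [(range_index + k) % node_count for k in range(rf)]


def ranges_without_quorum(node_count, rf, down_nodes):
    """Return the token ranges that cannot reach QUORUM with the given nodes down."""
    needed = rf // 2 + 1
    failed = []
    for r in range(node_count):
        alive = [n for n in replicas_for_range(r, node_count, rf) if n not in down_nodes]
        if len(alive) < needed:
            failed.append(r)
    return failed
```

In a 6-node ring with RF=3, taking down neighbors 0 and 1 leaves ranges 0 and 5 without a quorum, while taking down nodes 0 and 3, which share no replica set, leaves every range available.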

Re: Virtual Nodes, lots of physical nodes and potentially increasing outage count?

2012-12-07 Thread Edward Capriolo
Good point. Hadoop sprays its blocks around randomly. Thus if replication-factor nodes are down, some blocks are not found. The larger the cluster, the higher the chance that nodes are down. To deal with this, increase RF once the cluster gets to be very large. On Wednesday, December 5, 2012, Eric Parusel

Re: What is substituting keys_cached column family argument

2012-12-06 Thread Edward Capriolo
Rob, have you played with this? I have many CFs, some big, some small, some using large caches, some using small ones, some that take many requests, some that take a few. Over time I have cooked up a strategy for how to share the cache love, even though it may not be the best solution to the

Re: Freeing up disk space on Cassandra 1.1.5 with Size-Tiered compaction.

2012-12-06 Thread Edward Capriolo
http://wiki.apache.org/cassandra/LargeDataSetConsiderations On Thu, Dec 6, 2012 at 9:53 AM, Poziombka, Wade L wade.l.poziom...@intel.com wrote: “Having so much data on each node is a potential bad day.” Is this discussed somewhere on the Cassandra documentation (limits,

Re: Row caching + Wide row column family == almost crashed?

2012-12-02 Thread Edward Capriolo
Row cache has to store the entire row. It is a very bad option for wide rows. On Sunday, December 2, 2012, Mike mthero...@yahoo.com wrote: Hello, We recently hit an issue within our Cassandra based application. We have a relatively new Column Family with some very wide rows (10's of

Re: Rename cluster

2012-11-29 Thread Edward Capriolo
Since the cluster name is only cosmetic people do not often change it. I would not do this in a production cluster for sure. On Thu, Nov 29, 2012 at 2:56 PM, Wei Zhu wz1...@yahoo.com wrote: Hi, I am trying to rename a cluster by following the instruction on Wiki: Cassandra says ClusterName

Re: counters + replication = awful performance?

2012-11-28 Thread Edward Capriolo
, 2012 at 3:21 PM, Edward Capriolo edlinuxg...@gmail.com wrote: I misspoke really. It is not dangerous, you just have to understand what it means. This JIRA discusses it. https://issues.apache.org/jira/browse/CASSANDRA-3868 Per Sylvain on the referenced ticket : I don't disagree about

Re: counters + replication = awful performance?

2012-11-28 Thread Edward Capriolo
with Cassandra replication (possibly as simple as me misconfiguring something) -- it shouldn't be three times faster to write to two separate nodes in parallel as compared to writing to 2-node Cassandra cluster with replication=2. Edward Capriolo wrote Say you are doing 100 inserts rf1 on two

Re: Java high-level client

2012-11-28 Thread Edward Capriolo
Astyanax is a Hector fork. You can see many of the Hector authors' comments still in the Astyanax code. There is some nice stuff in there but (IMHO) I do not see the fork as necessary. It has split up the community a bit, as there are now 3 high level Java clients. I would advise following Josh's

Re: Other problem in update

2012-11-27 Thread Edward Capriolo
I am just taking a stab at this one. UUID's interact with system time and maybe your real time os is doing something funky there. The other option, which seems more likely, is that your unit tests are not cleaning up their data directory and there is some corrupt data in there. On Tue, Nov 27,

Re: Java high-level client

2012-11-27 Thread Edward Capriolo
Hector does not require an outdated version of Thrift; you are likely using an outdated version of Hector. Here is the long and short of it: If the Thrift API changes then Hector can have compatibility issues. This happens from time to time. The main methods like get() and insert() have

Re: counters + replication = awful performance?

2012-11-27 Thread Edward Capriolo
The difference between replication factor = 1 and replication factor > 1 is significant. Also it sounds like your cluster is 2 nodes, so going from RF=1 to RF=2 means double the load on both nodes. You may want to experiment with the very dangerous column family attribute: - replicate_on_write:

Re: selective replication of keyspaces

2012-11-27 Thread Edward Capriolo
You can do something like this: Divide your nodes up into 4 datacenters art1,art2,art3,core [default@unknown] create keyspace art1 placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy' and strategy_options=[{art1:2,core:2}]; [default@unknown] create keyspace art2

Re: counters + replication = awful performance?

2012-11-27 Thread Edward Capriolo
'replicate_on_write: false' fixes the performance issue in our tests. How dangerous is it? What exactly could go wrong? On 12-11-27 01:44 PM, Edward Capriolo wrote: The difference between replication factor = 1 and replication factor > 1 is significant. Also it sounds like your cluster is 2 nodes so

Re: counters + replication = awful performance?

2012-11-27 Thread Edward Capriolo
performance by simply writing to two separate clusters rather than using single cluster with replicate=2. Which is kind of stupid :) I think something's fishy with counters and replication. Edward Capriolo wrote I misspoke really. It is not dangerous, you just have to understand what it means

Re: counters + replication = awful performance?

2012-11-27 Thread Edward Capriolo
By the way the other issues you are seeing with replicate on write at false could be because you did not repair. You should do that when changing rf. On Tuesday, November 27, 2012, Edward Capriolo edlinuxg...@gmail.com wrote: Cassandra's counters read on increment. Additionally

Re: counters + replication = awful performance?

2012-11-27 Thread Edward Capriolo
in parallel rather than rely on Cassandra replication. And yes, Rainbird was the inspiration for what we are trying to do here :) Edward Capriolo wrote Cassandra's counters read on increment. Additionally they are distributed so that can be multiple reads on increment. If they are not fast enough

Re: selective replication of keyspaces

2012-11-27 Thread Edward Capriolo
it couldn't be done. When I run the command I get the error syntax error at position 21: missing EOF at 'placement_strategy' that is probably because I still need to set the correct properties in the conf files On November 27, 2012 at 5:41 PM Edward Capriolo edlinuxg...@gmail.com wrote

Re: Generic questions over Cassandra 1.1/1.2

2012-11-27 Thread Edward Capriolo
@Bill Are you saying that now cassandra is less schema less ? :) Compact storage is the schemaless of old. On Tuesday, November 27, 2012, Bill de hÓra b...@dehora.net wrote: I'm not sure I always understand what people mean by schema less exactly and I'm curious. For 'schema less', given

Re: Query regarding SSTable timestamps and counts

2012-11-20 Thread Edward Capriolo
On Tue, Nov 20, 2012 at 5:23 PM, aaron morton aa...@thelastpickle.com wrote: My understanding of the compaction process was that since data files keep continuously merging we should not have data files with very old last modified timestamps It is perfectly OK to have very old SSTables. But

Re: Collections, query for contains?

2012-11-19 Thread Edward Capriolo
This was my first question after I got the inserts working. Hive has UDFs like array contains. It also has lateral view syntax that is similar to transposed. On Monday, November 19, 2012, Timmy Turner timm.t...@gmail.com wrote: Is there no option to query for the contents of a collection?

Re: SchemaDisagreementException

2012-11-19 Thread Edward Capriolo
Even if you made the calls through cql you would have the same issue since cql uses thrift. 1.2.0 is supposed to be nicer with concurrent modifications. On Monday, November 19, 2012, Everton Lima peitin.inu...@gmail.com wrote: I was using cassandra direct because it has more performance than

Re: SchemaDisagreementException

2012-11-19 Thread Edward Capriolo
http://www.acunu.com/2/post/2011/12/cql-benchmarking.html Last I checked, thrift still had an edge over cql due to string serialization and deserialization. Might be even more dramatic for later columns. Not that client speed matters much overall in cassandra's speed, but CQL client does more.

Re: Offsets and Range Queries

2012-11-15 Thread Edward Capriolo
There are several reasons. First there is no absolute offset. The rows are sorted by the data. If someone inserts new data between your query and this query the rows have changed. Unless you are doing select queries inside a transaction with repeatable read and your database supports this the query
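To illustrate the point about offsets shifting when rows are inserted between queries, here is a minimal sketch in Python. It is not Cassandra client code: a plain sorted list stands in for the store's row ordering, and the names `fetch_page`, `start_after` and `page_size` are made up for the example. It shows paging by "start after the last key I saw" next to paging by numeric offset.

```python
from bisect import bisect_right


def fetch_page(sorted_keys, start_after=None, page_size=3):
    """Return up to page_size keys strictly greater than start_after.

    Hypothetical helper: sorted_keys is an in-memory sorted list standing in
    for rows that the store keeps in sorted order.
    """
    begin = 0 if start_after is None else bisect_right(sorted_keys, start_after)
    return sorted_keys[begin:begin + page_size]


rows = ["apple", "cherry", "grape", "melon", "peach"]

# First page: no cursor yet.
page1 = fetch_page(rows)

# Someone inserts a row that sorts before the rows we already read.
rows.insert(1, "banana")  # list stays sorted: apple, banana, cherry, ...

# Paging by remembered key resumes right after the last row seen.
page2 = fetch_page(rows, start_after=page1[-1])

# Paging by numeric offset re-reads "grape", because every row shifted by one.
offset_page = rows[3:3 + 3]

print(page1)
print(page2)
print(offset_page)
```

The key-based cursor stays stable because it depends only on the sort order, not on how many rows happen to precede it at query time.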

Re: unable to read saved rowcache from disk

2012-11-15 Thread Edward Capriolo
. (if the key is Long, could be more than 1M rows) Thanks. -Wei From: Edward Capriolo edlinuxg...@gmail.com To: user@cassandra.apache.org Sent: Tuesday, November 13, 2012 11:13 PM Subject: Re: unable to read saved rowcache from disk http://wiki.apache.org

Re: Question regarding the need to run nodetool repair

2012-11-15 Thread Edward Capriolo
On Thursday, November 15, 2012, Dwight Smith dwight.sm...@genesyslab.com wrote: I have a 4 node cluster, version 1.1.2, replication factor of 4, read/write consistency of 3, level compaction. Several questions. 1) Should nodetool repair be run regularly to assure it has completed before

Re: Admin for cassandra?

2012-11-15 Thread Edward Capriolo
We should build an eclipse plugin named Eclipsandra or something. On Thu, Nov 15, 2012 at 9:45 PM, Wz1975 wz1...@yahoo.com wrote: Cqlsh is probably the closest you will get. Or pay big bucks to hire someone to develop one for you:) Thanks. -Wei Sent from my Samsung smartphone on ATT

Re: unable to read saved rowcache from disk

2012-11-13 Thread Edward Capriolo
Yes, the row cache could be incorrect, so on startup cassandra verifies the saved row cache by re-reading. It takes a long time so do not save a big row cache. On Tuesday, November 13, 2012, Manu Zhang owenzhang1...@gmail.com wrote: I have a rowcache provided by SerializingCacheProvider. The data

Re: Read during digest mismatch

2012-11-13 Thread Edward Capriolo
I think the code base does not benefit from having too many different read code paths. Logically what you're suggesting is reasonable, but you have to consider the case of one being slow to respond. Then what? On Tuesday, November 13, 2012, Manu Zhang owenzhang1...@gmail.com wrote: If consistency

Re: unable to read saved rowcache from disk

2012-11-13 Thread Edward Capriolo
is not big. On Wed, Nov 14, 2012 at 10:38 AM, Edward Capriolo edlinuxg...@gmail.com wrote: Yes, the row cache could be incorrect, so on startup cassandra verifies the saved row cache by re-reading. It takes a long time so do not save a big row cache. On Tuesday, November 13, 2012, Manu Zhang

Re: removing SSTABLEs

2012-11-12 Thread Edward Capriolo
Because you did a major compaction that table is larger than all the rest. So it will never go away until you have 3 other tables about that size or you run major compaction again. You should vote on the ticket: https://issues.apache.org/jira/browse/CASSANDRA-4766 On Mon, Nov 12, 2012 at 11:51
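The "3 other tables about that size" remark can be sketched roughly as follows. This is a simplified illustration of size-tiered bucketing, not Cassandra's actual code: it assumes SSTables are grouped when their size falls within 0.5x to 1.5x of a bucket's average, and that a bucket is only compacted once it holds 4 tables. The function names and the sizes (in MB) are invented for the example.

```python
def bucket_by_size(sizes, bucket_low=0.5, bucket_high=1.5):
    """Group sizes into buckets of 'about the same size' (simplified)."""
    buckets = []
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket.append(size)
                break
        else:
            # No existing bucket is close enough in size: start a new one.
            buckets.append([size])
    return buckets


def compaction_candidates(sizes, min_threshold=4):
    """Return only the buckets that have enough tables to be compacted."""
    return [b for b in bucket_by_size(sizes) if len(b) >= min_threshold]


# One huge table left behind by a major compaction, plus a few small ones.
after_major = [900, 10, 11, 12]
print(bucket_by_size(after_major))
print(compaction_candidates(after_major))

# The big table only becomes a candidate once three peers of similar size exist.
with_peers = [900, 850, 950, 1000]
print(compaction_candidates(with_peers))
```

In the first case the 900 MB table sits alone in its bucket and is never picked; in the second it finally has three similar-sized peers and the whole bucket qualifies.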

Re: CREATE COLUMNFAMILY

2012-11-11 Thread Edward Capriolo
If you supply metadata cassandra can use it for several things. 1) It validates data on insertion 2) Helps display the information in human readable formats in tools like the CLI and sstabletojson 3) If you add a built-in secondary index the type information is needed, strings sort differently

Re: removing SSTABLEs

2012-11-11 Thread Edward Capriolo
If you shut down c* and remove an sstable (and its associated data, index, bloom filter, etc. files) it is safe. I would delete any saved caches as well. It is safe in the sense that Cassandra will start up with no issues, but you could be missing some data. On Sun, Nov 11, 2012 at 11:09 PM,

Re: leveled compaction and tombstoned data

2012-11-10 Thread Edward Capriolo
No it does not exist. Rob and I might start a donation page and give the money to whoever is willing to code it. If someone would write a tool that would split an sstable into 4 smaller sstables (even an offline command line tool) I would paypal them a hundo. On Sat, Nov 10, 2012 at 1:10 PM,
