Scrub on secondary indexes

2013-09-01 Thread Boris Yen
Hi, We are running Cassandra 1.0.12. From time to time, we see log messages like *java.io.IOError: java.io.IOException: dataSize of 71530420 starting at 587 would be larger than file {cf name} ...* inside system.log. If the cf name is not for a secondary index, running scrub seems to prevent the

Re: row cache

2013-08-22 Thread Boris Yen
If you are using off-heap memory for the row cache, "all writes invalidate the entire row" should be correct. Boris On Fri, Aug 23, 2013 at 8:32 AM, Robert Coli rc...@eventbrite.com wrote: On Wed, Aug 14, 2013 at 10:56 PM, Faraaz Sareshwala fsareshw...@quantcast.com wrote: - All writes

Re: Decommission faster than bootstrap

2013-08-22 Thread Boris Yen
We are using 1.0. Our observation is that if you are using secondary indexes, building them after streaming is time consuming, and the bootstrap needs to wait for the secondary index build to complete. I am not sure if this also applies to 1.1/1.2. You could set the log

Major compaction does not seem to free much disk space if wide rows are used.

2013-05-16 Thread Boris Yen
Hi All, Sorry for the wide distribution. Our Cassandra is running on 1.0.10. Recently, we have been facing a weird situation. We have a column family containing wide rows (each row might have a few million columns). We delete the columns on a daily basis and we also run major compaction on it

Re: Rename failed while cassandra is starting up

2013-04-14 Thread Boris Yen
updated, but in 1.0 that's a serialised object and not easy to poke. Hope that helps. - Aaron Morton Freelance Cassandra Consultant New Zealand @aaronmorton http://www.thelastpickle.com On 14/04/2013, at 3:50 PM, Boris Yen yulin...@gmail.com wrote: Hi All, Recently

Re: MeteredFlusher in system.log entries

2012-07-07 Thread Boris Yen
I am not sure, but I think there should be only 6 memtables (max) based on the example. 1 is active, 4 are in the queue, 1 is being flushed. Is this correct? On Wed, Jun 6, 2012 at 9:08 PM, rohit bhatia rohit2...@gmail.com wrote: Also, Could someone please explain how the factor of 7 comes in

Re: Not getting all data from a 2 node cluster

2012-07-07 Thread Boris Yen
My guess is your RF is 1. When the new node joins the cluster, only part (depending on the token) of the data goes to this new node. On Fri, Jun 8, 2012 at 2:49 PM, Prakrati Agrawal prakrati.agra...@mu-sigma.com wrote: Dear all I am using Cassandra to retrieve a number of rows and

Re: exception when cleaning up...

2012-05-22 Thread Boris Yen
Hi Aaron, Rob, Thanks for the information, I will try it. Regards, Boris On Tue, May 22, 2012 at 11:47 PM, Rob Coli rc...@palominodb.com wrote: On Tue, May 22, 2012 at 3:00 AM, aaron morton aa...@thelastpickle.com wrote: 1) Isolating the node from the cluster to stop write activity. You

exception when cleaning up...

2012-05-20 Thread Boris Yen
Hi, We are currently running 0.8.10 with 4 nodes. We tried to re-balance the range each node owns by using nodetool move. After moving the node to the assigned token, we run the cleanup command, then we saw the exceptions: Error occured during cleanup java.util.concurrent.ExecutionException:

Re: Is nodetool upgradesstables a necessary step for upgrading from 0.8 to 1.0.

2012-05-16 Thread Boris Yen
On 16/05/2012, at 2:09 PM, Boris Yen wrote: Hi, Our cluster is currently running on 0.8.10, we plan on upgrading it to 1.0.x. We read the document from the datastax website, the final step is to use nodetool upgradesstables. Since this command might take time to finish, I wonder if this step

Re: best practices for simulating transactions in Cassandra

2011-12-15 Thread Boris Yen
I am not sure if this is the right thread to ask about this. I read that some people are using cage+zookeeper. I was wondering if anyone has evaluated https://github.com/Netflix/curator? It seems to be a versatile package. On Tue, Dec 13, 2011 at 6:06 AM, John Laban j...@pagerduty.com wrote: Ok,

Re: Atomic Operations in Cassandra

2011-12-11 Thread Boris Yen
Hi Sylvain, Writes under the same row key are atomic (*even across column families*) in the sense that they are either all persisted or none are. Is this a new feature in 1.x, or does it also apply to previous versions of Cassandra? Boris On Thu, Dec 8, 2011 at 6:40 PM, Sylvain Lebresne

Re: DynamicCompositeType bug?

2011-12-05 Thread Boris Yen
, 2011 at 8:19 AM, Boris Yen yulin...@gmail.com wrote: Hi, I am using 0.8.7. I was trying to use a DynamicComposite column. After I intentionally added a column (string:string:uuid) into a record which has previous columns inserted with comparator (string:uuid:uuid), I got an exception

Re: Re: Cassandra DataModeling recommendations

2011-12-05 Thread Boris Yen
I think most of the books for Cassandra are outdated; try to get information from http://www.datastax.com/docs/1.0/index As for ttl, you could read http://www.datastax.com/dev/blog/whats-new-cassandra-07-expiring-columns for more information. For composite type, you could read

Re: Cassandra DataModeling recommendations

2011-12-03 Thread Boris Yen
Not sure I understand your use case, but I think you could use a composite column instead of composite key. For example, UserID:{ TimeUUID1:CartID1, TimeUUID2:CartID2, TimeUUID3:CartID3, } This way, you could do a slice query on the time if you do not need all the carts, and you
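The slice idea in this message can be sketched in a few lines of Python. This is only a toy model, not Cassandra client code: the row is a plain sorted list, integer timestamps stand in for TimeUUIDs, and `slice_columns` is a hypothetical helper invented for illustration.

```python
from bisect import bisect_left, bisect_right


def slice_columns(row, start, end):
    """Return the (name, value) pairs with start <= name <= end.

    `row` must already be sorted by column name, which is how a wide row
    keeps its columns when the comparator is time-based.
    """
    names = [name for name, _ in row]
    lo = bisect_left(names, start)
    hi = bisect_right(names, end)
    return row[lo:hi]


# One wide row per user. Integer timestamps stand in for TimeUUIDs
# (an assumption made to keep the sketch short).
user_row = [
    (1001, "CartID1"),
    (1002, "CartID2"),
    (1003, "CartID3"),
]

# Fetch only the carts created in the window [1002, 1003].
recent = slice_columns(user_row, 1002, 1003)
```

Because the columns are sorted by time, a range lookup touches only the requested window instead of reading every cart in the row.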

Re: Efficiency of Cross Data Center Replication...?

2011-11-21 Thread Boris Yen
is down). On Nov 20, 2011, at 6:01 AM, Boris Yen wrote: A quick question: what if DC2 is down, and after a while it comes back on? How does the data get synced to DC2 in this case? (Assume hinted handoff is disabled.) Thanks in advance. On Thu, Nov 17, 2011 at 10:46 AM, Jeremiah Jordan jeremiah.jor

Re: Network traffic patterns

2011-11-21 Thread Boris Yen
I think ordered partitioner might cause most of the data to be saved only on a few nodes. This could contribute to what you saw. Try to use random partitioner if possible. On Mon, Nov 21, 2011 at 6:53 AM, Philippe watche...@gmail.com wrote: I'm using BOP. Le 20 nov. 2011 13:09, Boris Yen yulin

Re: Efficiency of Cross Data Center Replication...?

2011-11-20 Thread Boris Yen
A quick question: what if DC2 is down, and after a while it comes back on? How does the data get synced to DC2 in this case? (Assume hinted handoff is disabled.) Thanks in advance. On Thu, Nov 17, 2011 at 10:46 AM, Jeremiah Jordan jeremiah.jor...@morningstar.com wrote: Pretty sure data is sent to the

Re: Network traffic patterns

2011-11-20 Thread Boris Yen
I am just curious about which partitioner you are using? On Thu, Nov 17, 2011 at 4:30 PM, Philippe watche...@gmail.com wrote: Hi Todd Yes all equal hardware. Nearly no CPU usage and no memory issues. Repairs are running in tens of minutes so i don't understand why replication would be backed

Re: Fast lookups for userId to username and vice versa

2011-11-16 Thread Boris Yen
I think secondary index could do the trick. However, if you need to provide the pagination function, I will go for Konstantin's solution. On Wed, Nov 16, 2011 at 10:27 PM, Konstantin Naryshkin konstant...@a-bb.net wrote: Or just have two column families to do it: A CF idToName that has the

Re: Second Cassandra users survey

2011-11-02 Thread Boris Yen
1. entity groups 2. cql support in cassandra-cli. 3. offset support in slice_range. 4. more sophisticated secondary index implementation. On Wed, Nov 2, 2011 at 8:38 PM, Patrick Julien pjul...@gmail.com wrote: - entity groups - co-processors - materialized views - CQL support directly in

multiple keyspace vs single keyspace

2011-10-26 Thread Boris Yen
Hi, We plan to put data into different keyspaces, e.g. a keyspace specifically for our own configurations, a keyspace for data like events and devices, and some other keyspaces for other types of data. Is there any limitation on this kind of design? Any pros or cons? Regards Boris

is Cassandra-494 fixed.

2011-09-27 Thread Boris Yen
Hi, I was wondering if this ticket has been taken care of. It is marked as resolved, but I saw None for the fix version. Can anyone shed some light on this? Regards Boris

Re: is Cassandra-494 fixed.

2011-09-27 Thread Boris Yen
Does later mean it is going to be supported soon, like at version 1.x? Regards Boris On Tue, Sep 27, 2011 at 8:57 PM, Jonathan Ellis jbel...@gmail.com wrote: The key here is Resolution: Later On Tue, Sep 27, 2011 at 3:48 AM, Boris Yen yulin...@gmail.com wrote: Hi, I was wondering

Re: Possibility of going OOM using get_count

2011-09-25 Thread Boris Yen
Developer @aaronmorton http://www.thelastpickle.com On 23/09/2011, at 6:01 PM, Boris Yen wrote: On Fri, Sep 23, 2011 at 12:28 PM, aaron morton aa...@thelastpickle.com wrote: Offsets have been discussed previously. IIRC the main concerns were either: There is no way to reliably count

Re: Possibility of going OOM using get_count

2011-09-23 Thread Boris Yen
Cassandra Developer @aaronmorton http://www.thelastpickle.com On 22/09/2011, at 8:50 PM, Boris Yen wrote: I was wondering if it is possible to use a similar approach to CASSANDRA-2894 (https://issues.apache.org/jira/browse/CASSANDRA-2894) to have the slice_predict support the offset concept

Re: benefits of off-heap (serializing) row cache?

2011-09-22 Thread Boris Yen
I think the Cassandra team did not re-implement their own GC. I guess what they meant is that the less heap is used, the better the GC performance. AFAIK, only data that is not updated frequently can benefit from the off-heap row cache, because when a row is modified, the row inside the cache needs to be
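To see why frequently updated rows gain little, here is a toy Python model. It assumes, as the "Re: row cache" message in this listing suggests, that a write drops the whole cached row; `InvalidatingRowCache` is a hypothetical class for illustration and says nothing about Cassandra's actual cache implementation.

```python
class InvalidatingRowCache:
    """Toy row cache in which any write drops the whole cached row."""

    def __init__(self, store):
        self.store = store    # backing "disk": row key -> {column: value}
        self.cache = {}       # row key -> cached copy of the row
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        self.misses += 1
        row = dict(self.store[key])   # simulate loading the row from disk
        self.cache[key] = row
        return row

    def write(self, key, column, value):
        self.store.setdefault(key, {})[column] = value
        self.cache.pop(key, None)     # invalidate the entire cached row


cache = InvalidatingRowCache({"hot": {"c": 0}, "cold": {"c": 0}})
for i in range(100):
    cache.write("hot", "c", i)   # the hot row is updated before every read
    cache.read("hot")            # so this read always misses
    cache.read("cold")           # misses once, then always hits
```

In this run the rarely updated row is served from the cache 99 times, while the constantly updated row never gets a single hit.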

Re: Possibility of going OOM using get_count

2011-09-22 Thread Boris Yen
I was wondering if it is possible to use a similar approach to CASSANDRA-2894 (https://issues.apache.org/jira/browse/CASSANDRA-2894) to have the slice_predict support the offset concept? With the offset, it would be much easier to implement the paging from the client side. Boris On Mon, Sep 19, 2011 at

Re: Scaling Out / Replication Factor too?

2011-08-29 Thread Boris Yen
I am not sure, but I think the problem might be the order-preserving partitioner you used. With an order-preserving partitioner, data might be skewed, meaning most of the data stays on a few servers, which can leave those servers heavily loaded. On Mon, Aug 29, 2011 at 7:24 AM, Ryan Lowe
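The skew described here can be illustrated with a small Python sketch. Everything in it is an assumption for illustration: four nodes, made-up range boundaries, keys that share a common prefix, and a modulo in place of real token ranges. It is not how Cassandra assigns tokens, only a picture of ordered versus hashed placement.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

NODES = 4
# Hypothetical range boundaries for an order-preserving layout:
# node 0 owns keys up to "g", node 1 up to "n", node 2 up to "t", node 3 the rest.
BOUNDARIES = ("g", "n", "t")


def ordered_node(key):
    """Order-preserving placement: the node depends on where the key sorts."""
    return bisect_left(BOUNDARIES, key[:1])


def hashed_node(key):
    """Hash-based placement: the node depends on an MD5 token of the key.

    Taking the token modulo the node count is a simplification of a token ring.
    """
    token = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return token % NODES


keys = [f"user{i:04d}" for i in range(1000)]
ordered_load = Counter(ordered_node(k) for k in keys)
hashed_load = Counter(hashed_node(k) for k in keys)

print("ordered:", dict(ordered_load))
print("hashed: ", dict(hashed_load))
```

With keys that all start with the same prefix, the ordered layout puts every row on one node, while the hashed layout spreads them roughly evenly.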

Re: nodetool repair does not return...

2011-08-25 Thread Boris Yen
AM, Boris Yen yulin...@gmail.com wrote: Would Cassandra-2433 cause this? On Wed, Aug 24, 2011 at 7:23 PM, Boris Yen yulin...@gmail.com wrote: Hi, In our testing environment, we got two nodes with RF=2 running 0.8.4. We tried to test the repair functions of cassandra, however, every once

nodetool repair does not return...

2011-08-24 Thread Boris Yen
Hi, In our testing environment, we have two nodes with RF=2 running 0.8.4. We tried to test the repair functions of Cassandra; however, every once in a while, nodetool repair never returns. We have checked system.log, and nothing seems to be out of the ordinary: no errors, no exceptions. The data is

Re: nodetool repair does not return...

2011-08-24 Thread Boris Yen
Would Cassandra-2433 cause this? On Wed, Aug 24, 2011 at 7:23 PM, Boris Yen yulin...@gmail.com wrote: Hi, In our testing environment, we have two nodes with RF=2 running 0.8.4. We tried to test the repair functions of Cassandra; however, every once in a while, nodetool repair never returns

Re: HOW TO select a column or all columns that start with X

2011-08-17 Thread Boris Yen
, but don't know the birth year, how can I get all the column values of Bob? with the help of composite type. 2011/8/4 Boris Yen yulin...@gmail.com Assume you have a column family named testCF with comparator * CompositeType*(AsciiType, IntegerType(reversed=true), IntegerType); and a few columns

Re: node restart taking too long

2011-08-17 Thread Boris Yen
Because the file only preserves the keys of records, not the whole records. Records for the saved keys are loaded into Cassandra during startup. On Wed, Aug 17, 2011 at 5:52 PM, Yan Chunlu springri...@gmail.com wrote: but the data size in the saved_cache are relatively small:

inconsistent counter value?

2011-08-13 Thread Boris Yen
I posted a comment on Cassandra-3006 after 0.8.4 was released, but it does not seem to have been noticed there, so I am re-posting here, wondering if anyone could help. --- Following the same steps posted on Cassandra-3006, after step 11, I check the counter on .152, the counter values

Re: Enormous counter problem?

2011-08-09 Thread Boris Yen
. On Tue, Aug 9, 2011 at 5:28 AM, Boris Yen yulin...@gmail.com wrote: Hi, I am not sure if this is a bug or we use the counter the wrong way, but I keep getting an enormous counter number in our deployment. After a few tries, I am finally able to reproduce it. The following

Re: Enormous counter problem?

2011-08-09 Thread Boris Yen
ticket opened, https://issues.apache.org/jira/browse/CASSANDRA-3006 On Tue, Aug 9, 2011 at 5:38 PM, Boris Yen yulin...@gmail.com wrote: Actually, I reproduced this on 0.8.3, so it seems to me that it is not fixed yet. Boris On Tue, Aug 9, 2011 at 5:32 PM, Sylvain Lebresne sylv

Enormous counter problem?

2011-08-08 Thread Boris Yen
Hi, I am not sure if this is a bug or we use the counter the wrong way, but I keep getting an enormous counter number in our deployment. After a few tries, I am finally able to reproduce it. The following are the settings of my development: - I

Re: batch mutates throughput

2011-08-07 Thread Boris Yen
Maybe you could try to adjust the setting cassandraThriftSocketTimeout of hector. https://github.com/rantav/hector/wiki/User-Guide On Mon, Aug 8, 2011 at 6:54 AM, Philippe watche...@gmail.com wrote: Quick followup. I have pushed the RPC timeout to 30s. Using Hector, I'm doing 1 thread doing

Re: HOW TO select a column or all columns that start with X

2011-08-04 Thread Boris Yen
wrote: Can you please gimme an example on this using hector client On Thu, Aug 4, 2011 at 7:18 AM, Boris Yen yulin...@gmail.com wrote: It seems to me that your column name consists of two components. If you have the luxury to upgrade your cassandra to 0.8.1+, I think you can think about

Re: Planet Cassandra (an aggregation site for Cassandra News)

2011-08-04 Thread Boris Yen
Looking forward to it. ^^ On Thu, Aug 4, 2011 at 1:56 PM, Eldad Yamin elda...@gmail.com wrote: Great! I hope it will be open soon! On Wed, Aug 3, 2011 at 10:33 PM, Ed Anuff e...@anuff.com wrote: Awesome, great news! On Wed, Aug 3, 2011 at 11:53 AM, Lynn Bender line...@gmail.com wrote:

Re: HOW TO select a column or all columns that start with X

2011-08-03 Thread Boris Yen
It seems to me that your column name consists of two components. If you have the luxury to upgrade your cassandra to 0.8.1+, I think you can think about using the composite type/column. Conceptually, it might suit your use case better. On Wed, Aug 3, 2011 at 5:28 AM, Eldad Yamin elda...@gmail.com

Re: Secondary index on composite columns?

2011-08-01 Thread Boris Yen
are still advised not to use super column family when possible? Regards Boris On Mon, Aug 1, 2011 at 10:25 AM, Jonathan Ellis jbel...@gmail.com wrote: Sure, but it's still only useful for equality predicates. On Sun, Jul 31, 2011 at 8:50 PM, Boris Yen yulin...@gmail.com wrote: Hi, I

Re: How tokens work?

2011-07-31 Thread Boris Yen
On Mon, Aug 1, 2011 at 8:24 AM, Rafael Almeida almeida...@yahoo.com wrote: On Saturday, July 30, 2011, Rafael Almeida almeida...@yahoo.com wrote: Hello, I have computers that are better than others in my cluster. In special, there's one which is much better and I'd like to give it more

What is the nodeId for?

2011-07-20 Thread Boris Yen
Hi, I think we might have screwed our data up. I saw multiple columns inside record: System.NodeIdInfo.CurrentLocal. It makes our cassandra dead forever. I was wondering if anyone could tell me what the NodeId is for? so that I might be able to duplicate this. Thanks in advance Boris

Re: 2800 file descriptors?

2011-07-20 Thread Boris Yen
For the too many open files issue, maybe you could try: ulimit -n 5000 path to cassandra executable. On Wed, Jul 20, 2011 at 6:47 PM, cbert...@libero.it cbert...@libero.it wrote: Hi all, I wonder if it is normal that Cassandra (5 nodes, 0.75) has more than 2800 fd open and growing. I still

Re: What is the nodeId for?

2011-07-20 Thread Boris Yen
removing it from there manually. Sam -- Sam Overton Acunu | http://www.acunu.com | @acunu On 20 July 2011 12:25, Boris Yen yulin...@gmail.com wrote: Hi, I think we might have screwed our data up. I saw multiple columns inside record: System.NodeIdInfo.CurrentLocal. It makes our

Re: Repair taking a long, long time

2011-07-20 Thread Boris Yen
We also got the same problem when using 0.8.0. As far as I know, a few issues related to 'repair' have been marked as resolved in 0.8.1. Hopefully this will solve our problem. On Wed, Jul 20, 2011 at 8:47 PM, David Boxenhorn da...@citypath.com wrote: I have this problem too, and I

Re: What is the nodeId for?

2011-07-20 Thread Boris Yen
. -- Sylvain On Wed, Jul 20, 2011 at 3:47 PM, Boris Yen yulin...@gmail.com wrote: Hi Sam, Thanks for the explanation. The NodeIds do appear in the Local row of NodeIdInfo, and after manually deleting two (I got three before I deleted them) of them from CurrentLocal row, the cassandra

Re: Default behavior of generate index_name for columns...

2011-07-18 Thread Boris Yen
: Column Name: 09partition (09partition) Validation Class: org.apache.cassandra.db.marshal.UTF8Type Index Type: KEYS On Mon, Jul 18, 2011 at 8:20 AM, Boris Yen yulin...@gmail.com wrote: Will this have any side effect when doing a get_indexed_slices or when a user wants

Re: Default behavior of generate index_name for columns...

2011-07-17 Thread Boris Yen
for different column families. It seems the validation rule for index_name on 0.8.1 has been skipped completely. Is this a bug? or is it intentional? Regards Boris On Sat, Jul 16, 2011 at 10:38 AM, Boris Yen yulin...@gmail.com wrote: Done. It is CASSANDRA-2903 https://issues.apache.org/jira

Re: Default behavior of generate index_name for columns...

2011-07-17 Thread Boris Yen
allowed, retroactively. On Sun, Jul 17, 2011 at 11:52 PM, Boris Yen yulin...@gmail.com wrote: I have tested another case, not sure if this is a bug. I created a few column families on 0.8.0 each has user_name column, in addition, I also enabled secondary index on this column. Then, I upgraded

Default behavior of generate index_name for columns...

2011-07-15 Thread Boris Yen
Hi, I have a few column families, each has a column called user_name. I tried to use secondary index on user_name column for each of the column family. However, when creating these column families, cassandra keeps reporting Duplicate index name... exception. I finally figured out that it seems

Re: Default behavior of generate index_name for columns...

2011-07-15 Thread Boris Yen
Hi Jonathan, Do I need to open a ticket for this? Regards Boris On Sat, Jul 16, 2011 at 6:29 AM, Jonathan Ellis jbel...@gmail.com wrote: Sounds reasonable to me. On Fri, Jul 15, 2011 at 2:55 AM, Boris Yen yulin...@gmail.com wrote: Hi, I have a few column families, each has a column

Re: Default behavior of generate index_name for columns...

2011-07-15 Thread Boris Yen
Done. It is CASSANDRA-2903 (https://issues.apache.org/jira/browse/CASSANDRA-2903). On Sat, Jul 16, 2011 at 9:44 AM, Jonathan Ellis jbel...@gmail.com wrote: Please. On Fri, Jul 15, 2011 at 7:29 PM, Boris Yen yulin...@gmail.com wrote: Hi Jonathan, Do I need to open a ticket

ttl on a record?

2011-07-14 Thread Boris Yen
Hi, For now, Cassandra supports setting ttl on columns. Is there any way to do the same for a record/row? Regards Boris

Re: ttl on a record?

2011-07-14 Thread Boris Yen
at 7:08 PM, Boris Yen yulin...@gmail.com wrote: Hi, For now, Cassandra supports setting ttl on columns. Is there any way to do the same for a record/row? Regards Boris -- Jonathan Ellis Project Chair, Apache Cassandra co-founder of DataStax, the source for professional Cassandra

Re: ttl on a record?

2011-07-14 Thread Boris Yen
Thanks a lot. ^^ My project can make good use of this feature. On Fri, Jul 15, 2011 at 10:59 AM, Jonathan Ellis jbel...@gmail.com wrote: On Thu, Jul 14, 2011 at 7:50 PM, Boris Yen yulin...@gmail.com wrote: Hi Jonathan, In this case, will this record with no column be removed from cassandra

Re: Why do Digest Queries return hash instead of timestamp?

2011-07-13 Thread Boris Yen
I guess it is because the timestamp does not guarantee data consistency, but hash does. Boris On Wed, Jul 13, 2011 at 4:27 PM, David Boxenhorn da...@citypath.com wrote: I just saw this http://wiki.apache.org/cassandra/DigestQueries and I was wondering why it returns a hash of the data.

Re: Why do Digest Queries return hash instead of timestamp?

2011-07-13 Thread Boris Yen
have two pieces of data that are different but have the same timestamp, how can you resolve consistency? This is a pathological situation to begin with, why should you waste effort to (not) solve it? On Wed, Jul 13, 2011 at 12:05 PM, Boris Yen yulin...@gmail.com wrote: I guess it is because

Re: Why do Digest Queries return hash instead of timestamp?

2011-07-13 Thread Boris Yen
is correct, if they both have the same timestamp? On Wed, Jul 13, 2011 at 12:40 PM, Boris Yen yulin...@gmail.com wrote: I can only say that the data does matter; that is why the developers use a hash instead of a timestamp. If the hash value from another node does not match, a read repair is performed. so
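The point argued in this thread can be shown with a short Python sketch. The `digest` helper, the sample values, and the choice of MD5 over "value:timestamp" are assumptions made for illustration; the messages above do not say how Cassandra actually computes its digests.

```python
import hashlib


def digest(value, timestamp):
    """Hypothetical digest: a hash over both the data and its timestamp."""
    return hashlib.md5(f"{value}:{timestamp}".encode()).hexdigest()


# Two replicas hold different values written with the same timestamp.
replica_a = ("alice@example.com", 1310550000)
replica_b = ("alice@example.org", 1310550000)

same_timestamp = replica_a[1] == replica_b[1]
same_digest = digest(*replica_a) == digest(*replica_b)

# A timestamp-only comparison would call the replicas consistent;
# the digest comparison exposes the mismatch and would trigger a repair.
needs_read_repair = not same_digest
```

Comparing timestamps alone would report these replicas as identical, whereas comparing digests detects that the data differs.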

is there a need to backup commit log?

2011-07-09 Thread Boris Yen
Hi, Let's say if I want to migrate data from one cluster to another cluster, in addition to snapshots, is there a need to also backup the commit log? As far as I know, some of the data inside commit log might not have been flushed to sstable during snapshots, therefore, if I only backup the

Re: is there a need to backup commit log?

2011-07-09 Thread Boris Yen
sstables flushed during the transfer. On Sat, Jul 9, 2011 at 6:23 AM, Boris Yen yulin...@gmail.com wrote: Hi, Let's say if I want to migrate data from one cluster to another cluster, in addition to snapshots, is there a need to also backup the commit log? As far as I know, some of the data