Migrating from MySQL to Cassandra

2013-03-03 Thread John Grogan
Hi,

We have decided to explore moving our database from MySQL to Cassandra. I am
now installing it on my machine (an OS X system) and need to think about how to
do a data export from MySQL. Using phpMyAdmin, I have a range of different
options to export the database.

However, the crux is figuring out an easy way to import the data into
Cassandra. Does anyone have any thoughts they can share? In addition, are there
any GUI tools like phpMyAdmin for Cassandra?

Thanks,
John.

Re: Migrating from MySQL to Cassandra

2013-03-03 Thread Tyler Hobbs
On Sun, Mar 3, 2013 at 5:06 AM, John Grogan vangu...@dir-uk.org wrote:


 However, the crux is figuring out an easy way to import the data into
 Cassandra. Does anyone have any thoughts they can share?


If you don't have a very large dataset and you're not pressed for time,
just iterating over your existing data with a normal mysql client and
inserting into Cassandra with a normal Cassandra client is the easiest
option.  Using multiple threads or processes can make this pretty quick.
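
For illustration, here is a minimal sketch of that approach in Python, assuming
a hypothetical `users` table on both sides (the table, columns and connection
details are placeholders; it uses MySQL Connector/Python and the DataStax
Python driver):

    # Sketch only: copies rows from a hypothetical MySQL `users` table into a
    # matching Cassandra table. Adapt names, schema and credentials to your data.
    import mysql.connector                    # MySQL Connector/Python
    from cassandra.cluster import Cluster     # DataStax Python driver

    mysql_conn = mysql.connector.connect(host='localhost', user='app',
                                         password='secret', database='mydb')
    session = Cluster(['127.0.0.1']).connect('mykeyspace')

    insert = session.prepare(
        "INSERT INTO users (id, name, email) VALUES (?, ?, ?)")

    cursor = mysql_conn.cursor()
    cursor.execute("SELECT id, name, email FROM users")
    for user_id, name, email in cursor:
        session.execute(insert, (user_id, name, email))

    mysql_conn.close()

Splitting the SELECT into primary-key ranges and running several copies of this
loop in parallel is the simple way to apply the multiple threads/processes
suggestion above.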

For larger datasets, loading through a map/reduce job is typical.


 In addition, are there any GUI tools like phpMyAdmin for Cassandra?


There's DataStax OpsCenter, which has a free Community Edition:
http://www.datastax.com/products/opscenter

(Disclosure: I work at DataStax.)

-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Migrating from MySQL to Cassandra

2013-03-03 Thread Marco Matarazzo
 There's DataStax OpsCenter, which has a free Community Edition: 
 http://www.datastax.com/products/opscenter

Is OpsCenter working with Cassandra 1.2 with vnodes already?

--
Marco Matarazzo




Re: Migrating from MySQL to Cassandra

2013-03-03 Thread Tyler Hobbs
On Sun, Mar 3, 2013 at 11:38 AM, Marco Matarazzo 
marco.matara...@hexkeep.com wrote:

 Is OpsCenter working with Cassandra 1.2 with vnodes already?


Yes, it's compatible with vnode-enabled clusters, but doesn't support
vnode-specific things like running shuffle.  For now, it basically randomly
picks one token from each node and uses that, so some token-related things
are a bit off, but everything else should work normally.  We're working on
more complete support for vnodes.

-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Compaction statistics information

2013-03-03 Thread Tyler Hobbs
It's a description of how many of the compacted SSTables the rows were
spread across prior to compaction.  In your case, 15 rows were spread
across two of the four sstables, 68757 rows were spread across three of the
four sstables, and 6865 were spread across all four.
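
(As a quick sanity check: 15 + 68,757 + 6,865 = 75,637, which matches the
unique row count reported in the log line.)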


On Fri, Mar 1, 2013 at 11:07 AM, Jabbar Azam aja...@gmail.com wrote:

 Hello,

 I'm seeing compaction statistics which look like the following

  INFO 17:03:09,216 Compacted 4 sstables to
 [/var/lib/cassandra/data/studata/datapoints/studata-datapoints-ib-629,].
 420,807,293 bytes to 415,287,150 (~98% of original) in 341,690ms =
 1.159088MB/s.  233,761 total rows, 75,637 unique.  Row merge counts were
 {1:0, 2:15, 3:68757, 4:6865, }

 Does anybody know what Row merge counts were {1:0, 2:15, 3:68757, 4:6865,
 } means?

 --
 Thanks

  A Jabbar Azam




-- 
Tyler Hobbs
DataStax http://datastax.com/


Re: Compaction statistics information

2013-03-03 Thread Jabbar Azam
Thanks Tyler
On 3 Mar 2013 18:55, Tyler Hobbs ty...@datastax.com wrote:

 It's a description of how many of the compacted SSTables the rows were
 spread across prior to compaction.  In your case, 15 rows were spread
 across two of the four sstables, 68757 rows were spread across three of the
 four sstables, and 6865 were spread across all four.


 On Fri, Mar 1, 2013 at 11:07 AM, Jabbar Azam aja...@gmail.com wrote:

 Hello,

 I'm seeing compaction statistics which look like the following

  INFO 17:03:09,216 Compacted 4 sstables to
 [/var/lib/cassandra/data/studata/datapoints/studata-datapoints-ib-629,].
 420,807,293 bytes to 415,287,150 (~98% of original) in 341,690ms =
 1.159088MB/s.  233,761 total rows, 75,637 unique.  Row merge counts were
 {1:0, 2:15, 3:68757, 4:6865, }

 Does anybody know what Row merge counts were {1:0, 2:15, 3:68757,
 4:6865, } means?

 --
 Thanks

  A Jabbar Azam




 --
 Tyler Hobbs
 DataStax http://datastax.com/



Re: no other nodes seen on priam cluster

2013-03-03 Thread Ben Bromhead
Glad you got it going!

There is a REST call you can make to Priam telling it to double the cluster
size (/v1/cassconfig/double_ring). It will pre-fill all the SimpleDB entries for
when the nodes come online; you then change the number of nodes on the
autoscaling group. Now that Priam supports C* 1.2 with vnodes, increasing the
cluster size in an ad-hoc manner might be just around the corner.
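
For illustration, hitting that endpoint from a node could look roughly like the
get_seeds call quoted further down in this thread (a sketch only; the host,
port and the use of a plain GET are assumptions, so check your Priam
deployment):

    # Sketch only: asks a locally running Priam to pre-fill the SimpleDB
    # entries for a doubled ring. Host, port and HTTP method are assumptions.
    import urllib.request

    url = "http://127.0.0.1:8080/Priam/REST/v1/cassconfig/double_ring"
    with urllib.request.urlopen(url) as resp:
        print(resp.status, resp.read().decode())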

Instaclustr has some predefined cluster sizes (Free, Basic, Professional and
Enterprise); these are loosely based on estimated performance and storage
capacity.

You can also create a custom cluster where you define the number of nodes
(minimum of 4) and the instance type according to your requirements. For
pricing on those check out https://www.instaclustr.com/pricing/per-instance; we
base our pricing on estimated support and throughput requirements.

Cheers

Ben
Instaclustr | www.instaclustr.com | @instaclustr



On 02/03/2013, at 3:59 AM, Marcelo Elias Del Valle mvall...@gmail.com wrote:

 Thanks a lot Ben, actually I managed to make it work by erasing the SimpleDB
 entries Priam uses to keep track of instances... I had pulled the last commit
 from the repo, not sure if it helped or not.
 
 But your message made me curious about something...  How do you add more
 Cassandra nodes on the fly? Just update the autoscale properties? I saw
 instaclustr.com changes the instance type as the number of nodes increases
 (not sure why the price also becomes higher per instance in this case), and I
 am guessing Priam uses the data backed up to S3 to restore a node's data on
 another instance, right?
 
 []s
 
 
 
 2013/2/28 Ben Bromhead b...@relational.io
 Off the top of my head I would check to make sure the Autoscaling Group you
 created is restricted to a single Availability Zone. Also, Priam sets the
 number of EC2 instances it expects based on the maximum instance count you set
 on your scaling group (it did this the last time I checked a few months ago;
 its behaviour may have changed).
 
 So I would make sure your desired, min and max instances for your scaling
 group are all the same, make sure your ASG is restricted to a single
 availability zone (e.g. us-east-1b), and then (if you are able to and there is
 no data in your cluster) delete all the SimpleDB entries Priam has created and
 then also possibly clear out the Cassandra data directory.
 
 Other than that, I see you've raised it as an issue on the Priam project
 page, so see what they say ;)
 
 Cheers
 
 Ben
 
 On Thu, Feb 28, 2013 at 3:40 AM, Marcelo Elias Del Valle mvall...@gmail.com 
 wrote:
 One additional important piece of info: I checked here and the seeds seem
 different on each node. The command
 echo `curl http://127.0.0.1:8080/Priam/REST/v1/cassconfig/get_seeds`
 returns ip2 on the first node and ip1,ip1 on the second node.
 Any idea why? It's probably what is causing Cassandra to die, right?
 
 
 2013/2/27 Marcelo Elias Del Valle mvall...@gmail.com
 Hello Ben, thanks for the willingness to help.
 
 2013/2/27 Ben Bromhead b...@instaclustr.com
 Have you added the Priam Java agent to Cassandra's JVM arguments (e.g.
 -javaagent:$CASS_HOME/lib/priam-cass-extensions-1.1.15.jar) and does the web
 container running Priam have permissions to write to the Cassandra config
 directory? Also, what do the Priam logs say?
 
 I put the Priam log of the first node below. Yes, I have added
 priam-cass-extensions to the Java args and Priam IS actually writing to the
 Cassandra directory.
  
 If you want to get up and running quickly with Cassandra, AWS and Priam,
 check out www.instaclustr.com.
 We deploy Cassandra under your AWS account and you have full root access to
 the nodes if you want to explore and play around, plus there is a free tier
 which is great for experimenting and trying Cassandra out.
 
 That sounded really great. I am not sure if it would apply to our case (we
 will consider it though), but some partners would benefit greatly from it, for
 sure! I will send your link to them.
 
 What Priam says:
 
 2013-02-27 14:14:58.0614 INFO pool-2-thread-1 
 com.netflix.priam.utils.SystemUtils Calling URL API: 
 http://169.254.169.254/latest/meta-data/public-hostname returns: 
 ec2-174-129-59-107.compute-1.amazon
 aws.com
 2013-02-27 14:14:58.0615 INFO pool-2-thread-1 
 com.netflix.priam.utils.SystemUtils Calling URL API: 
 http://169.254.169.254/latest/meta-data/public-ipv4 returns: 174.129.59.107
 2013-02-27 14:14:58.0618 INFO pool-2-thread-1 
 com.netflix.priam.utils.SystemUtils Calling URL API: 
 http://169.254.169.254/latest/meta-data/instance-id returns: i-88b32bfb
 2013-02-27 14:14:58.0618 INFO pool-2-thread-1 
 com.netflix.priam.utils.SystemUtils Calling URL API: 
 http://169.254.169.254/latest/meta-data/instance-type returns: c1.medium
 2013-02-27 14:14:59.0614 INFO pool-2-thread-1 
 com.netflix.priam.defaultimpl.PriamConfiguration REGION set to us-east-1, ASG 
 Name set to dmp_cluster-useast1b
 2013-02-27 14:14:59.0746 INFO pool-2-thread-1 
 

Re: no backwards compatibility for thrift in 1.2.2? (we get utter failure)

2013-03-03 Thread Michael Kjellman
Dean,

I think if you look back through previous mailing list items you'll find
answers to this already but to summarize:

Tables created prior to 1.2 will continue to work after upgrade. New
tables created are not exposed by the Thrift API. It is up to client
developers to upgrade the client to pull the required metadata for
serialization and deserialization of the data from the System column
family instead.
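
For example, in 1.2 that metadata can be read from the system keyspace along
these lines (a rough sketch using the DataStax Python driver; the keyspace and
table names are placeholders):

    # Sketch only: reads per-column metadata for a hypothetical CQL3 table,
    # roughly what a Thrift-based client needs in order to deserialize it.
    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect()
    rows = session.execute(
        "SELECT column_name, validator FROM system.schema_columns "
        "WHERE keyspace_name = 'mykeyspace' AND columnfamily_name = 'mytable'")
    for row in rows:
        print(row.column_name, row.validator)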

I don't know Netflix's timetable for an update to Astyanax but I'm sure
they are working on it. Alternatively, you can also use the DataStax Java
driver in your QA environment for now.

If you only need to access existing column families, this shouldn't be an
issue.

On 3/3/13 6:31 PM, Hiller, Dean dean.hil...@nrel.gov wrote:

I remember huge discussions on backwards compatibility and we have a ton
of code using Thrift (as do many people out there).  We happen to have a
startup bean for development that populates data in Cassandra for us.  We
cleared out our QA completely (no data) and ran this… it turns out there
seems to be no backwards compatibility, as it utterly fails.

From the Astyanax point of view, we simply get this (when going back to
1.1.4, everything works fine).  I can go down the path of finding out
where backwards compatibility breaks, but does this mean essentially
everyone has to rewrite their applications?  Or is there a list of
breaking changes that we can't do anymore?  Has anyone tried the latest
Astyanax client with version 1.2.2?

An unexpected error occured caused by exception RuntimeException:
com.netflix.astyanax.connectionpool.exceptions.NoAvailableHostsException:
NoAvailableHostsException: [host=None(0.0.0.0):0, latency=0(0),
attempts=0]No hosts to borrow from

Thanks,
Dean




Re: no backwards compatibility for thrift in 1.2.2? (we get utter failure)

2013-03-03 Thread Hiller, Dean
It was an issue for existing tables: in QA, I ran an upgrade from 1.1.4
with simple data and then, after moving to 1.2.2, could not access the data and
ended up with timeouts.  After that I cleared everything and just started a
fresh 1.2.2, as I wanted to see if a base install of 1.2.2 with no upgrade
worked, which it did not.

I.e., the backwards compatibility does not seem to be working.

Any ideas on how I could even attempt to resolve this?  We really want to
get to LCS, and rumor is it is much slower in 1.1.x than in 1.2.x, based on a
post I read:
http://mail-archives.apache.org/mod_mbox/cassandra-user/201302.mbox/%3C87D968e5-56a0-4da7-8676-ba90ff376...@yahoo.com%3E

Thanks,
Dean


On 3/3/13 7:39 PM, Michael Kjellman mkjell...@barracuda.com wrote:

Dean,

I think if you look back through previous mailing list items you'll find
answers to this already but to summarize:

Tables created prior to 1.2 will continue to work after upgrade. New
tables created are not exposed by the Thrift API. It is up to client
developers to upgrade the client to pull the required metadata for
serialization and deserialization of the data from the System column
family instead.

I don't know Netflix's timetable for an update to Astyanax but I'm sure
they are working on it. Alternatively, you can also use the DataStax Java
driver in your QA environment for now.

If you only need to access existing column families, this shouldn't be an
issue.

On 3/3/13 6:31 PM, Hiller, Dean dean.hil...@nrel.gov wrote:

I remember huge discussions on backwards compatibility and we have a ton
of code using Thrift (as do many people out there).  We happen to have a
startup bean for development that populates data in Cassandra for us.  We
cleared out our QA completely (no data) and ran this… it turns out there
seems to be no backwards compatibility, as it utterly fails.

From the Astyanax point of view, we simply get this (when going back to
1.1.4, everything works fine).  I can go down the path of finding out
where backwards compatibility breaks, but does this mean essentially
everyone has to rewrite their applications?  Or is there a list of
breaking changes that we can't do anymore?  Has anyone tried the latest
Astyanax client with version 1.2.2?

An unexpected error occured caused by exception RuntimeException:
com.netflix.astyanax.connectionpool.exceptions.NoAvailableHostsException:
NoAvailableHostsException: [host=None(0.0.0.0):0, latency=0(0),
attempts=0]No hosts to borrow from

Thanks,
Dean





Re: no backwards compatibility for thrift in 1.2.2? (we get utter failure)

2013-03-03 Thread Edward Capriolo
Your other option is to create tables 'WITH COMPACT STORAGE'. Basically, if
you use COMPACT STORAGE, you can create tables as you did before.

https://issues.apache.org/jira/browse/CASSANDRA-2995
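
For example, something along these lines (a sketch; keyspace, table and column
names are made up) gives you a wide-row style table that remains visible to
Thrift clients:

    # Sketch only: a CQL3 table created WITH COMPACT STORAGE maps onto the
    # classic wide-row layout, so it stays accessible through the Thrift API.
    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('mykeyspace')
    session.execute("""
        CREATE TABLE events (
            row_key  text,
            column1  timeuuid,
            value    text,
            PRIMARY KEY (row_key, column1)
        ) WITH COMPACT STORAGE
    """)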

From an application standpoint, if you can't do sparse, wide rows, you
break compatibility with 90% of Cassandra applications. So that rules out
almost everything; if you can't provide the same data model, you're
creating fragmentation, not pluggability.

I now call Cassandra compact storage 'c*' storage, and I call CQL3 storage
'c*++' storage. See debates on c vs C++ to understand why :).


On Sun, Mar 3, 2013 at 9:39 PM, Michael Kjellman mkjell...@barracuda.com wrote:

 Dean,

 I think if you look back through previous mailing list items you'll find
 answers to this already but to summarize:

 Tables created prior to 1.2 will continue to work after upgrade. New
 tables created are not exposed by the Thrift API. It is up to client
 developers to upgrade the client to pull the required metadata for
 serialization and deserialization of the data from the System column
 family instead.

 I don't know Netflix's timetable for an update to Astyanax but I'm sure
 they are working on it. Alternatively, you can also use the DataStax Java
 driver in your QA environment for now.

 If you only need to access existing column families, this shouldn't be an
 issue.

 On 3/3/13 6:31 PM, Hiller, Dean dean.hil...@nrel.gov wrote:

 I remember huge discussions on backwards compatibility and we have a ton
 of code using Thrift (as do many people out there).  We happen to have a
 startup bean for development that populates data in Cassandra for us.  We
 cleared out our QA completely (no data) and ran this… it turns out there
 seems to be no backwards compatibility, as it utterly fails.
 
 From the Astyanax point of view, we simply get this (when going back to
 1.1.4, everything works fine).  I can go down the path of finding out
 where backwards compatibility breaks, but does this mean essentially
 everyone has to rewrite their applications?  Or is there a list of
 breaking changes that we can't do anymore?  Has anyone tried the latest
 Astyanax client with version 1.2.2?
 
 An unexpected error occured caused by exception RuntimeException:
 com.netflix.astyanax.connectionpool.exceptions.NoAvailableHostsException:
 NoAvailableHostsException: [host=None(0.0.0.0):0, latency=0(0),
 attempts=0]No hosts to borrow from
 
 Thanks,
 Dean





Re: no backwards compatibility for thrift in 1.2.2? (we get utter failure)

2013-03-03 Thread aaron morton
Dean, 
Is this an issue with tables created using CQL 3?

OR…

An issue with tables created in 1.1.4 using the CLI not being readable after an
in-place upgrade to 1.2.2?

I did a quick test and it worked. 

Cheers
 
-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 3/03/2013, at 8:18 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

 Your other option is to create tables 'WITH COMPACT STORAGE'. Basically, if
 you use COMPACT STORAGE, you can create tables as you did before.
 
 https://issues.apache.org/jira/browse/CASSANDRA-2995
 
 From an application standpoint, if you can't do sparse, wide rows, you break 
 compatibility with 90% of Cassandra applications. So that rules out almost 
 everything; if you can't provide the same data model, you're creating 
 fragmentation, not pluggability.
 
 I now call Cassandra compact storage 'c*' storage, and I call CQL3 storage 
 'c*++' storage. See debates on c vs C++ to understand why :).
 
 
 On Sun, Mar 3, 2013 at 9:39 PM, Michael Kjellman mkjell...@barracuda.com 
 wrote:
 Dean,
 
 I think if you look back through previous mailing list items you'll find
 answers to this already but to summarize:
 
 Tables created prior to 1.2 will continue to work after upgrade. New
 tables created are not exposed by the Thrift API. It is up to client
 developers to upgrade the client to pull the required metadata for
 serialization and deserialization of the data from the System column
 family instead.
 
 I don't know Netflix's timetable for an update to Astyanax but I'm sure
 they are working on it. Alternatively, you can also use the DataStax Java
 driver in your QA environment for now.
 
 If you only need to access existing column families, this shouldn't be an
 issue.
 
 On 3/3/13 6:31 PM, Hiller, Dean dean.hil...@nrel.gov wrote:
 
 I remember huge discussions on backwards compatibility and we have a ton
 of code using Thrift (as do many people out there).  We happen to have a
 startup bean for development that populates data in Cassandra for us.  We
 cleared out our QA completely (no data) and ran this… it turns out there
 seems to be no backwards compatibility, as it utterly fails.
 
 From the Astyanax point of view, we simply get this (when going back to
 1.1.4, everything works fine).  I can go down the path of finding out
 where backwards compatibility breaks, but does this mean essentially
 everyone has to rewrite their applications?  Or is there a list of
 breaking changes that we can't do anymore?  Has anyone tried the latest
 Astyanax client with version 1.2.2?
 
 An unexpected error occured caused by exception RuntimeException:
 com.netflix.astyanax.connectionpool.exceptions.NoAvailableHostsException:
 NoAvailableHostsException: [host=None(0.0.0.0):0, latency=0(0),
 attempts=0]No hosts to borrow from
 
 Thanks,
 Dean
 
 
 



Re: Select X amount of column families in a super column family in Cassandra using PHP?

2013-03-03 Thread aaron morton
You'll probably have better luck asking the author directly. 

Check the tutorial 
http://cassandra-php-client-library.com/tutorial/fetching-data and tell them 
what you have tried. 

For future reference, we are trying to direct client-specific queries to the
client-dev list.

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 2/03/2013, at 2:10 PM, Crocker Jordan jcrocker.115...@students.smu.ac.uk 
wrote:

 I'm using Kallaspriit's Cassandra/PHP library
 (https://github.com/kallaspriit/Cassandra-PHP-Client-Library).
 I'm trying to select the first X column families within the super
 column family; however, I'm having absolutely no luck, and Google searches
 don't seem to bring up much.
 
 I'm using Random Partitioning, and don't particularly wish to change to OPP 
 as I have read there is a lot more work involved.
 
 Any help would be much appreciated.
 



Re: Column Slice Query performance after deletions

2013-03-03 Thread aaron morton
 I need something to keep the deleted columns away from my query fetch. Not 
 only the tombstones.
 It looks like the min compaction might help on this. But I'm not sure yet on
 what would be a reasonable value for its threshold.
Your tombstones will not be purged in a compaction until after gc_grace, and
only if all fragments of the row are in the compaction. You're right that you
would probably want to run repair during the day if you are going to
dramatically reduce gc_grace, to avoid deleted data coming back to life.

If you are using a single Cassandra row as a queue, you are going to have
trouble. Leveled compaction may help a little.

If you are reading the most recent entries in the row, assuming the columns
are sorted by some timestamp, use the Reverse Comparator and issue slice
commands to get the first X cols. That will remove tombstones from the problem.
(I'm guessing this is not something you do, just mentioning it.)

Your next option is to change the data model so you don't use the same row all
day.

After that, consider a message queue. 
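
To illustrate the reversed-read and bucketing ideas above, here is a rough
pycassa-style sketch that buckets the queue row by hour and slices the newest
columns first; the column family, comparator and naming are assumptions, not
the original schema:

    # Sketch only: instead of one queue row per day, bucket messages into an
    # hourly row and read the newest columns first, so old buckets (and their
    # tombstones) fall out of the read path. Assumes a UTF8Type comparator.
    from datetime import datetime
    import pycassa

    pool = pycassa.ConnectionPool('mykeyspace', ['localhost:9160'])
    messages = pycassa.ColumnFamily(pool, 'messages')

    def bucket_key(queue_name, when=None):
        """Row key such as 'orders:2013030314', one row per queue per hour."""
        when = when or datetime.utcnow()
        return '%s:%s' % (queue_name, when.strftime('%Y%m%d%H'))

    # Producer: one column per message; ISO timestamps sort chronologically.
    messages.insert(bucket_key('orders'),
                    {datetime.utcnow().isoformat(): 'message payload'})

    # Consumer: slice the newest 100 columns of the current hour's row only.
    recent = messages.get(bucket_key('orders'),
                          column_count=100, column_reversed=True)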

Cheers


-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 2/03/2013, at 12:03 PM, Víctor Hugo Oliveira Molinar vhmoli...@gmail.com 
wrote:

 Tombstones stay around until gc grace so you could lower that to see if that
 fixes the performance issues.
 
 If the tombstones get collected, the column will live again, causing data
 inconsistency, since I can't run a repair during the regular operations. Not
 sure if I got your thoughts on this.
 
 
 Size tiered or leveled compaction?
 
 I'm actually running Size Tiered Compaction, but I've been looking into
 changing it to Leveled. It seems to be the case.  Although even if I achieve
 some performance gains, I would still have the same problem with the deleted
 columns.
 
 
 I need something to keep the deleted columns away from my query fetch. Not 
 only the tombstones.
 It looks like the min compaction might help on this. But I'm not sure yet on
 what would be a reasonable value for its threshold.
 
 
 On Sat, Mar 2, 2013 at 4:22 PM, Michael Kjellman mkjell...@barracuda.com 
 wrote:
 Tombstones stay around until gc grace so you could lower that to see if that
 fixes the performance issues.
 
 Size tiered or leveled compaction?
 
 On Mar 2, 2013, at 11:15 AM, Víctor Hugo Oliveira Molinar 
 vhmoli...@gmail.com wrote:
 
 What is your gc_grace set to? Sounds like as the number of tombstone
 records increases your performance decreases. (Which I would expect)
 
 gc_grace is default.
 
 
 Cassandra's data files are write-once. Deletes are another write. Until
 compaction they all live on disk. Making really big rows has this problem.
 Oh, so it looks like I should lower the min_compaction_threshold for this
 column family. Right?
 What does this threshold value really mean?
 
 
 Guys, thanks for the help so far.
 
 On Sat, Mar 2, 2013 at 3:42 PM, Michael Kjellman mkjell...@barracuda.com 
 wrote:
 What is your gc_grace set to? Sounds like as the number of tombstone
 records increases your performance decreases. (Which I would expect)
 
 On Mar 2, 2013, at 10:28 AM, Víctor Hugo Oliveira Molinar 
 vhmoli...@gmail.com wrote:
 
 I have a daily maintenance of my cluster where I truncate this column
 family, because its data doesn't need to be kept for more than a day.
 Since all the regular operations on it finish around 4 hours before the end of
 the day, I regularly run a truncate on it followed by a repair at the end of
 the day.
 
 And every day, when the operations are started (when there are only a few
 deleted columns), the performance looks pretty good.
 Unfortunately it degrades over the course of the day.
 
 
 On Sat, Mar 2, 2013 at 2:54 PM, Michael Kjellman mkjell...@barracuda.com 
 wrote:
 When is the last time you did a cleanup on the cf?
 
 On Mar 2, 2013, at 9:48 AM, Víctor Hugo Oliveira Molinar 
 vhmoli...@gmail.com wrote:
 
  Hello guys.
  I'm investigating the reasons for performance degradation in my case
  scenario, which follows:
 
  - I do have a column family which is filled with thousands of columns
  inside a single row (this varies between 10k ~ 200k). And I also have
  thousands of rows, not much more than 15k.
  - These rows are constantly updated, but the write load is not that
  intensive. I estimate it as 100 writes/sec to the column family.
  - Each column represents a message which is read and processed by another
  process. After reading it, the column is marked for deletion in order to
  keep it out of the next query on this row.
 
  Ok, so, I've figured out that after many insertions plus deletion
  updates, my queries (column slice queries) are taking more time to
  perform, even if there are only a few columns, fewer than 100.
 
  So it looks like the larger the number of columns being deleted,
  the longer the time spent on a query.
  - Internally at C*, does column slice query ranges among deleted 

Re: reading the updated values

2013-03-03 Thread aaron morton
 my question is how do I get the updated data in Cassandra for the last hour or
 so to be indexed in Elasticsearch.
You cannot. 

The best approach is to update Elasticsearch at the same time you update
Cassandra.
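
A rough sketch of that dual-write pattern in Python (the table, index and field
names are made up; it uses the DataStax Python driver for Cassandra and
Elasticsearch's plain HTTP API):

    # Sketch only: write each event to Cassandra and index the same document in
    # Elasticsearch in the same code path, rather than trying to work out
    # afterwards what changed in the last hour. All names/URLs are placeholders.
    import json
    import urllib.request
    import uuid

    from cassandra.cluster import Cluster

    session = Cluster(['127.0.0.1']).connect('mykeyspace')

    def save_event(body):
        event_id = uuid.uuid4()
        # 1. Primary write to Cassandra.
        session.execute(
            "INSERT INTO events (id, body) VALUES (%s, %s)", (event_id, body))
        # 2. Index the same document in Elasticsearch via its REST API.
        req = urllib.request.Request(
            "http://localhost:9200/events/event/%s" % event_id,
            data=json.dumps({'body': body}).encode('utf-8'),
            headers={'Content-Type': 'application/json'},
            method='PUT')
        urllib.request.urlopen(req)

    save_event('hello from cassandra and elasticsearch')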

Cheers

-
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 1/03/2013, at 11:57 PM, subhankar biswas neo20iit...@gmail.com wrote:

 Hi,
   I'm trying to use Cassandra as the main data store and Elasticsearch for
 realtime queries. My question is how do I get the updated data in Cassandra
 for the last hour or so to be indexed in Elasticsearch.
   Once I get the updated data from Cassandra I can index it into ES.
   Is there any specific data model I have to follow to get the recent
 updates of any CF?
 
 Thanks, subhankar



old data / tombstones are not deleted after ttl

2013-03-03 Thread Matthias Zeilinger
Hi,

I'm running Cassandra 1.1.5 and have the following issue.

I'm using a 10-day TTL on my CF. I can see a lot of tombstones in there, but
they aren't deleted after compaction.

I have tried a nodetool -cleanup and also a restart of Cassandra, but nothing
happened.

total 61G
drwxr-xr-x  2 cassandra dba  20K Mar  4 06:35 .
drwxr-xr-x 10 cassandra dba 4.0K Dec 10 13:05 ..
-rw-r--r--  1 cassandra dba  15M Dec 15 22:04 
whatever-he-1398-CompressionInfo.db
-rw-r--r--  1 cassandra dba  19G Dec 15 22:04 whatever-he-1398-Data.db
-rw-r--r--  1 cassandra dba  15M Dec 15 22:04 whatever-he-1398-Filter.db
-rw-r--r--  1 cassandra dba 357M Dec 15 22:04 whatever-he-1398-Index.db
-rw-r--r--  1 cassandra dba 4.3K Dec 15 22:04 whatever-he-1398-Statistics.db
-rw-r--r--  1 cassandra dba 9.5M Feb  6 15:45 
whatever-he-5464-CompressionInfo.db
-rw-r--r--  1 cassandra dba  12G Feb  6 15:45 whatever-he-5464-Data.db
-rw-r--r--  1 cassandra dba  48M Feb  6 15:45 whatever-he-5464-Filter.db
-rw-r--r--  1 cassandra dba 736M Feb  6 15:45 whatever-he-5464-Index.db
-rw-r--r--  1 cassandra dba 4.3K Feb  6 15:45 whatever-he-5464-Statistics.db
-rw-r--r--  1 cassandra dba 9.7M Feb 21 19:13 
whatever-he-6829-CompressionInfo.db
-rw-r--r--  1 cassandra dba  12G Feb 21 19:13 whatever-he-6829-Data.db
-rw-r--r--  1 cassandra dba  47M Feb 21 19:13 whatever-he-6829-Filter.db
-rw-r--r--  1 cassandra dba 792M Feb 21 19:13 whatever-he-6829-Index.db
-rw-r--r--  1 cassandra dba 4.3K Feb 21 19:13 whatever-he-6829-Statistics.db
-rw-r--r--  1 cassandra dba 3.7M Mar  1 10:46 
whatever-he-7578-CompressionInfo.db
-rw-r--r--  1 cassandra dba 4.3G Mar  1 10:46 whatever-he-7578-Data.db
-rw-r--r--  1 cassandra dba  12M Mar  1 10:46 whatever-he-7578-Filter.db
-rw-r--r--  1 cassandra dba 274M Mar  1 10:46 whatever-he-7578-Index.db
-rw-r--r--  1 cassandra dba 4.3K Mar  1 10:46 whatever-he-7578-Statistics.db
-rw-r--r--  1 cassandra dba 3.6M Mar  1 11:21 
whatever-he-7582-CompressionInfo.db
-rw-r--r--  1 cassandra dba 4.3G Mar  1 11:21 whatever-he-7582-Data.db
-rw-r--r--  1 cassandra dba 9.7M Mar  1 11:21 whatever-he-7582-Filter.db
-rw-r--r--  1 cassandra dba 236M Mar  1 11:21 whatever-he-7582-Index.db
-rw-r--r--  1 cassandra dba 4.3K Mar  1 11:21 whatever-he-7582-Statistics.db
-rw-r--r--  1 cassandra dba 3.7M Mar  3 12:13 
whatever-he-7869-CompressionInfo.db
-rw-r--r--  1 cassandra dba 4.3G Mar  3 12:13 whatever-he-7869-Data.db
-rw-r--r--  1 cassandra dba 9.8M Mar  3 12:13 whatever-he-7869-Filter.db
-rw-r--r--  1 cassandra dba 239M Mar  3 12:13 whatever-he-7869-Index.db
-rw-r--r--  1 cassandra dba 4.3K Mar  3 12:13 whatever-he-7869-Statistics.db
-rw-r--r--  1 cassandra dba 924K Mar  3 18:02 
whatever-he-7953-CompressionInfo.db
-rw-r--r--  1 cassandra dba 1.1G Mar  3 18:02 whatever-he-7953-Data.db
-rw-r--r--  1 cassandra dba 2.1M Mar  3 18:02 whatever-he-7953-Filter.db
-rw-r--r--  1 cassandra dba  51M Mar  3 18:02 whatever-he-7953-Index.db
-rw-r--r--  1 cassandra dba 4.3K Mar  3 18:02 whatever-he-7953-Statistics.db
-rw-r--r--  1 cassandra dba 231K Mar  3 20:06 
whatever-he-7974-CompressionInfo.db
-rw-r--r--  1 cassandra dba 268M Mar  3 20:06 whatever-he-7974-Data.db
-rw-r--r--  1 cassandra dba 483K Mar  3 20:06 whatever-he-7974-Filter.db
-rw-r--r--  1 cassandra dba  12M Mar  3 20:06 whatever-he-7974-Index.db
-rw-r--r--  1 cassandra dba 4.3K Mar  3 20:06 whatever-he-7974-Statistics.db
-rw-r--r--  1 cassandra dba 116K Mar  4 06:28 
whatever-he-8002-CompressionInfo.db
-rw-r--r--  1 cassandra dba 146M Mar  4 06:28 whatever-he-8002-Data.db
-rw-r--r--  1 cassandra dba 646K Mar  4 06:28 whatever-he-8002-Filter.db
-rw-r--r--  1 cassandra dba  16M Mar  4 06:28 whatever-he-8002-Index.db
-rw-r--r--  1 cassandra dba 4.3K Mar  4 06:28 whatever-he-8002-Statistics.db
-rw-r--r--  1 cassandra dba  58K Mar  4 06:28 
whatever-he-8003-CompressionInfo.db
-rw-r--r--  1 cassandra dba  67M Mar  4 06:28 whatever-he-8003-Data.db
-rw-r--r--  1 cassandra dba 105K Mar  4 06:28 whatever-he-8003-Filter.db
-rw-r--r--  1 cassandra dba 2.5M Mar  4 06:28 whatever-he-8003-Index.db
-rw-r--r--  1 cassandra dba 4.3K Mar  4 06:28 whatever-he-8003-Statistics.db
-rw-r--r--  1 cassandra dba 230K Mar  4 06:30 
whatever-he-8004-CompressionInfo.db
-rw-r--r--  1 cassandra dba 261M Mar  4 06:30 whatever-he-8004-Data.db
-rw-r--r--  1 cassandra dba 480K Mar  4 06:30 whatever-he-8004-Filter.db
-rw-r--r--  1 cassandra dba  12M Mar  4 06:30 whatever-he-8004-Index.db
-rw-r--r--  1 cassandra dba 4.3K Mar  4 06:30 whatever-he-8004-Statistics.db
-rw-r--r--  1 cassandra dba  15K Mar  4 06:30 
whatever-he-8005-CompressionInfo.db
-rw-r--r--  1 cassandra dba  16M Mar  4 06:30 whatever-he-8005-Data.db
-rw-r--r--  1 cassandra dba  39K Mar  4 06:30 whatever-he-8005-Filter.db
-rw-r--r--  1 cassandra dba 944K Mar  4 06:30 whatever-he-8005-Index.db
-rw-r--r--  1 cassandra dba 4.3K Mar  4 06:30 whatever-he-8005-Statistics.db
-rw-r--r--  1 cassandra dba 5.0K Mar  4 06:35 
whatever-he-8006-CompressionInfo.db
-rw-r--r--  1 cassandra dba