Re: cassandra gui
If you give Virgil a try, let me know how it goes. The REST layer is pretty solid, but the GUI is just a proof of concept that makes it easy to see what's in the column families during development/testing (it's only a couple hundred lines of ExtJS code built on the REST layer). We had plans to add CQL to the GUI for CRUD, but never got around to it.

-brian

On Fri, Mar 30, 2012 at 5:20 PM, Ben McCann b...@benmccann.com wrote:

If you want a REST interface and a GUI, then Virgil may be interesting. I just came across it and haven't tried it myself yet. http://brianoneill.blogspot.com/2011/10/virgil-gui-and-rest-layer-for-cassandra.html

On Fri, Mar 30, 2012 at 2:15 PM, John Liberty libjac...@gmail.com wrote:

I made some updates to a cassandra-gui project I found, which seemed to be stuck at version 0.7, and posted it to GitHub: https://github.com/libjack/cassandra-gui

Besides updating it to work with version 1.0+, the main improvements I added were to obey validation types, including column metadata, when displaying or accepting data. This includes support for Composite types, for both keys and columns. I often create column families with non-string keys, columns, and values, and especially Composite types, and I need a tool to browse/verify and then add/edit test data; this one works quite well for me.

--
John Liberty
libjac...@gmail.com
(585) 466-4249

--
Brian O'Neill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile: 215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/
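John's point about Composite keys and columns is exactly where a type-aware browser earns its keep. As a rough illustration of the data such a tool has to render, here is a minimal pycassa sketch, assuming a local Cassandra 1.0 node with Thrift on port 9160; the keyspace and column family names ('Demo', 'Events') are hypothetical:

# Minimal sketch: a column family whose column names are (long, utf8)
# composites. 'Demo' and 'Events' are hypothetical names.
from pycassa.system_manager import SystemManager, SIMPLE_STRATEGY
from pycassa.types import CompositeType, LongType, UTF8Type
from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily

sys_mgr = SystemManager('localhost:9160')
if 'Demo' not in sys_mgr.list_keyspaces():
    sys_mgr.create_keyspace('Demo', SIMPLE_STRATEGY, {'replication_factor': '1'})
sys_mgr.create_column_family(
    'Demo', 'Events',
    comparator_type=CompositeType(LongType(), UTF8Type()),
    default_validation_class=UTF8Type())
sys_mgr.close()

pool = ConnectionPool('Demo', ['localhost:9160'])
events = ColumnFamily(pool, 'Events')
# Composite column names are passed as tuples of their components.
events.insert('row1', {(1333065600, 'status'): 'ok',
                       (1333065600, 'detail'): 'first event'})
print(events.get('row1'))
pool.dispose()

A generic client that treats everything as strings would show those composite names as opaque bytes; obeying the comparator and validation metadata, as John describes, is what makes the values legible.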
Is the wiki outdated regarding Hive support?
The wiki (http://wiki.apache.org/cassandra/HadoopSupport) says Hive support "is currently a standalone project but will become part of the main Cassandra source tree in the future. See https://github.com/riptano/hive for details." This seems outdated to me, since DataStax isn't planning any future updates to Brisk. The closest thing I've seen for Hive support is this Hive bug: https://issues.apache.org/jira/browse/HIVE-1434. Should I update the wiki to delete this statement, or is it still accurate?

Thanks,
Ben
Re: Is the wiki outdated regarding Hive support?
Hi Ben,

That is still the repo. The code that ships with the latest DSE is the hive-0.8.1-merge branch. We will try to get this into the Cassandra trunk ASAP.

Jake

On Apr 1, 2012, at 6:39 PM, Ben McCann b...@benmccann.com wrote:

The wiki says Hive support "is currently a standalone project but will become part of the main Cassandra source tree in the future. See https://github.com/riptano/hive for details." This seems outdated to me, since DataStax isn't planning any future updates to Brisk. The closest thing I've seen for Hive support is this Hive bug. Should I update the wiki to delete this statement, or is it still accurate?

Thanks,
Ben
Re: Is the wiki outdated regarding Hive support?
Oh, that's fantastic! Thanks so much for the quick response!

On Sun, Apr 1, 2012 at 4:21 PM, Jake Luciani jak...@gmail.com wrote:

Hi Ben,

That is still the repo. The code that ships with the latest DSE is the hive-0.8.1-merge branch. We will try to get this into the Cassandra trunk ASAP.

Jake
Re: import
Since Python has a native csv module, it's trivial to achieve. I load lots of CSV data into my database daily.

Maxim

On 3/27/2012 11:44 AM, R. Verlangen wrote:

You can write your own script to parse the Excel file (export it as CSV) and import it with batch inserts. Should be pretty easy if you have experience with those techniques.

2012/3/27 puneet loya puneetl...@gmail.com:

I want to import files from Excel to Cassandra. Is it possible? Is there any tool that can help? What's the best way? Please reply :)

--
With kind regards,
Robin Verlangen
www.robinverlangen.nl
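To make the suggestion concrete, here is a minimal sketch of that approach using the csv module with pycassa batch inserts. The keyspace, column family, and file names are hypothetical, and it assumes the first CSV column holds the row key while the header row supplies column names:

# A minimal sketch, not a hardened importer. Assumes Thrift on
# localhost:9160 and a 'Demo' keyspace with a 'People' column family
# using UTF8 comparator/validators (all names hypothetical).
import csv
import pycassa

pool = pycassa.ConnectionPool('Demo', ['localhost:9160'])
people = pycassa.ColumnFamily(pool, 'People')

with open('people.csv', 'rb') as f:    # export the Excel sheet as CSV first
    reader = csv.reader(f)
    header = next(reader)              # first row: column names
    # batch() queues mutations and sends them in groups of queue_size
    with people.batch(queue_size=100) as b:
        for row in reader:
            key, values = row[0], dict(zip(header[1:], row[1:]))
            b.insert(key, values)

pool.dispose()

Batching matters mainly for throughput; each insert is still an ordinary timestamped write, so re-running the import simply overwrites the same rows.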
Re: data size difference between supercolumn and regular column
Thanks Aaron. Well, I guess it is possible the data files from supercolumns could have been reduced in size after compaction.

This brings up yet another question. Say I am on a shoestring budget and can only put together a cluster with very limited storage space. The first iteration of pushing data into Cassandra would drive the disk usage up into the 80% range. As time goes by, there will be updates to the data, and many columns will be overwritten. If I just push the updates in, the disks will run out of space on all of the cluster nodes. What would be the best way to handle such a situation if I cannot buy larger disks? Do I need to delete the rows/columns that are going to be updated, do a compaction, and then insert the updates? Or is there a better way?

Thanks,
-- Y.

On Sat, Mar 31, 2012 at 3:28 AM, aaron morton aa...@thelastpickle.com wrote:

> does cassandra 1.0 perform some default compression?

No. The on-disk size depends to some degree on the workload. If there are a lot of overwrites or deletes, you may have rows/columns that need to be compacted. You may have some big old SSTables that have not been compacted for a while. There is also some overhead involved in the super columns: the super column name, the length of the name, and the number of columns.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 29/03/2012, at 9:47 AM, Yiming Sun wrote:

Actually, after I read an article on Cassandra 1.0 compression just now (http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression), I am more puzzled. In our schema, we didn't specify any compression options -- does Cassandra 1.0 perform some default compression, or is the data reduction purely because of the schema change? Thanks.

-- Y.

On Wed, Mar 28, 2012 at 4:40 PM, Yiming Sun yiming@gmail.com wrote:

Hi, we are trying to estimate the amount of storage we need for a production Cassandra cluster. While doing the calculation, I noticed a very dramatic difference in the storage space used by Cassandra data files.

Our previous setup consisted of a single-node Cassandra 0.8.x with no replication; the data was stored using supercolumns, and the data files totaled about 534GB on disk. A few weeks ago, I put together a cluster of 3 nodes running Cassandra 1.0 with a replication factor of 2, with the data flattened out and stored using regular columns. The aggregate data file size is only 488GB (it would be 244GB with no replication). This is a very dramatic reduction in storage needs, and certainly good news for how much storage we need to provision. However, because of the size of the reduction, I would like to make sure it is absolutely correct before submitting it, and to get a sense of why there was such a difference. I know Cassandra 1.0 does data compression, but does the schema change from supercolumns to regular columns also help reduce storage usage?

Thanks.

-- Y.
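On the delete-first question: a Cassandra overwrite is just a new insert with a newer timestamp, so no explicit delete is needed; deleting first would actually add tombstones and consume more space until compaction. A small pycassa sketch of the semantics (keyspace and column family names are hypothetical):

# Overwrite semantics in miniature. 'Demo' and 'Docs' are hypothetical
# names; assumes a local node with Thrift on 9160.
import pycassa

pool = pycassa.ConnectionPool('Demo', ['localhost:9160'])
docs = pycassa.ColumnFamily(pool, 'Docs')

docs.insert('doc1', {'body': 'version 1'})  # initial write
docs.insert('doc1', {'body': 'version 2'})  # overwrite: same key and column, newer timestamp

# Reads see only the newest value; the shadowed one occupies disk until
# compaction merges the SSTables containing both versions.
print(docs.get('doc1'))                     # {'body': 'version 2'}
pool.dispose()

The catch on a nearly-full disk is that compaction itself needs temporary headroom to write the merged SSTable before removing the old ones, which is one reason to keep utilization well below 80%.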
Re: data size difference between supercolumn and regular column
Is that 80% with compression? If not, the first thing to do is turn on compression. Cassandra doesn't behave well when it runs out of disk space. You really want to try to stay around 50%; 60-70% works, but only if it is spread across multiple column families, and even then you can run into issues when doing repairs.

-Jeremiah

On Apr 1, 2012, at 9:44 PM, Yiming Sun wrote:

Thanks Aaron. Well, I guess it is possible the data files from supercolumns could have been reduced in size after compaction.

This brings up yet another question. Say I am on a shoestring budget and can only put together a cluster with very limited storage space. The first iteration of pushing data into Cassandra would drive the disk usage up into the 80% range. As time goes by, there will be updates to the data, and many columns will be overwritten. If I just push the updates in, the disks will run out of space on all of the cluster nodes. What would be the best way to handle such a situation if I cannot buy larger disks? Do I need to delete the rows/columns that are going to be updated, do a compaction, and then insert the updates? Or is there a better way?

Thanks,
-- Y.
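Enabling compression on an existing column family is a live schema change: new SSTables are written compressed, and old ones pick it up as compaction (or nodetool scrub) rewrites them. A hedged sketch using pycassa's SystemManager; the keyspace and column family names are hypothetical, and it assumes pycassa 1.3+ against Cassandra 1.0+:

# Hedged sketch: turn on Snappy compression for an existing column family.
# 'Demo' and 'Docs' are hypothetical names.
from pycassa.system_manager import SystemManager

sys_mgr = SystemManager('localhost:9160')
sys_mgr.alter_column_family(
    'Demo', 'Docs',
    compression_options={'sstable_compression': 'SnappyCompressor',
                         'chunk_length_kb': '64'})
sys_mgr.close()

The 64KB chunk length is the usual default; smaller chunks trade a little compression ratio for less decompression work on small reads.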