Re: cassandra gui

2012-04-01 Thread Brian O'Neill
If you give Virgil a try, let me know how it goes.
The REST layer is pretty solid, but the gui is just a PoC which makes it
easy to see what's in the CFs during development/testing.
(It's only a couple hundred lines of ExtJS code built on the REST layer)

We had plans to add CQL to the gui for CRUD, but never got around to it.

-brian

On Fri, Mar 30, 2012 at 5:20 PM, Ben McCann b...@benmccann.com wrote:

 If you want a REST interface and a GUI then Virgil may be interesting.  I
 just came across it and haven't tried it myself yet.

 http://brianoneill.blogspot.com/2011/10/virgil-gui-and-rest-layer-for-cassandra.html




 On Fri, Mar 30, 2012 at 2:15 PM, John Liberty libjac...@gmail.com wrote:

 I made some updates to a cassandra-gui project I found, which seemed to
 be stuck at version 0.7, and posted it to GitHub:
 https://github.com/libjack/cassandra-gui

 Besides updating it to work with version 1.0+, the main improvements I added
 were to obey validation types, including column metadata, when displaying
 or accepting data. This includes support for Composite types, for both keys
 and columns.

 I often create CFs with non-string keys, columns, and values, and especially
 Composite types... and I need a tool to browse/verify and then add/edit
 test data; this works quite well for me.

 --
 John Liberty
 libjac...@gmail.com
 (585) 466-4249





-- 
Brian O'Neill
Lead Architect, Health Market Science (http://healthmarketscience.com)
mobile:215.588.6024
blog: http://weblogs.java.net/blog/boneill42/
blog: http://brianoneill.blogspot.com/


Is the wiki outdated regarding Hive support?

2012-04-01 Thread Ben McCann
The wiki http://wiki.apache.org/cassandra/HadoopSupport says "Hive support is
currently a standalone project but will become part of the main Cassandra
source tree in the future. See https://github.com/riptano/hive for details."
This seems outdated to me since DataStax isn't planning any future updates to
Brisk.  The closest thing I've seen for Hive support is this Hive bug:
https://issues.apache.org/jira/browse/HIVE-1434.  Should I update the wiki to
delete this statement, or is it still accurate?

Thanks,
Ben


Re: Is the wiki outdated regarding Hive support?

2012-04-01 Thread Jake Luciani
Hi Ben. That is still the repo. The code that ships with latest DSE is the 
hive-0.8.1-merge branch. 

We will try to get this into the Cassandra trunk asap. 

Jake

 

On Apr 1, 2012, at 6:39 PM, Ben McCann b...@benmccann.com wrote:

 The wiki says Hive support is currently a standalone project but will become 
 part of the main Cassandra source tree in the future. See 
 https://github.com/riptano/hive for details.  This seems outdated to me 
 since Datastax isn't planning any future updates to Brisk.  The closest thing 
 I've seen for Hive support is this Hive bug.  Should I update the wiki to 
 delete this statement or is it still accurate?
 
 Thanks,
 Ben
 
 


Re: Is the wiki outdated regarding Hive support?

2012-04-01 Thread Ben McCann
Oh, that's fantastic!  Thanks so much for the quick response!


On Sun, Apr 1, 2012 at 4:21 PM, Jake Luciani jak...@gmail.com wrote:

 Hi Ben. That is still the repo. The code that ships with latest DSE is the
 hive-0.8.1-merge branch.

 We will try to get this into the Cassandra trunk asap.

 Jake



 On Apr 1, 2012, at 6:39 PM, Ben McCann b...@benmccann.com wrote:

 The wiki http://wiki.apache.org/cassandra/HadoopSupport says Hive
 support is currently a standalone project but will become part of the main
 Cassandra source tree in the future. See https://github.com/riptano/hive for
 details.  This seems outdated to me since Datastax isn't planning any
 future updates to Brisk.  The closest thing I've seen for Hive support is this
 Hive bug https://issues.apache.org/jira/browse/HIVE-1434.  Should I
 update the wiki to delete this statement or is it still accurate?

 Thanks,
 Ben





Re: import

2012-04-01 Thread Maxim Potekhin

Since Python has a csv module in the standard library, this is trivial to do.
I load lots of csv data into my database daily.
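
A minimal sketch of that kind of loader, assuming the pycassa client and a
column family with plain string keys, column names, and values (the file name,
keyspace, and column family below are only placeholders):

    import csv
    from pycassa.pool import ConnectionPool
    from pycassa.columnfamily import ColumnFamily

    pool = ConnectionPool('MyKeyspace', ['localhost:9160'])
    cf = ColumnFamily(pool, 'MyColumnFamily')

    # first CSV column is the row key; remaining columns become name/value pairs
    with open('data.csv', 'rb') as f:
        reader = csv.reader(f)
        header = next(reader)
        with cf.batch(queue_size=100) as batch:  # sends a mutation every 100 rows
            for row in reader:
                batch.insert(row[0], dict(zip(header[1:], row[1:])))

Exporting the spreadsheet to CSV first (as Robin suggests below) keeps the
Python side simple.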

Maxim

On 3/27/2012 11:44 AM, R. Verlangen wrote:
You can write your own script to parse the Excel file (export it as CSV) 
and import it with batch inserts.


Should be pretty easy if you have experience with those techniques.

2012/3/27 puneet loya puneetl...@gmail.com

I want to import files from excel to cassandra? Is it possible??

Any tool that can help??

 What's the best way??

Plz reply :)




--
With kind regards,

Robin Verlangen
http://www.robinverlangen.nl





Re: data size difference between supercolumn and regular column

2012-04-01 Thread Yiming Sun
Thanks Aaron.  Well I guess it is possible the data files from supercolumns
could've been reduced in size after compaction.

This brings up yet another question.  Say I am on a shoestring budget and can
only put together a cluster with very limited storage space.  The first
iteration of pushing data into cassandra would drive the disk usage up into
the 80% range.  As time goes by, there will be updates to the data, and
many columns will be overwritten.  If I just push the updates in, the disks
will run out of space on all of the cluster nodes.  What would be the best
way to handle such a situation if I cannot buy larger disks? Do I need
to delete the rows/columns that are going to be updated, do a compaction,
and then insert the updates?  Or is there a better way?  Thanks

-- Y.

On Sat, Mar 31, 2012 at 3:28 AM, aaron morton aa...@thelastpickle.com wrote:

 does cassandra 1.0 perform some default compression?

 No.

 The on disk size depends to some degree on the work load.

 If there are a lot of overwrites or deletes you may have rows/columns that
 need to be compacted. You may have some big old SSTables that have not been
 compacted for a while.
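
 A quick way to look at this from the command line is nodetool; a minimal
 sketch, with the host, keyspace, and column family names as placeholders:

     # show compactions that are currently running or pending
     nodetool -h localhost compactionstats
     # force a major compaction of a single column family
     nodetool -h localhost compact MyKeyspace MyCF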

 There is some overhead involved in the super columns: the super col name,
 length of the name and the number of columns.

 Cheers

   -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 29/03/2012, at 9:47 AM, Yiming Sun wrote:

 Actually, after I read an article on cassandra 1.0 compression just now (
 http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression),
 I am more puzzled.  In our schema, we didn't specify any compression
 options -- does cassandra 1.0 perform some default compression? or is the
 data reduction purely because of the schema change?  Thanks.

 -- Y.

 On Wed, Mar 28, 2012 at 4:40 PM, Yiming Sun yiming@gmail.com wrote:

 Hi,

 We are trying to estimate the amount of storage we need for a production
 cassandra cluster.  While I was doing the calculation, I noticed a very
 dramatic difference in terms of storage space used by cassandra data files.

 Our previous setup consists of a single-node cassandra 0.8.x with no
 replication, and the data is stored using supercolumns, and the data files
 total about 534GB on disk.

 A few weeks ago, I put together a cluster consisting of 3 nodes running
 cassandra 1.0 with replication factor of 2, and the data is flattened out
 and stored using regular columns.  And the aggregated data file size is
 only 488GB (would be 244GB if no replication).

 This is a very dramatic reduction in terms of storage needs, and is
 certainly good news in terms of how much storage we need to provision.
  However, because of the dramatic reduction, I also would like to make sure
 it is absolutely correct before submitting it - and also get a sense of why
 there was such a difference. -- I know cassandra 1.0 does data compression,
 but does the schema change from supercolumn to regular column also help
 reduce storage usage?  Thanks.

 -- Y.






Re: data size difference between supercolumn and regular column

2012-04-01 Thread Jeremiah Jordan
Is that 80% with compression?  If not, the first thing to do is turn on 
compression.  Cassandra doesn't behave well when it runs out of disk space.  
You really want to try to stay around 50%; 60-70% works, but only if it is 
spread across multiple column families, and even then you can run into issues 
when doing repairs.
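
For reference, turning compression on for an existing column family is a
one-liner in cassandra-cli; a minimal sketch using 1.0-era syntax, with
MyKeyspace/MyCF as placeholder names and Snappy/64KB chosen arbitrarily:

    use MyKeyspace;
    -- enable SSTable compression on an existing column family
    update column family MyCF
      with compression_options = {sstable_compression: SnappyCompressor, chunk_length_kb: 64};

Existing SSTables are only compressed as they get rewritten (by normal
compaction or, for example, nodetool scrub), so the disk usage drops gradually
rather than immediately.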

-Jeremiah


On Apr 1, 2012, at 9:44 PM, Yiming Sun wrote:

Thanks Aaron.  Well I guess it is possible the data files from supercolumns 
could've been reduced in size after compaction.

This brings up yet another question.  Say I am on a shoestring budget and can only 
put together a cluster with very limited storage space.  The first iteration of 
pushing data into cassandra would drive the disk usage up into the 80% range.  
As time goes by, there will be updates to the data, and many columns will be 
overwritten.  If I just push the updates in, the disks will run out of space on 
all of the cluster nodes.  What would be the best way to handle such a 
situation if I cannot buy larger disks? Do I need to delete the rows/columns 
that are going to be updated, do a compaction, and then insert the updates?  Or 
is there a better way?  Thanks

-- Y.

On Sat, Mar 31, 2012 at 3:28 AM, aaron morton 
aa...@thelastpickle.com wrote:
does cassandra 1.0 perform some default compression?
No.

The on disk size depends to some degree on the work load.

If there are a lot of overwrites or deletes you may have rows/columns that need 
to be compacted. You may have some big old SSTables that have not been 
compacted for a while.

There is some overhead involved in the super columns: the super col name, 
length of the name and the number of columns.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 29/03/2012, at 9:47 AM, Yiming Sun wrote:

Actually, after I read an article on cassandra 1.0 compression just now ( 
http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-compression), I am 
more puzzled.  In our schema, we didn't specify any compression options -- does 
cassandra 1.0 perform some default compression? or is the data reduction purely 
because of the schema change?  Thanks.

-- Y.

On Wed, Mar 28, 2012 at 4:40 PM, Yiming Sun 
yiming@gmail.com wrote:
Hi,

We are trying to estimate the amount of storage we need for a production 
cassandra cluster.  While I was doing the calculation, I noticed a very 
dramatic difference in terms of storage space used by cassandra data files.

Our previous setup consists of a single-node cassandra 0.8.x with no 
replication, and the data is stored using supercolumns, and the data files 
total about 534GB on disk.

A few weeks ago, I put together a cluster consisting of 3 nodes running 
cassandra 1.0 with replication factor of 2, and the data is flattened out and 
stored using regular columns.  And the aggregated data file size is only 488GB 
(would be 244GB if no replication).

This is a very dramatic reduction in terms of storage needs, and is certainly 
good news in terms of how much storage we need to provision.  However, because 
of the dramatic reduction, I also would like to make sure it is absolutely 
correct before submitting it - and also get a sense of why there was such a 
difference. -- I know cassandra 1.0 does data compression, but does the schema 
change from supercolumn to regular column also help reduce storage usage?  
Thanks.

-- Y.