Re: Running Cassandra as a Windows Service

2010-07-05 Thread Jonathan Ellis
If you'd like to get it included in the Cassandra tree, submitting it
to https://issues.apache.org/jira/browse/CASSANDRA-292 would be a good
start.

On Sun, Jul 4, 2010 at 2:54 PM, Kochheiser,Todd W - TOK-DITT-1
twkochhei...@bpa.gov wrote:
 Apache’s Procrun is a “real Windows service” runner similar to the one from Tanuki
 Software.  With regard to batch files, they are not used by Procrun in any
 way.  The batch file I created simply makes installing and removing the
 service super easy.  I’ve used the Tanuki Java Service Wrapper before and it
 also works very nicely.  However, since Cassandra is an Apache project, it
 seemed most appropriate to use a Windows service runner from
 Apache and to avoid any possible licensing or bundling issues, especially
 since I’d like to see it included as a contrib.  And Procrun has been used
 for years by various Apache projects, with Apache Tomcat probably being the
 most famous.  Regardless, the Tanuki runner works well.

 Todd

 PS: I believe the community version of the Java Service Wrapper from Tanuki
 is GPL v2, but I could be wrong.  Their licensing is a mix of

 From: Richard Grossman [mailto:richie...@gmail.com]
 Sent: Sunday, July 04, 2010 3:37 AM
 To: user@cassandra.apache.org
 Subject: Re: Running Cassandra as a Windows Service



 Hello

 Why not use the Java Service Wrapper?
 http://wrapper.tanukisoftware.org/doc/english/download.jsp
 You can configure any Java process as a real Windows service instead of using
 batch files.

 Richard

 On Thu, Jun 10, 2010 at 8:34 PM, Kochheiser,Todd W - TO-DITT1
 twkochhei...@bpa.gov wrote:

 For various reasons I am required to deploy systems on Windows.  As such, I
 went looking for information on running Cassandra as a Windows service.
 I’ve read some of the user threads on the subject, such as this one:



     http://www.mail-archive.com/user@cassandra.apache.org/msg01656.html



 I also found the following JIRA issue:



     https://issues.apache.org/jira/browse/CASSANDRA-292



 As it didn’t look like anyone had contributed a formal solution, and having
 some experience using Apache’s Procrun
 (http://commons.apache.org/daemon/procrun.html), I decided to go ahead and
 write a batch script and a simple “WindowsService” class to accomplish the
 task.  The WindowsService class only makes calls to public methods in
 CassandraDaemon and is fairly simple.  In combination with the batch script,
 it is very easy to install and remove the service.  At this point, I’ve
 installed Cassandra as a Windows service on XP (32-bit), Windows 7 (64-bit)
 and Windows Server 2008 R1/R2 (64-bit).  It should work fine on other
 versions of Windows (2K, 2K3).
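For anyone curious what such a wrapper looks like: here is a minimal sketch. The class and method names, and the stand-in Daemon, are assumptions for illustration, not the actual contributed code; Procrun's prunsrv.exe really does invoke static start/stop methods chosen via --StartClass/--StartMethod and --StopClass/--StopMethod.

```java
// Hypothetical sketch of a Procrun-friendly wrapper (names assumed).
// Install roughly like:
//   prunsrv.exe //IS//Cassandra --StartClass=WindowsService --StartMethod=start
//                               --StopClass=WindowsService  --StopMethod=stop
public class WindowsService {

    // Stand-in for the real CassandraDaemon, which lives in
    // org.apache.cassandra.service and exposes its own public lifecycle methods.
    static class Daemon {
        volatile boolean running;
        void setup()    { running = true;  } // bind ports, open commit log, ...
        void shutdown() { running = false; } // flush memtables, close sockets
    }

    private static final Daemon daemon = new Daemon();

    // Invoked by prunsrv when the Windows service starts.
    public static void start(String[] args) { daemon.setup(); }

    // Invoked by prunsrv when the service is stopped.
    public static void stop(String[] args) { daemon.shutdown(); }

    static boolean isRunning() { return daemon.running; }

    public static void main(String[] args) { start(args); }
}
```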



 Questions:



 1.   Has anyone else already done this work?

 2.   If not, I wouldn’t mind sharing the code/script or contributing it
 back to the project.  Is there any interest in this from the Cassandra dev
 team or the user community?



 Ideally the WindowsService could be included in the distributed
 source/binary distributions (perhaps in a contrib area), as well as the batch
 script and associated Procrun executables.  Or perhaps it could be posted
 to a Cassandra community site (is there one?).



 Todd

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Digg 4 Preview on TWiT

2010-07-05 Thread Eric Evans
On Sun, 2010-07-04 at 13:14 +0100, Bill de hÓra wrote:
 This person's understanding is that Facebook 'no longer contributes to
 nor uses Cassandra.':
 
 http://redmonk.com/sogrady/2010/05/17/beyond-cassandra/

Last I heard, Facebook was still using Cassandra for what they had
always used it for, Inbox Search. Last I heard, there were no plans in
place to change that.

 I assume it's accurate - policy reasons wouldn't interest me as much
 as technical ones. 

My understanding is that their new initiatives use (or will use) HBase.
I was never able to get anyone to go into detail on why.

-- 
Eric Evans
eev...@rackspace.com



Re: Need a little help with data model design

2010-07-05 Thread Jonathan Ellis
You don't want to have all the data from a single logger in a single
row b/c of the 2GB size limit.

If you have a small, static number of loggers you could create one CF
per logger and use timestamp as the row key.  Otherwise use a
composite key (logger+timestamp) as the key in a single CF.
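A sketch of the composite-key idea; the "logger:zero-padded-timestamp" encoding is an assumption, and any encoding that makes lexicographic order match time order works:

```java
// Build a row key of the form "<logger>:<zero-padded timestamp>" so that,
// under an order-preserving partitioner, keys for one logger sort
// chronologically and a timestamp range query is a contiguous key scan.
public class CompositeKey {
    static String key(String logger, long timestampSeconds) {
        // Fixed-width zero padding makes lexicographic order == numeric order.
        return String.format("%s:%010d", logger, timestampSeconds);
    }

    public static void main(String[] args) {
        System.out.println(key("logger2", 1278009131L)); // logger2:1278009131
        // A slice for logger2 between two timestamps becomes the key range
        // [key("logger2", start), key("logger2", end)].
    }
}
```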

2010/7/2 Bartosz Kołodziej bartosz.kolodz...@gmail.com:
 I'm new to cassandra, and I want to use it to store:
 loggers = { // (super)ColumnFamily ?
     logger1 : { // row inside super CF ?
         timestamp1 : {
             value : 10
         },
         timestamp2 : {
             value : 12
         }
         (many many many more)
     }
     logger2 : { // logger of a different type (in this example it logs 3 values
 instead of 1)
         timestamp1 : {
             v : 300,
             c : 123,
             s : 12.13
         },
         timestamp2 : {
             v : 300
             c : 123
             s : 12.13
         }
         (many many many more)
     }
     (many many many more)
 }
 the only way I will be accessing this data is:
 - example: fetch slice of data from logger2 ( start = 1278009131 (timestamp)
 , end = 1278109131 )
      expecting sorted array of data.
 - example: fetch slice of data from (logger2 and logger10 and logger20 and
 logger1234) ( start = 1278009131 (timestamp) , end = 1278109131 )
      expecting map of sorted arrays of data. [it is basically N queries of
 the first type]
 is this the right definition of the above: <ColumnFamily CompareWith="TimeUUIDType"
 ColumnType="Super" CompareSubcolumnsWith="BytesType" Name="loggers"/> ?
 what's the best way to model this data in cassandra (keeping in mind
 partitioning and other important stuff)?
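Independent of Cassandra's API, the access pattern described above is just a per-logger range scan over time-sorted entries. A plain-Java model of its shape, purely as an illustration:

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the access pattern: per-logger time-sorted
// entries queried by [start, end] timestamp range.  This models the shape
// of the data, not Cassandra's actual API.
public class SliceDemo {
    // logger -> (timestamp -> value)
    static final Map<String, TreeMap<Long, String>> data = new TreeMap<>();

    static void log(String logger, long ts, String value) {
        data.computeIfAbsent(logger, k -> new TreeMap<>()).put(ts, value);
    }

    // "fetch slice of data from loggerX (start, end)" -> sorted view
    static Map<Long, String> slice(String logger, long start, long end) {
        return data.getOrDefault(logger, new TreeMap<>())
                   .subMap(start, true, end, true);
    }

    public static void main(String[] args) {
        log("logger2", 1278009131L, "v=10");
        log("logger2", 1278050000L, "v=12");
        log("logger2", 1278200000L, "v=13");
        System.out.println(slice("logger2", 1278009131L, 1278109131L).size()); // 2
    }
}
```

Querying several loggers (the second example above) is then N independent slices, one per logger, which is why partitioning by logger distributes the work cleanly.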






-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Need a little help with data model design

2010-07-05 Thread Bartosz Kołodziej
I have a big and dynamic number of loggers.

According to https://issues.apache.org/jira/browse/CASSANDRA-16, the 2GB
size limit is no longer an issue in 0.7 (btw, mnesia has a similar issue ;-) ).
I think I can go with the svn release at the moment.

Solving this with a composite key (logger+timestamp) would require
OrderPreservingPartitioner to make range queries efficient, while in the first
approach I can go with RandomPartitioner (data would be partitioned by
logger - simple and effective).

Btw, which model provides faster queries?
(I only need to get a slice (timestamp1 to timestamp2) of data for logger X)

On Mon, Jul 5, 2010 at 6:23 PM, Jonathan Ellis jbel...@gmail.com wrote:

 You don't want to have all the data from a single logger in a single
 row b/c of the 2GB size limit.

 If you have a small, static number of loggers you could create one CF
 per logger and use timestamp as the row key.  Otherwise use a
 composite key (logger+timestamp) as the key in a single CF.

 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of Riptano, the source for professional Cassandra support
 http://riptano.com



Re: Storing application logs into Cassandra / design question

2010-07-05 Thread yaw
Perfectly right, Nick.

So I suppose that if I want to keep RandomPartitioner (I understand this is
the best for high-volume applications), I could design the database like this:

A CF with key = UUID will contain log message details => allows me to split
real data evenly between nodes

A CF with key = 'Date' and column names are UUIDs (UUID sorted) => allows me
to get the last X logs of the day, and data are approximately well distributed...
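A sketch of the keying for that second, date-indexed CF; the yyyyMMdd bucket format is an assumption, any per-day key works:

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// One row per UTC day: "last X logs of the day" is then a single column
// slice on that row, while the per-UUID detail CF stays randomly distributed.
public class LogKeys {
    static final DateTimeFormatter DAY =
        DateTimeFormatter.ofPattern("yyyyMMdd").withZone(ZoneOffset.UTC);

    // Row key for the date-indexed CF.
    static String rowKey(long epochMillis) {
        return DAY.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        System.out.println(rowKey(0L)); // 19700101
    }
}
```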

Many thanks,
yaw


2010/7/3 Микола Стрєбков n...@mykola.org

  On 02.07.10 16:10, yaw wrote:
  Hi all,
  I'd like to store logs of my application into cassandra.
 
  I need to query logs by date (last X logs) or user (give me the last X logs
  for user Y), and I want to dispatch data among several servers.
 
 
  I think the best design is the following:
 
  Each  log identifier is a time based UUID.
 
 
  A CF with key = UUID / *RandomPartitioner* will contain the log message
  => allows me to split real data evenly between nodes

  A CF with key = UUID and an *order-preserving partitioner* => allows me to
  get the last X logs

  A CF with key = userID and column names are UUIDs (UUID sorted) =>
  allows me to get the last X logs of user Y
 
  Am I right ?

 No: you can have only one partitioner per cluster. See:


 http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/

 http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model

 --
 Mykola Stryebkov
 Blog: http://mykola.org/blog/
 Public key: http://mykola.org/pubkey.txt
 fpr: 0226 54EE C1FF 8636 36EF 2AC9 BCE9 CFC7 9CF4 6747



Re: Need a little help with data model design

2010-07-05 Thread Jonathan Ellis
I would expect a row per log entry to be substantially faster to query.

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com