Re: Running Cassandra as a Windows Service
If you'd like to get it included in the Cassandra tree, submitting it to https://issues.apache.org/jira/browse/CASSANDRA-292 would be a good start.

On Sun, Jul 4, 2010 at 2:54 PM, Kochheiser,Todd W - TOK-DITT-1 twkochhei...@bpa.gov wrote:

Apache's Procrun is a "real Windows service" similar to the one from Tanuki Software. As for batch files, they are not used by Procrun in any way; the batch file I created simply makes installing and removing the service very easy. I've used the Tanuki Java Service Wrapper before and it also works very nicely. However, since Cassandra is an Apache project, it seemed most appropriate to use a Windows service runner from Apache and to avoid any possible licensing or bundling issues, especially since I'd like to see it included as a contrib. Procrun has also been used for years by various Apache projects, with Apache Tomcat probably being the most famous. Regardless, the Tanuki runner works well.

Todd

PS: I believe the community version of the Java Service Wrapper from Tanuki is GPL v2, but I could be wrong. Their licensing is a mix of

From: Richard Grossman [mailto:richie...@gmail.com]
Sent: Sunday, July 04, 2010 3:37 AM
To: user@cassandra.apache.org
Subject: Re: Running Cassandra as a Windows Service

Hello,

Why not use the Java Service Wrapper? http://wrapper.tanukisoftware.org/doc/english/download.jsp You can configure any Java process as a real Windows service instead of using batch files.

Richard

On Thu, Jun 10, 2010 at 8:34 PM, Kochheiser,Todd W - TO-DITT1 twkochhei...@bpa.gov wrote:

For various reasons I am required to deploy systems on Windows. As such, I went looking for information on running Cassandra as a Windows service. I've read some of the user threads on the subject, such as this one: http://www.mail-archive.com/user@cassandra.apache.org/msg01656.html I also found the following JIRA issue: https://issues.apache.org/jira/browse/CASSANDRA-292

As it didn't look like anyone had contributed a formal solution, and having some experience with Apache's Procrun (http://commons.apache.org/daemon/procrun.html), I decided to go ahead and write a batch script and a simple "WindowsService" class to accomplish the task. The WindowsService class only makes calls to public methods in CassandraDaemon and is fairly simple. In combination with the batch script, it is very easy to install and remove the service. At this point, I've installed Cassandra as a Windows service on XP (32-bit), Windows 7 (64-bit), and Windows Server 2008 R1/R2 (64-bit). It should work fine on other versions of Windows (2K, 2K3).

Questions:

1. Has anyone else already done this work?
2. If not, I wouldn't mind sharing the code/script or contributing it back to the project. Is there any interest in this from the Cassandra dev team or the user community? Ideally the WindowsService class could be included in the distributed source/binary distributions (perhaps in a contrib area), along with the batch script and associated Procrun executables. Or perhaps it could be posted to a Cassandra community site (is there one?).

Todd

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
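[Editor's note] For readers unfamiliar with Procrun's "jvm" mode, the sketch below shows the shape of the start/stop contract such a WindowsService class has to satisfy. It is only an illustration, not the contributed code from CASSANDRA-292: the real class delegates to CassandraDaemon's public methods, and here a CountDownLatch stands in for the daemon.

import java.util.concurrent.CountDownLatch;

public class WindowsService {
    private static final CountDownLatch shutdownLatch = new CountDownLatch(1);

    // Procrun's --StartMethod (with --StartMode jvm) calls a static method taking
    // String[]; it is expected to block for the lifetime of the service.
    // In the real class, this is where CassandraDaemon would be started.
    public static void start(String[] args) throws InterruptedException {
        shutdownLatch.await();
    }

    // Procrun's --StopMethod is called to request shutdown.
    // In the real class, this is where CassandraDaemon would be stopped.
    public static void stop(String[] args) {
        shutdownLatch.countDown();
    }
}

A batch script like the one described would then wrap an install command along the lines of "prunsrv.exe //IS//Cassandra --StartMode jvm --StopMode jvm --StartClass WindowsService --StartMethod start --StopClass WindowsService --StopMethod stop" plus the classpath and JVM options (the service name and class name here are illustrative).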
Re: Digg 4 Preview on TWiT
On Sun, 2010-07-04 at 13:14 +0100, Bill de hÓra wrote:

This person's understanding is that Facebook 'no longer contributes to nor uses Cassandra.': http://redmonk.com/sogrady/2010/05/17/beyond-cassandra/

Last I heard, Facebook was still using Cassandra for what they had always used it for, Inbox Search. Last I heard, there were no plans in place to change that.

I assume it's accurate - policy reasons wouldn't interest me as much as technical ones.

My understanding is that their new initiatives use (or will use) HBase. I was never able to get anyone to go into detail on why.

--
Eric Evans
eev...@rackspace.com
Re: Need a little help with data model design
You don't want to have all the data from a single logger in a single row because of the 2GB row size limit. If you have a small, static number of loggers you could create one CF per logger and use the timestamp as the row key. Otherwise use a composite key (logger+timestamp) as the key in a single CF.

2010/7/2 Bartosz Kołodziej bartosz.kolodz...@gmail.com:

I'm new to Cassandra, and I want to use it to store:

loggers = {                // (super)ColumnFamily ?
  logger1 : {              // row inside super CF ?
    timestamp1 : { value : 10 },
    timestamp2 : { value : 12 }
    (many many many more)
  }
  logger2 : {              // logger of a different type (in this example it logs 3 values instead of 1)
    timestamp1 : { v : 300, c : 123, s : 12.13 },
    timestamp2 : { v : 300, c : 123, s : 12.13 }
    (many many many more)
  }
  (many many many more)
}

The only way I will be accessing this data is:

- example: fetch a slice of data from logger2 (start = 1278009131 (timestamp), end = 1278109131), expecting a sorted array of data.
- example: fetch a slice of data from (logger2 and logger10 and logger20 and logger1234) (start = 1278009131 (timestamp), end = 1278109131), expecting a map of sorted arrays of data. [It is basically N queries of the first type.]

Is this the right definition of the above:

<ColumnFamily CompareWith="TimeUUIDType" ColumnType="Super" CompareSubcolumnsWith="BytesType" Name="loggers"/>

What's the best way to model this data in Cassandra (keeping in mind partitioning and other important stuff)?

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
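[Editor's note] To make the composite-key option concrete, here is a tiny sketch of how such a row key could be built so that, under OrderPreservingPartitioner, a logger's rows stay contiguous and time-ordered. The separator and zero-padding width are assumptions for illustration.

// Zero-padding the timestamp makes lexicographic key order match numeric
// time order, which is what OrderPreservingPartitioner range scans rely on.
public final class CompositeKeys {
    public static String rowKey(String loggerId, long epochSeconds) {
        return String.format("%s:%010d", loggerId, epochSeconds);
    }

    public static void main(String[] args) {
        // prints "logger2:1278009131"
        System.out.println(rowKey("logger2", 1278009131L));
    }
}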
Re: Need a little help with data model design
I have a big and dynamic number of loggers. According to https://issues.apache.org/jira/browse/CASSANDRA-16, the 2GB size limit is no longer an issue in 0.7 (btw, mnesia has a similar issue ;-) ), and I think I can go with an svn build at the moment. Solving this with a composite key (logger+timestamp) would require OrderPreservingPartitioner to make range queries efficient, while with the first approach I can go with RandomPartitioner (data would be partitioned by logger - simple and effective). Btw, which model provides faster queries? (I only need to get a slice (timestamp1 to timestamp2) of data for logger X.)

On Mon, Jul 5, 2010 at 6:23 PM, Jonathan Ellis jbel...@gmail.com wrote:

You don't want to have all the data from a single logger in a single row because of the 2GB row size limit. If you have a small, static number of loggers you could create one CF per logger and use the timestamp as the row key. Otherwise use a composite key (logger+timestamp) as the key in a single CF.

2010/7/2 Bartosz Kołodziej bartosz.kolodz...@gmail.com:

I'm new to Cassandra, and I want to use it to store:

loggers = {                // (super)ColumnFamily ?
  logger1 : {              // row inside super CF ?
    timestamp1 : { value : 10 },
    timestamp2 : { value : 12 }
    (many many many more)
  }
  logger2 : {              // logger of a different type (in this example it logs 3 values instead of 1)
    timestamp1 : { v : 300, c : 123, s : 12.13 },
    timestamp2 : { v : 300, c : 123, s : 12.13 }
    (many many many more)
  }
  (many many many more)
}

The only way I will be accessing this data is:

- example: fetch a slice of data from logger2 (start = 1278009131 (timestamp), end = 1278109131), expecting a sorted array of data.
- example: fetch a slice of data from (logger2 and logger10 and logger20 and logger1234) (start = 1278009131 (timestamp), end = 1278109131), expecting a map of sorted arrays of data. [It is basically N queries of the first type.]

Is this the right definition of the above:

<ColumnFamily CompareWith="TimeUUIDType" ColumnType="Super" CompareSubcolumnsWith="BytesType" Name="loggers"/>

What's the best way to model this data in Cassandra (keeping in mind partitioning and other important stuff)?

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
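[Editor's note] With the row-per-logger layout, the timestamp-bounded read described above is a single column slice. The sketch below uses the 0.6-era raw Thrift API; the keyspace and column family names, the port, and the assumption that column names are plain 8-byte big-endian timestamps (rather than TimeUUIDs) are all illustrative, and the client API changes somewhat in 0.7.

import java.nio.ByteBuffer;
import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class LoggerSliceExample {
    // Encode a timestamp as an 8-byte big-endian column name (assumed scheme).
    static byte[] ts(long epochSeconds) {
        return ByteBuffer.allocate(8).putLong(epochSeconds).array();
    }

    public static void main(String[] args) throws Exception {
        TTransport transport = new TSocket("localhost", 9160);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();

        SliceRange range = new SliceRange();
        range.setStart(ts(1278009131L));   // start of the time window
        range.setFinish(ts(1278109131L));  // end of the time window
        range.setReversed(false);
        range.setCount(10000);             // page size; very large windows need paging

        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(range);

        // 0.6-style call: keyspace, row key (the logger id), CF, predicate, consistency.
        List<ColumnOrSuperColumn> slice = client.get_slice(
                "Keyspace1", "logger2", new ColumnParent("loggers"),
                predicate, ConsistencyLevel.ONE);

        System.out.println("columns returned: " + slice.size());
        transport.close();
    }
}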
Re: Storing application logs into Cassandra / design question
Perfectly right Nick. So I suppose that if I want to keep RandomPartitioner (I understand this is the best choice for high-volume applications), I could design the database like this:

- A CF with key = UUID will contain the log message details => allows me to split the real data evenly between nodes.
- A CF with key = 'Date' whose column names are UUIDs (UUID-sorted) => allows me to get the last X logs of the day, and the data is approximately well distributed...

Many thanks,
yaw

2010/7/3 Микола Стрєбков n...@mykola.org

On 02.07.10 16:10, yaw wrote:

Hi all, I'd like to store the logs of my application in Cassandra. I need to query logs by date (last X logs) or by user (give me the last X logs for user Y), and I want to spread the data among several servers. I think the best design is the following. Each log identifier is a time-based UUID.

- A CF with key = UUID / Random Partitioner will contain the log message => allows me to split the real data evenly between nodes.
- A CF with key = UUID and an order-preserving partitioner => allows me to get the last X logs.
- A CF with key = userID whose column names are UUIDs (UUID-sorted) => allows me to get the last X logs of user Y.

Am I right?

No: you can have only one partitioner per cluster. See:
http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
http://arin.me/blog/wtf-is-a-supercolumn-cassandra-data-model

--
Mykola Stryebkov
Blog: http://mykola.org/blog/
Public key: http://mykola.org/pubkey.txt
fpr: 0226 54EE C1FF 8636 36EF 2AC9 BCE9 CFC7 9CF4 6747
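[Editor's note] Under the second layout (a row per day, columns named by time-based UUIDs, assuming a TimeUUIDType comparator), "the last X logs of the day" becomes a single reversed column slice on the day's row. A rough sketch against the 0.6-era Thrift API; the keyspace, column family, row-key format, and port are illustrative assumptions.

import java.util.List;

import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.ColumnOrSuperColumn;
import org.apache.cassandra.thrift.ColumnParent;
import org.apache.cassandra.thrift.ConsistencyLevel;
import org.apache.cassandra.thrift.SlicePredicate;
import org.apache.cassandra.thrift.SliceRange;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class LastLogsOfDay {
    public static void main(String[] args) throws Exception {
        TTransport transport = new TSocket("localhost", 9160);
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        transport.open();

        // Empty start/finish = the whole row; reversed = newest TimeUUID column first.
        SliceRange lastN = new SliceRange();
        lastN.setStart(new byte[0]);
        lastN.setFinish(new byte[0]);
        lastN.setReversed(true);
        lastN.setCount(100);               // "last X logs", here X = 100

        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(lastN);

        // Row key is the day bucket; keyspace/CF/key names below are assumptions.
        List<ColumnOrSuperColumn> latest = client.get_slice(
                "Logs", "2010-07-04", new ColumnParent("LogsByDay"),
                predicate, ConsistencyLevel.ONE);

        System.out.println("latest entries: " + latest.size());
        transport.close();
    }
}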
Re: Need a little help with data model design
I would expect a row per log entry to be substantially faster to query.

2010/7/5 Bartosz Kołodziej bartosz.kolodz...@gmail.com:

I have a big and dynamic number of loggers. According to https://issues.apache.org/jira/browse/CASSANDRA-16, the 2GB size limit is no longer an issue in 0.7 (btw, mnesia has a similar issue ;-) ), and I think I can go with an svn build at the moment. Solving this with a composite key (logger+timestamp) would require OrderPreservingPartitioner to make range queries efficient, while with the first approach I can go with RandomPartitioner (data would be partitioned by logger - simple and effective). Btw, which model provides faster queries? (I only need to get a slice (timestamp1 to timestamp2) of data for logger X.)

On Mon, Jul 5, 2010 at 6:23 PM, Jonathan Ellis jbel...@gmail.com wrote:

You don't want to have all the data from a single logger in a single row because of the 2GB row size limit. If you have a small, static number of loggers you could create one CF per logger and use the timestamp as the row key. Otherwise use a composite key (logger+timestamp) as the key in a single CF.

2010/7/2 Bartosz Kołodziej bartosz.kolodz...@gmail.com:

I'm new to Cassandra, and I want to use it to store:

loggers = {                // (super)ColumnFamily ?
  logger1 : {              // row inside super CF ?
    timestamp1 : { value : 10 },
    timestamp2 : { value : 12 }
    (many many many more)
  }
  logger2 : {              // logger of a different type (in this example it logs 3 values instead of 1)
    timestamp1 : { v : 300, c : 123, s : 12.13 },
    timestamp2 : { v : 300, c : 123, s : 12.13 }
    (many many many more)
  }
  (many many many more)
}

The only way I will be accessing this data is:

- example: fetch a slice of data from logger2 (start = 1278009131 (timestamp), end = 1278109131), expecting a sorted array of data.
- example: fetch a slice of data from (logger2 and logger10 and logger20 and logger1234) (start = 1278009131 (timestamp), end = 1278109131), expecting a map of sorted arrays of data. [It is basically N queries of the first type.]

Is this the right definition of the above:

<ColumnFamily CompareWith="TimeUUIDType" ColumnType="Super" CompareSubcolumnsWith="BytesType" Name="loggers"/>

What's the best way to model this data in Cassandra (keeping in mind partitioning and other important stuff)?

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com