Write assurance in Cassandra

2010-07-04 Thread David Boxenhorn
As I understand it, when you write to Cassandra, you are assured that, if
successful, the new data has been written to a log file - so that if there
is a crash your data is safe. Is this correct?

If the above is correct, there is something going on that I don't
understand. Are the log files to which the data is first written the ones
that look like /var/lib/cassandra/commitlog/CommitLog-1277998453387.log ?
The reason I ask is that when I write a lot of data, nothing seems to change
in the commitlog directory for a long time, then at some point the log files
in this directory get updated. It looks to me like there's memory caching
involved, and the new data is not being immediately written to disk. What is
going on?


Re: Write assurance in Cassandra

2010-07-04 Thread Andrew Rollins
By default Cassandra syncs the commit log to disk periodically, so if you
are looking at file sizes, you won't see the most up-to-date numbers. It's
just like tailing a file that isn't flushed frequently: you might wait a
little while before you see the updates.

In periodic mode, Cassandra acknowledges the write to the client immediately
(even before it is synced). You can run Cassandra in batch mode instead,
which basically means it writes in batches *and* it won't acknowledge the
writes to the client until it has actually synced. I'm still somewhat new to
this, but that's my understanding.

Have a look at CommitLogSync in your storage-conf.xml for more info about
setting up syncing periods.

As an aside, I'm not sure why the ack-immediately vs. ack-after-sync
behavior is piggybacked on the periodic vs. batch setting. At first glance
it seems like the two concepts should be independent of one another.

- Andrew
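Andrew's description of the two sync modes can be illustrated with a toy model. This is a minimal sketch, not Cassandra's commit log code: the class ToyCommitLog and its methods are hypothetical. For simplicity, batch mode here syncs on every write, whereas the real batch mode groups writes that arrive within a short window into a single sync. What it shows is the point Andrew makes: in periodic mode a write can be acknowledged and still be lost if the node crashes before the next sync.

```python
"""Toy model of periodic vs. batch commit log acknowledgement.

Hypothetical illustration only: this is not Cassandra's CommitLog class.
Simplification: batch mode here syncs on every write instead of grouping
writes that arrive within a batch window.
"""
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyCommitLog:
    mode: str  # "periodic" or "batch"
    buffered: List[str] = field(default_factory=list)  # written, not yet synced
    synced: List[str] = field(default_factory=list)    # durable after a sync

    def __post_init__(self) -> None:
        if self.mode not in ("periodic", "batch"):
            raise ValueError(f"unknown mode: {self.mode!r}")

    def write(self, mutation: str) -> bool:
        """Append a mutation and return True once the client is acknowledged."""
        self.buffered.append(mutation)
        if self.mode == "batch":
            # Batch mode: sync before acknowledging the client.
            self.sync()
        # Periodic mode: acknowledge now; in the real system a background
        # timer would sync every CommitLogSyncPeriodInMS milliseconds.
        return True

    def sync(self) -> None:
        """Simulate an fsync: everything buffered becomes durable."""
        self.synced.extend(self.buffered)
        self.buffered.clear()

    def crash(self) -> List[str]:
        """Simulate a crash: unsynced mutations are lost; return what survived."""
        self.buffered.clear()
        return list(self.synced)


if __name__ == "__main__":
    periodic = ToyCommitLog("periodic")
    periodic.write("a")
    periodic.write("b")
    print(periodic.crash())  # prints [] : both writes were acked, then lost

    batch = ToyCommitLog("batch")
    batch.write("a")
    batch.write("b")
    print(batch.crash())  # prints ['a', 'b']
```

In this model the trade-off is visible directly: batch mode pays for a sync before every acknowledgement, while periodic mode answers immediately and accepts a window, up to one sync period, in which acknowledged writes exist only in memory.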





Re: Write assurance in Cassandra

2010-07-04 Thread David Boxenhorn
Thank you very much! I now understand things much better.

However, my configuration is as follows:

  <CommitLogSync>periodic</CommitLogSync>
  <CommitLogSyncPeriodInMS>10000</CommitLogSyncPeriodInMS>

So I should see my commit log change after 10,000 milliseconds = 10 seconds?
It seems to take much longer to show up.
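To measure the lag David describes rather than eyeballing the directory, one option is to sample the commit log file sizes at a fixed interval and record when they change. A minimal sketch, assuming the directory path from the first message and arbitrary polling values; snapshot and watch are hypothetical helpers, not part of Cassandra. Note that a size change only shows when bytes reached the operating system, which is not necessarily the moment they were fsynced to disk.

```python
"""Record when the sizes of commit log files change.

Assumptions: COMMITLOG_DIR is the path from the original message; the
1-second interval and 60-second duration below are arbitrary choices.
"""
import os
import time
from typing import Dict, List, Tuple

COMMITLOG_DIR = "/var/lib/cassandra/commitlog"  # path taken from the thread


def snapshot(directory: str) -> Dict[str, int]:
    """Return {file name: size in bytes} for each *.log file in directory."""
    sizes: Dict[str, int] = {}
    for name in os.listdir(directory):
        if not name.endswith(".log"):
            continue
        try:
            sizes[name] = os.path.getsize(os.path.join(directory, name))
        except FileNotFoundError:
            # A segment can be deleted between listdir() and getsize().
            pass
    return sizes


def watch(directory: str, interval: float,
          duration: float) -> List[Tuple[float, Dict[str, int]]]:
    """Poll every `interval` seconds for `duration` seconds and return
    (elapsed seconds, sizes) for each poll where the sizes changed."""
    changes: List[Tuple[float, Dict[str, int]]] = []
    start = time.monotonic()
    previous = snapshot(directory)
    while time.monotonic() - start < duration:
        time.sleep(interval)
        current = snapshot(directory)
        if current != previous:
            changes.append((time.monotonic() - start, current))
            previous = current
    return changes


if __name__ == "__main__":
    for elapsed, sizes in watch(COMMITLOG_DIR, interval=1.0, duration=60.0):
        print(f"{elapsed:6.1f}s  {sizes}")
```

If the gaps between reported changes are much longer than the configured CommitLogSyncPeriodInMS, something else is likely delaying the writes, such as a saturated disk.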






Re: Write assurance in Cassandra

2010-07-04 Thread Andrew Rollins
Is your IO under heavy load? If it is, that may be the cause; otherwise I'm
not sure what would cause significant lag. On Linux I like to use
iostat -tx 10 to check IO.

- Andrew








Re: Write assurance in Cassandra

2010-07-04 Thread David Boxenhorn
Yes, it was. I was dumping data from Oracle into Cassandra.








Re: Running Cassandra as a Windows Service

2010-07-04 Thread Richard Grossman
Hello

Why not use the Java Service Wrapper?
http://wrapper.tanukisoftware.org/doc/english/download.jsp
It lets you configure any Java process as a real Windows service instead of
running it from batch files.

Richard

On Thu, Jun 10, 2010 at 8:34 PM, Kochheiser,Todd W - TO-DITT1 
twkochhei...@bpa.gov wrote:

  For various reasons I am required to deploy systems on Windows.  As such,
 I went looking for information on running Cassandra as a Windows service.
 I’ve read some of the user threads regarding running Cassandra as a Windows
 service, such as this one:

 http://www.mail-archive.com/user@cassandra.apache.org/msg01656.html

 I also found the following JIRA issue:

 
 https://issues.apache.org/jira/browse/CASSANDRA-292

 As it didn’t look like anyone had contributed a formal solution, and since I
 have some experience using Apache’s Procrun
 (http://commons.apache.org/daemon/procrun.html), I decided to go ahead and
 write a batch script and a simple “WindowsService” class to accomplish the
 task.  The WindowsService class only makes calls to public methods in
 CassandraDaemon and is fairly simple.  In combination with the batch script,
 it is very easy to install and remove the service.  At this point, I’ve
 installed Cassandra as a Windows service on XP (32 bit), Windows 7 (64 bit)
 and Windows Server 2008 R1/R2 (64 bit).  It should work fine on other
 versions of Windows (2K, 2K3).

 Questions:


 1. Has anyone else already done this work?
 2. If not, I wouldn’t mind sharing the code/script or contributing it
    back to the project.  Is there any interest in this from the Cassandra dev
    team or the user community?


 Ideally the WindowsService could be included in the distributed
 source/binary distributions (perhaps in a contrib area) as well as the batch
 script and associated procrun executables.  Or, perhaps it could be posted
 to a Cassandra community site (is there one?).

 Todd








0.7 source code

2010-07-04 Thread Bill Hastings
Where can I find it?

-- 
Cheers
Bill


Re: 0.7 source code

2010-07-04 Thread Peter Schuller
 Where can I find it?

http://wiki.apache.org/cassandra/HowToContribute

In particular:

svn checkout http://svn.apache.org/repos/asf/cassandra/trunk cassandra-trunk

-- 
/ Peter Schuller


Re: Digg 4 Preview on TWiT

2010-07-04 Thread S Ahmed
Agreed. What exactly did they replace it with?

On Sun, Jul 4, 2010 at 8:14 AM, Bill de hÓra b...@dehora.net wrote:

 On Mon, 2010-06-28 at 11:51 -0500, Eric Evans wrote:
  On Mon, 2010-06-28 at 07:53 -0700, Kochheiser,Todd W - TOK-DITT-1 wrote:
   On a related but separate note: While I am fairly new to Cassandra and
   have only been following the mailing lists for a few months, the
   conversation with Kevin Rose on TWiT made me curious if the versions
   of Cassandra that Digg, Twitter, and Facebook are using may end up
   being forks of the Apache project or old versions.
 
  Facebook and Apache have diverged (technically we're the fork). To the
  best of my knowledge, this has always been the case.

 This person's understanding is that Facebook 'no longer contributes to
 nor uses Cassandra.':

 http://redmonk.com/sogrady/2010/05/17/beyond-cassandra/

 I assume it's accurate - policy reasons wouldn't interest me as much as
 technical ones.

 Bill





tool modeling data of cassandra

2010-07-04 Thread StiPh

Can you tell me whether there is a Cassandra data modeling tool that
provides a standard way to define data and the relations between them,
similar to entity-relationship diagrams for relational databases?
-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/tool-modeling-data-of-cassandra-tp5254415p5254415.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: 0.7 source code

2010-07-04 Thread Jesse McConnell
ya, trunk is 0.7... just update the version in build.xml for your local
build, setting it to 0.7-local or something. I think by default the
version in build.xml lags behind what trunk actually is.

cheers,
jesse

--
jesse mcconnell
jesse.mcconn...@gmail.com


