Hi,

You are correct about HBase. HTable indeed uses the user thread to maintain a buffer for Put operations. Delete operations are not buffered, unfortunately. That's what makes deletes tremendously slow. There is a batch delete, but there are still some issues that make it slower than a batch put.
Ferdy.

On Wed, Jul 25, 2012 at 8:23 PM, Keith Turner <ke...@deenlo.com> wrote:

> On Mon, Jul 23, 2012 at 3:19 PM, Kazuomi Kashii <kazu...@kashii.net> wrote:
> > Hi Lewis,
> >
> > I used a Mac with a Core2Quad and 8GB memory yesterday.
> > A single-node Cassandra server is running, and Goraci/GORA/Cassandra used
> > that server.
> > "goraci.sh Generator 1 25000000" took about 4 hours to complete.
> >
> > I saw the message on every 1M nodes written (flushed).
> > Since gora-cassandra does not support delete() yet, "goraci.sh Delete" did
> > nothing.
> > "goraci.sh Verify" took a few dozen minutes.
> >
> > In my understanding, gora-cassandra flushes its buffer only when flush() or
> > close() is explicitly called.
> > I have not checked the details of gora-hbase or gora-accumulo,
> > but if they flush the buffer more intelligently, we may want gora-cassandra
> > to support such a feature.
>
> gora-accumulo uses the Accumulo BatchWriter. When the user creates a
> BatchWriter to write to Accumulo they specify how much memory and how
> many threads it should use. As the user adds mutations to the batch
> writer it buffers them. Once the buffered mutations have used half of
> the user-specified memory, the mutations are dumped into the background
> to be written by a thread pool. If the user-specified memory completely
> fills up, then writes are held. When a user calls flush, it does not
> return until all buffered mutations are written.
>
> I am not positive, but I think HBase does something similar.
> However, I think it does not dump mutations into the background to be
> written by a thread pool in parallel. I think HBase uses the user
> thread to write to region servers serially when its buffer fills up.
> I could be completely wrong, this is all hearsay. I had a discussion
> with Todd Lipcon about goraci and the difference in write speed
> between HBase and Accumulo.
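To make the buffering Keith describes above concrete, here is a minimal Python sketch. It is a model of the described behaviour only, not the Accumulo API: the class `SketchBatchWriter`, its `sink` callback and all parameter names are made up for illustration, and the real BatchWriter may well differ in detail.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SketchBatchWriter:
    """Hypothetical model of the buffering described above (not Accumulo code)."""

    def __init__(self, max_memory, num_threads, sink):
        self.max_memory = max_memory      # user-specified memory budget, in bytes
        self._sink = sink                 # assumed callable that "writes" one batch
        self._pool = ThreadPoolExecutor(max_workers=num_threads)
        self._cond = threading.Condition()
        self._buffer = []
        self._buffered_bytes = 0          # bytes waiting in the buffer
        self._in_flight_bytes = 0         # bytes handed to the pool, not yet written
        self._pending_batches = 0         # batches handed to the pool, not yet written

    def _used(self):
        return self._buffered_bytes + self._in_flight_bytes

    def add_mutation(self, mutation):
        size = len(mutation)
        with self._cond:
            # Memory full: hold the caller until background writes free some space.
            while self._used() > 0 and self._used() + size > self.max_memory:
                self._dispatch_locked()
                if self._pending_batches > 0:
                    self._cond.wait()
            self._buffer.append(mutation)
            self._buffered_bytes += size
            # Half of the budget used: hand the buffer to the thread pool.
            if self._buffered_bytes >= self.max_memory / 2:
                self._dispatch_locked()

    def _dispatch_locked(self):
        # Caller must hold self._cond.
        if not self._buffer:
            return
        batch, nbytes = self._buffer, self._buffered_bytes
        self._buffer, self._buffered_bytes = [], 0
        self._in_flight_bytes += nbytes
        self._pending_batches += 1
        self._pool.submit(self._write, batch, nbytes)

    def _write(self, batch, nbytes):
        try:
            self._sink(batch)
        finally:
            with self._cond:
                self._in_flight_bytes -= nbytes
                self._pending_batches -= 1
                self._cond.notify_all()

    def flush(self):
        # Does not return until everything buffered so far has been written.
        with self._cond:
            self._dispatch_locked()
            while self._pending_batches > 0:
                self._cond.wait()

    def close(self):
        self.flush()
        self._pool.shutdown()


# Usage: 500 mutations of 10 bytes each against a 1000-byte budget.
written = []
writer = SketchBatchWriter(max_memory=1000, num_threads=2, sink=written.extend)
for i in range(500):
    writer.add_mutation(b"x" * 10)
writer.close()
print(len(written))
```

The point of the sketch is that the caller only blocks when the whole budget is in use; otherwise the writes happen in the background while the caller keeps adding mutations.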
>
> > Thanks,
> > -Kaz
> >
> > On 7/23/12 11:40 AM, Lewis John Mcgibbney wrote:
> >>
> >> Hi Kaz,
> >>
> >> On Mon, Jul 23, 2012 at 5:47 PM, Kazuomi Kashii <kazu...@kashii.net>
> >> wrote:
> >>>
> >>> I tried Goraci last night, and I had some dependency problems.
> >>
> >> How did you get on with gora-cassandra and the goraci suite? I've
> >> shared some of my early experiences with Keith [0]. Unfortunately the
> >> hardware I'm running the test on is pretty primitive to say the least
> >> (small notebook), so I fear this is limiting the execution of
> >> the tests, and Hadoop jobs are timing out and being killed. Also I have
> >> a few questions which I would like to reach out on.
> >>
> >> 1) When we use this test suite, is the Cassandra system swapping? How
> >> can I even find this out? Having spoken to Keith, he clarified to me
> >> that the test writes in multiples of 1M nodes, so if this is done in
> >> swap there will be problems.
> >>
> >> 2) How does gora-cassandra handle buffering? Keith also mentioned that
> >> Goraci will write 1000000 nodes and then call flush. Accumulo and
> >> HBase handle this OK. If
> >> gora-cassandra actually buffered all 1000000 in memory until flush was
> >> called, then this could be bad with my small amount of memory.
> >>
> >> I'm keen to get some documentation on the execution of gora-cassandra
> >> with this test suite to understand more about the internals and of
> >> course the limitations of gora-cassandra.
> >>
> >> Any comments you have at this stage would be excellent.
> >>
> >>> For my case, I added some dependencies to Goraci's pom.xml, and it
> >>> worked,
> >>> but I am not sure that it is the same or similar issue to yours.
> >>> I used a standalone Cassandra server, not an embedded one, so I did not
> >>> include cassandra-all.
> >>
> >> I'm the same as you here. I suppose this dep can maybe be dropped from
> >> the goraci pom.xml in this instance then.
> >>
> >> Best
> >> Lewis
> >>
> >> [0] https://github.com/keith-turner/goraci/pull/7
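As a rough illustration of the point at the top of this mail (puts are buffered on the caller's thread, deletes are not), here is a small Python model that just counts round trips. It is a sketch under assumptions, not the HBase client: `SketchTable`, `write_buffer_size` and `round_trips` are hypothetical names, and the batch delete mentioned above is not modelled.

```python
class SketchTable:
    """Hypothetical model of a client with buffered puts and unbuffered deletes."""

    def __init__(self, write_buffer_size):
        self.write_buffer_size = write_buffer_size  # bytes to buffer before writing
        self.put_buffer = []
        self.buffered_bytes = 0
        self.round_trips = 0   # stands in for calls to the server
        self.rows = {}         # stands in for the server-side table

    def put(self, key, value):
        self.put_buffer.append((key, value))
        self.buffered_bytes += len(key) + len(value)
        if self.buffered_bytes >= self.write_buffer_size:
            # Runs on the caller's thread, so put() blocks while the buffer is written.
            self.flush_commits()

    def flush_commits(self):
        if not self.put_buffer:
            return
        self.round_trips += 1  # one round trip for the whole buffer
        self.rows.update(self.put_buffer)
        self.put_buffer = []
        self.buffered_bytes = 0

    def delete(self, key):
        # Not buffered in this model: every delete is its own round trip.
        self.round_trips += 1
        self.rows.pop(key, None)


# Usage: 100 puts of 10 bytes each, then 100 deletes.
table = SketchTable(write_buffer_size=1000)
for i in range(100):
    table.put(f"k{i:03d}", "v" * 6)
table.flush_commits()
trips_after_puts = table.round_trips

for i in range(100):
    table.delete(f"k{i:03d}")
trips_after_deletes = table.round_trips
print(trips_after_puts, trips_after_deletes)
```

In this model the cost of deletes grows with the number of rows, while the cost of puts grows with the number of buffer fills, which is the gap the thread is talking about.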