Hi,

I think we're at a stage where you're right, Chris. Following Alexis' commit, I feel that this has been bottomed out. We are also now at Cassandra version 0.8.1. Are you happy with this, Alexis?
Thanks

On Sat, Oct 1, 2011 at 6:33 PM, Mattmann, Chris A (388J) <chris.a.mattm...@jpl.nasa.gov> wrote:

> Great work, thanks Alexis! Maybe it's time to close out GORA-22 then
> and leave any future things that crop up as new issues.
>
> Cheers,
> Chris
>
> On Oct 1, 2011, at 4:07 AM, Alexis wrote:
>
> > Last revision 1177960 should now fix the thread-safety issue:
> >
> > http://svn.apache.org/viewvc/incubator/gora/trunk/gora-cassandra/src/main/java/org/apache/gora/cassandra/store/CassandraStore.java?r1=1177960&r2=1177959&pathrev=1177960
> >
> > Please comment on https://issues.apache.org/jira/browse/GORA-22 if
> > there is anything else.
> >
> > Alexis
> >
> > On Sun, Sep 4, 2011 at 10:43 AM, Alexis <alexis.detregl...@gmail.com> wrote:
> >> Hi,
> >>
> >> I submitted the patch for peer review by attaching it to the
> >> issue: https://issues.apache.org/jira/browse/GORA-22
> >>
> >> See this article about concurrency and HashMap for background on the topic:
> >> http://www.ibm.com/developerworks/java/library/j-jtp07233/index.html
> >>
> >> I ended up calling toArray over the key set to get around the
> >> ConcurrentModificationException thrown by default by
> >> java.util.HashMap when iterating over the keys.
> >>
> >> A few times I encountered Cassandra crashes and Hector
> >> exceptions (usually because of GC triggered by the Cassandra daemon?) with
> >> my poor 5-year-old laptop while running the Nutch parse command, which is
> >> very CPU- and IO-intensive. In mapred-site.xml (see attached config), it
> >> worked out when you make the read batch reasonable (400 rows at a
> >> time) and keep it separate from the write batch (for example, 843
> >> rows written per batch) so that reads and writes don't happen simultaneously.
> >>
> >> Alexis
> >>
> >> On Tue, Aug 30, 2011 at 1:24 AM, Alexis <alexis.detregl...@gmail.com> wrote:
> >>> Hi Tom,
> >>>
> >>> Thanks for testing Nutch 2.0 & Cassandra and reporting the obvious
> >>> bug.
> >>> I must say there is not very active development and testing of
> >>> Gora & Nutch, but at least there is some.
> >>>
> >>> 1. As regards your ConcurrentModificationException issue, it looks like it
> >>> happens when flushing the store. From your exception stack trace:
> >>> (Line 192 in org.apache.gora.cassandra.store.CassandraStore)
> >>> for (K key: this.buffer.keySet()) {
> >>>
> >>> while there are other threads adding new keys to the HashMap:
> >>>
> >>> (Line 266)
> >>> this.buffer.put(key, p);
> >>>
> >>> "it is not generally permissible for one thread to modify a Collection
> >>> while another thread is iterating over it."
> >>>
> >>> Let me try to reproduce the bug and fix it with this in mind.
> >>> How about introducing a mutex/lock mechanism with
> >>> java.util.concurrent.locks.Lock or, more simply, using a thread-safe
> >>> implementation such as java.util.concurrent.ConcurrentHashMap?
> >>>
> >>> 2. Regarding the OutOfMemory error, maybe decreasing the flushing
> >>> frequency as described here would help?
> >>> http://techvineyard.blogspot.com/2011/02/gora-orm-framework-for-hadoop-jobs.html#I_O_Frequency
> >>>
> >>> I like to use the jvisualvm utility from the JDK, which monitors
> >>> memory usage and shows how it evolves during the execution of
> >>> the class...
> >>>
> >>> Alexis
> >>>
> >>> On Mon, Aug 29, 2011 at 1:50 PM, Tom Davidson <tdavid...@covario.com> wrote:
> >>>> Hi Lewis,
> >>>>
> >>>> I was running Nutch deployed with a dedicated Cassandra cluster.
> >>>> Frankly, I have given up on using Nutch 2 at this time as it seems highly
> >>>> unstable and not really in active development. Your effort to address this
> >>>> is encouraging. Because Nutch uses multithreading in the fetchers, I was
> >>>> getting ConcurrentModification errors and OutOfMemory errors on a regular
> >>>> basis in the CassandraStore. As far as I recall, the caching/flushing
> >>>> implementation is just not thread safe.
> >>>> If the CassandraStore caching were
> >>>> completely removed it might work, but it would probably not be very efficient.
> >>>> If I were to fix this class, I would try to rewrite it to use Hector
> >>>> batched mutations instead.
> >>>>
> >>>> Tom
> >>>>
> >>>> -----Original Message-----
> >>>> From: lewis john mcgibbney [mailto:lewis.mcgibb...@gmail.com]
> >>>> Sent: Monday, August 29, 2011 1:41 PM
> >>>> To: gora-dev@incubator.apache.org; d...@nutch.apache.org
> >>>> Subject: Re: Gora CassandraStore is not thread safe?
> >>>>
> >>>> Hi Tom,
> >>>>
> >>>> Apologies for cross-posting; this would not usually be the case, but I'm
> >>>> hoping that if any results come from the thread then both communities can
> >>>> benefit.
> >>>>
> >>>> I'm in the process of getting Cassandra 0.8.4 working with Nutch 2.0 and
> >>>> Gora 0.2 myself and seem to be having some nasty problems.
> >>>>
> >>>> Some questions for you:
> >>>>
> >>>> 1) How are you running Nutch, local or deployed?
> >>>> 2) How are you running Cassandra, local or deployed in a cluster?
> >>>>
> >>>> The obvious thoughts are that this is a bug and that there are
> >>>> method(s)/object(s) which are not thread safe.
> >>>>
> >>>> Have you gotten any further with this?
> >>>>
> >>>> Lewis
> >>>>
> >>>> On Wed, Aug 10, 2011 at 8:43 PM, Tom Davidson <tdavid...@covario.com> wrote:
> >>>>
> >>>>> Has anyone tested the CassandraStore in Gora 0.2 using multiple threads?
> >>>>> The Nutch 2 fetcher architecture has many threads writing to one
> >>>>> GoraRecordWriter and I am getting concurrent modification errors like below.
> >>>>>
> >>>>> Caused by: java.util.ConcurrentModificationException
> >>>>>         at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
> >>>>>         at java.util.HashMap$KeyIterator.next(HashMap.java:828)
> >>>>>         at org.apache.gora.cassandra.store.CassandraStore.flush(CassandraStore.java:192)
> >>>>>         at org.apache.gora.mapreduce.GoraRecordWriter.write(GoraRecordWriter.java:65)
> >>>>
> >>>> --
> >>>> *Lewis*
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattm...@nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

--
*Lewis*
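The failure in the stack trace above can be reproduced and worked around with the JDK alone. The sketch below is hypothetical: BufferSketch, ConcurrentWriteBuffer and their methods are invented names, not Gora's CassandraStore code, and it does not claim to reproduce what revision 1177960 changed. It shows, in order, a single-threaded java.util.HashMap failing fast when modified during keySet() iteration, the toArray() snapshot workaround described in the thread, and a buffer backed by java.util.concurrent.ConcurrentHashMap (one of the two options Alexis suggested) being flushed while several writer threads keep adding keys.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

class BufferSketch {

    // Reproduces the fail-fast behaviour in a single thread: structurally
    // modifying a java.util.HashMap while iterating its keySet() makes the
    // iterator throw on its next step.
    static boolean removeDuringIterationThrows() {
        Map<String, Integer> buffer = new HashMap<>();
        buffer.put("a", 1);
        buffer.put("b", 2);
        buffer.put("c", 3);
        try {
            for (String key : buffer.keySet()) {
                buffer.remove(key);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // The toArray() workaround: iterate over a copy of the keys so that
    // removals do not invalidate the loop. This alone does not make a plain
    // HashMap safe when other threads keep calling put() at the same time.
    static int flushWithSnapshot(Map<String, Integer> buffer, List<String> written) {
        Object[] keys = buffer.keySet().toArray();
        for (Object k : keys) {
            String key = (String) k;
            written.add(key); // stand-in for writing the row to the store
            buffer.remove(key);
        }
        return keys.length;
    }

    // Hypothetical write buffer backed by ConcurrentHashMap. Its iterators are
    // weakly consistent and never throw ConcurrentModificationException, so
    // put() from fetcher threads and flush() may overlap.
    static final class ConcurrentWriteBuffer<K, V> {
        private final ConcurrentHashMap<K, V> buffer = new ConcurrentHashMap<>();

        void put(K key, V value) {
            buffer.put(key, value);
        }

        // Hands each entry to the sink, then removes it only if its value is
        // unchanged, so a different value put meanwhile stays buffered.
        int flush(BiConsumer<K, V> sink) {
            int flushed = 0;
            for (Map.Entry<K, V> e : buffer.entrySet()) {
                sink.accept(e.getKey(), e.getValue());
                buffer.remove(e.getKey(), e.getValue());
                flushed++;
            }
            return flushed;
        }

        int size() {
            return buffer.size();
        }
    }

    // Several writer threads fill the buffer while the calling thread flushes
    // it repeatedly; returns how many distinct keys reached the sink.
    static int concurrentDemo(int writers, int perWriter) {
        ConcurrentWriteBuffer<String, Integer> buf = new ConcurrentWriteBuffer<>();
        Set<String> seen = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(writers);
        for (int w = 0; w < writers; w++) {
            final int id = w;
            pool.submit(() -> {
                for (int i = 0; i < perWriter; i++) {
                    buf.put(id + ":" + i, i);
                }
            });
        }
        pool.shutdown();
        try {
            while (!pool.awaitTermination(1, TimeUnit.MILLISECONDS)) {
                buf.flush((k, v) -> seen.add(k));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        buf.flush((k, v) -> seen.add(k)); // final flush once all writers are done
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("HashMap remove during iteration throws: " + removeDuringIterationThrows());
        System.out.println("distinct keys flushed: " + concurrentDemo(4, 5000));
    }
}
```

Running main should report that the HashMap loop throws and that all 20000 keys (4 writers times 5000 keys) reach the sink. The other option from the thread, a java.util.concurrent.locks.Lock held around both put() and the flush loop, would also work; it keeps a plain HashMap but blocks the fetcher threads while a flush is in progress, whereas ConcurrentHashMap lets them keep writing.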