Hi again Alfonso (: More comments.
2013/3/7 Alfonso Nishikawa <[email protected]>:
> Hi Roland,
>
>> I've read over the part concerning cassandra.
>> Have you seen GORA-211 and our discussion about cloning there?
>> Can you explain a bit more what you're thinking about here:
>> "Wrongly creates a new Persistent "by hand" instead using
>> PersistentBase#clone()"
>> I'm relativity sure that my last problem from NUTCH-1534 (the
>> InvalidRequestException(why:column name must not be empty)) is located
>> somewhere in the cloning code from gora-cassandra, but I can't find it
>> right now.
>
> Sure, explanation going :)
> Gora-0.2.1 @ CassandraStore.java#put():286 does this: "* Duplicate
> instance to keep all the objects in memory till flushing."
>
> Some minor important things.
>
> First creates a new empty instance with:
>
>     T p = (T) value.newInstance(new StateManagerImpl());
>
> but actually should have been created with:
>
>     T p = this.getBeanFactory().newPersistent() ;

Could you please explain why this is a better creational approach? Do
you know how the HBase module does this? IMO we should make all data
stores use at least a similar approach.

> But the real thing is that all that method should be implemented as
> following (cloning is done in
> PersistentDatumReader#clone(Persistent,Schema):215) :
>
>     public void put(K key, T value) {
>         this.buffer.put(key, value.clone()) ;
>     }
>
> But this is not really important, I guess.

Isn't this mainly for MapReduce access?

> Anyway, this does not seems to be the problem shown in NUTCH-1534.
> I think that the problem in NUTCH-1534 what you told about multiple
> threads. CassandraClient is not reentrant because Mutator is not
> reentrant, so must be used only with 1 thread.
> Could you, please, try this?:
>
> * Update to gora-0.2.1
> * Modify CassandraStore:340 so the line reads as this:
>
>     private synchronized void addOrUpdateField(K key, Field field, Object value)
>     {
>
> The same should be for gora-0.2 (at CassandraStore:301), but I like
> 0.2.1 and patches must be for /trunk (desirable).
>
> Maybe I am wrong, but please, give it a shot :)

Yeah, there are several different approaches to get this accomplished.
From what I recall of GORA-211, Roland suggested creating a lock object
and synchronizing only at read/write/update operation time. We would
have to evaluate whether synchronizing the whole encapsulating method
instead hurts performance.

Renato M.

> Regards,
>
> Alfonso Nishikawa
>
> 2013/3/7 Roland <[email protected]>:
>> Hi Alfonso,
>>
>> I've read over the part concerning cassandra.
>> Have you seen GORA-211 and our discussion about cloning there?
>> Can you explain a bit more what you're thinking about here:
>> "Wrongly creates a new Persistent "by hand" instead using
>> PersistentBase#clone()"
>>
>> I'm relativity sure that my last problem from NUTCH-1534 (the
>> InvalidRequestException(why:column name must not be empty)) is located
>> somewhere in the cloning code from gora-cassandra, but I can't find it right
>> now.
>>
>> Thanks a lot for this write-up,
>> Roland
>>
>> On 06.03.2013 11:53, Alfonso Nishikawa wrote:
>>
>>> Hello everybody,
>>>
>>> I finally finished some important notes. I would like to have you reviewed
>>> and commented :)
>>> https://people.apache.org/~alfonsonishikawa/gora-174-notes.html
>>>
>>> Thank you!
>>>
>>> Regards,
>>>
>>> Alfonso Nishikawa
>>>
>>
>
> --
> "Drinking bloody marys all night will make you feel like a corpse in
> the morning."
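
For reference, the two locking options discussed in this thread
(synchronizing the whole method versus guarding only the shared write
with a lock object) can be sketched roughly as below. This is only an
illustration under assumptions: every class and method name here is
hypothetical, the buffer is a plain HashMap, and none of it is Gora's
actual CassandraStore code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface standing in for a store that buffers writes.
interface Sink {
    void put(String key, String value);
    int size();
}

// Option 1: synchronize the whole method, comparable to adding
// "synchronized" to a method such as addOrUpdateField.
class MethodSyncSink implements Sink {
    private final Map<String, String> buffer = new HashMap<>();

    public synchronized void put(String key, String value) {
        // Preparation and the shared write both run while holding the monitor.
        buffer.put(key, value.trim());
    }

    public synchronized int size() {
        return buffer.size();
    }
}

// Option 2: a dedicated lock object held only around the shared write.
class LockObjectSink implements Sink {
    private final Map<String, String> buffer = new HashMap<>();
    private final Object lock = new Object();

    public void put(String key, String value) {
        String prepared = value.trim(); // thread-local work, no lock needed
        synchronized (lock) {
            buffer.put(key, prepared);
        }
    }

    public int size() {
        synchronized (lock) {
            return buffer.size();
        }
    }
}

class LockingSketch {
    // Writes distinct keys from several threads and returns the final size.
    static int fill(Sink sink, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    sink.put(id + ":" + i, " v ");
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return sink.size();
    }

    public static void main(String[] args) {
        System.out.println(fill(new MethodSyncSink(), 4, 1000));
        System.out.println(fill(new LockObjectSink(), 4, 1000));
    }
}
```

Both variants keep every write in this toy setup. The difference is
only how long each thread holds the lock, which is the performance
question that would still need to be measured against the real store.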

