Hi Alfonso,
2013/3/8 Alfonso Nishikawa <[email protected]>:
> Hi Renato,
>
> 2013/3/7 Renato Marroquín Mogrovejo <[email protected]>:
>>> Gora-0.2.1 @ CassandraStore.java#put():286 does this: "* Duplicate
>>> instance to keep all the objects in memory till flushing."
>>>
>>> Some minor important things.
>>>
>>> First it creates a new empty instance with:
>>>
>>>     T p = (T) value.newInstance(new StateManagerImpl());
>>>
>>> but it actually should have been created with:
>>>
>>>     T p = this.getBeanFactory().newPersistent() ;
>>
> [reverse questions order]
>> Do you know how HBase module does this?
>
> HBase almost surely does it wrong too.
>
>> Could you please explain why this is a better creational approach?
>> IMO We should make all data stores use at least a similar approach.
>
> What I described is precisely the similar approach for all stores.
>
> Your question is a bit difficult to answer without looking arrogant,
> given that I don't know what you know or what anyone who reads the
> text knows. I am not the best for this type of explanations :P

I think that if we can explain, then we should explain, because we never know who is reading our emails. Besides that, your explanations are great, man (:

> I will try to do my best, but don't get me wrong.
>
> It's the creational approach because someone made a BeanFactory (and
> BeanFactoryImpl) that is much better than having to remember that when
> you create a Persistent you have to give a new StateManager,
> etc,etc,etc.
> Remember that what we do here should be engineering and not art.
> Maximum cohesion, minimum coupling, reuse, and all that stuff. Examine
> the tools (in this case the different classes of the architecture) and
> use them. This includes design patterns (GoF - Design Patterns).
> The BeanFactoryImpl delegates to Persistent, but it could be done in
> any other way. If you want to change the creational way of a whole
> store, you just change that bean with DataStoreBase#...
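For anyone reading along, here is a minimal sketch of the factory idea as I understand it. SimplePersistent, SimpleBeanFactory, WebPage and BufferingStore are simplified stand-ins I made up for illustration; they are not the real Gora classes, whose interfaces have more methods and different signatures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Persistent: it only knows how to copy itself.
interface SimplePersistent<T> {
    T copy();
}

// Hypothetical stand-in for BeanFactory: the single place that knows
// how a new bean is built.
interface SimpleBeanFactory<T> {
    T newPersistent();
}

// Hypothetical bean used only for this example.
class WebPage implements SimplePersistent<WebPage> {
    String url = "";

    @Override
    public WebPage copy() {
        WebPage c = new WebPage();
        c.url = this.url;
        return c;
    }
}

// Hypothetical store that buffers copies until a flush (flush omitted).
class BufferingStore<K, T extends SimplePersistent<T>> {
    private final SimpleBeanFactory<T> beanFactory;
    private final Map<K, T> buffer = new HashMap<>();

    BufferingStore(SimpleBeanFactory<T> beanFactory) {
        this.beanFactory = beanFactory;
    }

    // Callers ask the store's factory instead of wiring beans by hand.
    T newPersistent() {
        return beanFactory.newPersistent();
    }

    // Buffer a copy so later changes by the caller do not leak into the buffer.
    void put(K key, T value) {
        buffer.put(key, value.copy());
    }

    T buffered(K key) {
        return buffer.get(key);
    }
}

class BeanFactorySketch {
    public static void main(String[] args) {
        BufferingStore<String, WebPage> store = new BufferingStore<>(WebPage::new);
        WebPage page = store.newPersistent();
        page.url = "http://example.org/";
        store.put("k", page);
        page.url = "changed after put";
        System.out.println(store.buffered("k").url);
    }
}
```

The point of the sketch is only that swapping the factory passed to the store changes how every bean of that store is created, without touching put().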
> There are other examples of tools like DataStoreFactory to access
> configuration (see gora-accumulo, the finest store in my opinion,
> without devaluing the others!).

You are right, that is the simplest and most straightforward module we have.

> The BeanFactory belongs to each DataStore, so there is no reason to
> create Persistent instances by hand, having to remember the
> StateManager, or if you need it when creating a Map or not (I still
> have to dive into the source code to see if there's something related
> to maps).
> Persistent has #clone(), so there's no reason to duplicate code in other
> places.

Thanks! This is what I was asking. I didn't know that each DataStore had its own BeanFactory; we totally should use them.

> As I said, explanations about programming theory are not my best gift :P

Agree to disagree ;)

>>> But the real thing is that the whole method should be implemented as
>>> follows (cloning is done in
>>> PersistentDatumReader#clone(Persistent,Schema):215) :
>>>
>>>     public void put(K key, T value) {
>>>         this.buffer.put(key, value.clone()) ;
>>>     }
>>>
>>> But this is not really important, I guess.
>>
>> Isn't this mainly for MapReduce access?
>
> I think it is not only for MapReduce, but the way to save Persistent
> instances. Am I wrong? What is the other way? :S
> If there's another way I will still have much more to learn... :(

From what I've seen, the DatumReader classes are meant to be used inside MapReduce jobs, but I might be wrong here. I think the best is to stick to the BeanFactory per DataStore; this will help us engineer our code as you said ;) Thanks for pointing this out.

Renato M.

>>> Anyway, this does not seem to be the problem shown in NUTCH-1534.
>>> I think that the problem in NUTCH-1534 is what you said about multiple
>>> threads. CassandraClient is not reentrant because Mutator is not
>>> reentrant, so it must be used only with 1 thread.
>>> Could you, please, try this?:
>>>
>>> * Update to gora-0.2.1
>>> * Modify CassandraStore:340 so the line reads as this:
>>>
>>>     private synchronized void addOrUpdateField(K key, Field field, Object
>>>     value) {
>>>
>>> The same should apply to gora-0.2 (at CassandraStore:301), but I like
>>> 0.2.1 and patches must be for /trunk (desirable).
>>>
>>> Maybe I am wrong, but please, give it a shot :)
>>
>> Yeah, there are several different approaches to get this accomplished.
>> From what I recall from GORA-211, Roland suggested creating a lock
>> object and just synchronizing at the read/write/update operation time,
>> but we would have to evaluate if it causes any performance damage to
>> synchronize the whole encapsulating method.
>
> You are right. That was the fastest and shortest (only 1 word!) fix
> I found to check.
>
> The real fix is:
>
> * Modify HectorUtils.java:
>   - change all "public static<K> void insertSubColumn..." to "public
>     synchronized static<K>..." (3 changes)
>   - change "public static<K> void deleteSubColumn" to "public
>     synchronized static<K> void deleteSubColumn" (1 change)
>
> And then you are just synchronizing the calls to the mutator.
>
> Thanks for noticing it.
>
>> Renato M.
>
> Thank you for all your comments! :)
>
> Regards,
>
> Alfonso Nishikawa
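
To compare the two locking options discussed above (a synchronized static helper versus a dedicated lock object), here is a rough sketch. UnsafeBatch, BatchUtils, LockingClient and SyncSketch are hypothetical names made up for this example; UnsafeBatch only plays the role of a batch object that is not safe to share between threads, and none of this is the actual Hector or Gora code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a batch object that is not thread-safe.
class UnsafeBatch {
    private final List<String> pending = new ArrayList<>();

    void addInsertion(String column) {
        pending.add(column);
    }

    int size() {
        return pending.size();
    }
}

// Option 1: synchronized static helper. The lock is BatchUtils.class,
// so every caller is serialized here, even when using different batches.
final class BatchUtils {
    private BatchUtils() {}

    static synchronized void insertSubColumn(UnsafeBatch batch, String column) {
        batch.addInsertion(column);
    }
}

// Option 2: a private lock object per client, held only around the
// calls that touch the shared batch.
class LockingClient {
    private final UnsafeBatch batch = new UnsafeBatch();
    private final Object lock = new Object();

    void addOrUpdateField(String column) {
        synchronized (lock) {
            batch.addInsertion(column);
        }
    }

    int pendingCount() {
        synchronized (lock) {
            return batch.size();
        }
    }
}

class SyncSketch {
    // Runs the same write from several threads and waits for all of them.
    static void runWriters(int threads, int writesPerThread, Runnable write) {
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < writesPerThread; j++) {
                    write.run();
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        UnsafeBatch shared = new UnsafeBatch();
        runWriters(4, 1000, () -> BatchUtils.insertSubColumn(shared, "c"));
        System.out.println("static synchronized: " + shared.size());

        LockingClient client = new LockingClient();
        runWriters(4, 1000, () -> client.addOrUpdateField("c"));
        System.out.println("lock object: " + client.pendingCount());
    }
}
```

Both variants keep the writes consistent in this toy setup. The difference is scope: the static helper serializes all callers in the JVM, while the lock object only serializes callers of the same client, which is why the performance impact would still need to be measured on the real store.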

