Hi Renato,
2013/3/7 Renato Marroquín Mogrovejo <[email protected]>:
>> Gora-0.2.1 @ CassandraStore.java#put():286 does this: "* Duplicate
>> instance to keep all the objects in memory till flushing."
>>
>> Some minor important things.
>>
>> First creates a new empty instance with:
>>
>> T p = (T) value.newInstance(new StateManagerImpl());
>>
>> but actually should have been created with:
>>
>> T p = this.getBeanFactory().newPersistent() ;
>
[answering your questions in reverse order]
> Do you know how HBase module does this?
HBase almost surely does it wrong too.
> Could you please explain why this is a better creational approach?
> IMO We should make all data stores use at least a similar approach.
What I described is exactly that: one common approach for all stores.
It is hard to explain the reasoning without sounding arrogant, since I
don't know what you already know, or what anyone else reading this
knows. I am not the best at this type of explanation :P
I will do my best, but please don't take it the wrong way.
It's the better creational approach because someone wrote a BeanFactory
(and BeanFactoryImpl), which is much better than having to remember that
creating a Persistent requires passing a new StateManager, and so on.
Remember that what we do here should be engineering, not art.
Maximum cohesion, minimum coupling, reuse, and all that. Examine
the tools (in this case the different classes of the architecture) and
use them. This includes design patterns (GoF, Design Patterns).
The BeanFactoryImpl delegates to Persistent, but it could work in any
other way. If you want to change how a whole store creates its
instances, you just change that bean with DataStoreBase#...
There are other examples of such tools, like DataStoreFactory for
accessing configuration (see gora-accumulo, the finest store in my
opinion, with no disrespect to the others!).
The BeanFactory belongs to each DataStore, so there is no reason to
create Persistent instances by hand, having to remember the
StateManager, or whether you need it when creating a Map (I still
have to dive into the source code to see if there's anything related to
maps).
Persistent has #clone(), so there's no reason to duplicate code in other places.
As I said, explaining programming theory is not my strongest gift :P
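Maybe a small sketch says it better than my words. It is only an
illustration: SketchStateManager, SketchPersistent, SketchBeanFactory
and SketchStore are hypothetical stand-ins that I made up for this
mail, not the real Gora classes, and the real signatures may differ.
The idea is that the store asks its factory for new instances and
buffers a clone in put(), so nobody has to remember how a bean is
wired together.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a state manager; not the real Gora class. */
class SketchStateManager {}

/** Hypothetical stand-in for a persistent bean that needs a state manager. */
class SketchPersistent {
    private final SketchStateManager stateManager;
    private String value;

    SketchPersistent(SketchStateManager stateManager) {
        this.stateManager = stateManager;
    }

    String getValue() { return value; }
    void setValue(String value) { this.value = value; }
    SketchStateManager getStateManager() { return stateManager; }

    /** Copies the fields into a fresh instance with its own state manager. */
    @Override
    public SketchPersistent clone() {
        SketchPersistent copy = new SketchPersistent(new SketchStateManager());
        copy.value = this.value;
        return copy;
    }
}

/** The single place that knows how a bean is constructed. */
interface SketchBeanFactory {
    SketchPersistent newPersistent();
}

/** Hypothetical store: creation goes through the factory, put() buffers a clone. */
class SketchStore {
    private final SketchBeanFactory beanFactory;
    private final Map<String, SketchPersistent> buffer = new HashMap<>();

    SketchStore(SketchBeanFactory beanFactory) {
        this.beanFactory = beanFactory;
    }

    SketchBeanFactory getBeanFactory() { return beanFactory; }

    void put(String key, SketchPersistent value) {
        // Buffer a copy so later changes by the caller do not leak in.
        buffer.put(key, value.clone());
    }

    SketchPersistent buffered(String key) { return buffer.get(key); }
}

class FactorySketchDemo {
    public static void main(String[] args) {
        // Swapping this lambda changes how every instance of the store is created.
        SketchStore store =
            new SketchStore(() -> new SketchPersistent(new SketchStateManager()));

        SketchPersistent page = store.getBeanFactory().newPersistent();
        page.setValue("first");
        store.put("k", page);
        page.setValue("second"); // does not touch the buffered copy

        System.out.println(store.buffered("k").getValue()); // prints "first"
    }
}
```

If the way of creating instances ever has to change, in this sketch
only the factory passed to the store changes; put() and the callers
stay as they are.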
>> But the real thing is that all that method should be implemented as
>> following (cloning is done in
>> PersistentDatumReader#clone(Persistent,Schema):215) :
>>
>> public void put(K key, T value) {
>> this.buffer.put(key, value.clone()) ;
>> }
>>
>> But this is not really important, I guess.
>
> Isn't this mainly for MapReduce access?
I think it is not only for MapReduce; it is the general way to save
Persistent instances. Am I wrong? What is the other way? :S
If there is another way, I still have much more to learn... :(
>> Anyway, this does not seems to be the problem shown in NUTCH-1534.
>> I think that the problem in NUTCH-1534 what you told about multiple
>> threads. CassandraClient is not reentrant because Mutator is not
>> reentrant, so must be used only with 1 thread. Could you, please, try
>> this?:
>>
>> * Update to gora-0.2.1
>> * Modify CassandraStore:340 so the line reads as this:
>>
>> private synchronized void addOrUpdateField(K key, Field field, Object
>> value) {
>>
>> The same should be for gora-0.2 (at CassandraStore:301), but I like
>> 0.2.1 and patches must be for /trunk (desirable).
>>
>> Maybe I am wrong, but please, give it a shot :)
>
> Yeah, there are several different approaches to get this accomplished.
> For what I recall from GORA-211, Roland suggested creating a lock
> object and just synchronizing at the read/write/update operation time,
> but we would have to evaluate if it causes any performance damage to
> synchronize the whole encapsulating method.
You are right. That was the quickest and shortest fix (only 1 word!)
I could find to test with.
The real fix is:
* Modify HectorUtils.java:
- change all "public static<K> void insertSubColumn..." to "public
synchronized static<K>..." (3 changes)
- change "public static<K> void deleteSubColumn" to "public
synchronized static<K> void deleteSubColumn" (1 change)
And then you are just synchronizing the calls to the mutator.
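To show what I mean, here is a minimal sketch. SketchMutator and
SketchUtils are hypothetical stand-ins invented for this mail, not the
real Hector or Gora classes. A static synchronized method locks on the
class object, so all the calls that touch the non-thread-safe mutator
are serialized:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical non-thread-safe batch writer; stands in for the mutator. */
class SketchMutator {
    private final List<String> pending = new ArrayList<>();

    void addInsertion(String column) { pending.add(column); }
    void addDeletion(String column) { pending.remove(column); }
    int pendingCount() { return pending.size(); }
}

/** Hypothetical utility class; its static synchronized methods lock on SketchUtils.class. */
class SketchUtils {
    public static synchronized void insertSubColumn(SketchMutator mutator, String column) {
        mutator.addInsertion(column);
    }

    public static synchronized void deleteSubColumn(SketchMutator mutator, String column) {
        mutator.addDeletion(column);
    }
}

class SyncSketchDemo {
    /** Runs several writer threads against one shared mutator and returns the pending count. */
    static int run(int threads, int perThread) {
        SketchMutator mutator = new SketchMutator();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    SketchUtils.insertSubColumn(mutator, "col" + i);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return mutator.pendingCount();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000)); // prints 4000
    }
}
```

The cost in this sketch is that the lock is the class itself, so calls
on unrelated mutators also wait for each other. The lock object
suggested in GORA-211 would be finer grained; which one performs
better would have to be measured.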
Thanks for noticing it.
> Renato M.
Thank you for all your comments! :)
Regards,
Alfonso Nishikawa