Hi Alfonso,

2013/3/8 Alfonso Nishikawa <[email protected]>:
> Hi Renato,
>
> 2013/3/7 Renato Marroquín Mogrovejo <[email protected]>:
>>> Gora-0.2.1 @ CassandraStore.java#put():286 does this: "* Duplicate
>>> instance to keep all the objects in memory till flushing."
>>>
>>> A few less important things first.
>>>
>>> First, it creates a new empty instance with:
>>>
>>>  T p = (T) value.newInstance(new StateManagerImpl());
>>>
>>> but it should actually have been created with:
>>>
>>>  T p = this.getBeanFactory().newPersistent();
>>
> [reversed question order]
>> Do you know how HBase module does this?
>
> HBase almost surely does it wrong too.
>
>> Could you please explain why this is a better creational approach?
>> IMO We should make all data stores use at least a similar approach.
>
> What I described is precisely that common approach for all stores.
>
> Your question is a bit difficult to answer without sounding
> arrogant, since I don't know what you know or what anyone else
> reading this knows. I am not the best at this type of
> explanation :P

I think that if we can explain, then we should, because we never
know who is reading our emails. Besides, your explanations are
great, man (:

> I will try to do my best, but don't get me wrong.
>
> It's the right creational approach because someone made a BeanFactory
> (and BeanFactoryImpl), which is much better than having to remember
> that when you create a Persistent you have to pass it a new
> StateManager, etc., etc.
> Remember that what we do here should be engineering, not art.
> Maximum cohesion, minimum coupling, reuse, and all that stuff. Examine
> the tools (in this case the different classes of the architecture) and
> use them. This includes design patterns (GoF, "Design Patterns").
> BeanFactoryImpl delegates to Persistent, but it could work in any other
> way. If you want to change how all the objects of a store are created,
> you just replace that bean with DataStoreBase#...
> There are other examples of such tools, like DataStoreFactory for
> accessing configuration (see gora-accumulo, the finest store in my
> opinion, without disparaging the others!).

You are right, that is the simplest and most straightforward module we have.

> The BeanFactory belongs to each DataStore, so there is no reason to
> create Persistent instances by hand, having to remember the
> StateManager, or whether you need it when creating a Map (I still
> have to dive into the source code to see if there's something related
> to maps).
> Persistent has #clone(), so there's no reason to duplicate that code in
> other places.
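The per-store factory approach described above can be illustrated with a minimal plain-Java sketch. The classes below (SketchStateManager, SketchPersistent, SketchWebPage, SketchBeanFactory, SketchDataStore) are hypothetical, heavily simplified stand-ins, not Gora's real API; they only show why routing creation through a factory owned by the store spares callers from remembering to pass a fresh StateManager.

```java
// Hypothetical, heavily simplified stand-ins for Gora's StateManager,
// Persistent, BeanFactory and DataStore; these are NOT the real Gora types.
class SketchStateManager {
  private boolean dirty;
  void setDirty() { dirty = true; }
  boolean isDirty() { return dirty; }
}

interface SketchPersistent {
  SketchStateManager getStateManager();
  SketchPersistent newInstance(SketchStateManager stateManager);
}

class SketchWebPage implements SketchPersistent {
  private final SketchStateManager stateManager;
  SketchWebPage(SketchStateManager stateManager) { this.stateManager = stateManager; }
  public SketchStateManager getStateManager() { return stateManager; }
  public SketchPersistent newInstance(SketchStateManager sm) { return new SketchWebPage(sm); }
}

// The factory owns the "always pass a fresh StateManager" rule, so callers
// never have to remember it.
class SketchBeanFactory<T extends SketchPersistent> {
  private final T prototype;
  SketchBeanFactory(T prototype) { this.prototype = prototype; }

  @SuppressWarnings("unchecked")
  T newPersistent() {
    // Delegates creation to the persistent class itself.
    return (T) prototype.newInstance(new SketchStateManager());
  }
}

// Each store holds its own factory; replacing it changes how every object of
// that store is created, without touching the callers.
class SketchDataStore<T extends SketchPersistent> {
  private SketchBeanFactory<T> beanFactory;
  SketchDataStore(SketchBeanFactory<T> beanFactory) { this.beanFactory = beanFactory; }
  SketchBeanFactory<T> getBeanFactory() { return beanFactory; }
  void setBeanFactory(SketchBeanFactory<T> beanFactory) { this.beanFactory = beanFactory; }
}
```

With such a factory, callers only write `store.getBeanFactory().newPersistent()`, and swapping the factory through `setBeanFactory` changes object creation for the whole store in one place.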

Thanks! This is what I was asking, I didn't know that each DataStore
had its own BeanFactory, we totally should use them.

> As I said, explaining programming theory is not my greatest gift :P

Agree to disagree ;)

>>> But the real point is that the whole method should be implemented as
>>> follows (cloning is done in
>>> PersistentDatumReader#clone(Persistent,Schema):215):
>>>
>>>  public void put(K key, T value) {
>>>    // Persistent#clone() returns Persistent, hence the cast to T
>>>    this.buffer.put(key, (T) value.clone());
>>>  }
>>>
>>> But this is not really important, I guess.
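The intent of buffering a copy in put(), as the "keep all the objects in memory till flushing" comment suggests, can be sketched as follows. ClonePageBean and BufferingStoreSketch are hypothetical, and a LinkedHashMap stands in for both the write buffer and the Cassandra backend. Buffering a clone means that later changes the caller makes to its own instance do not alter the pending write.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical minimal bean standing in for a Gora Persistent; only the
// clone() contract matters here.
class ClonePageBean {
  String title;
  ClonePageBean(String title) { this.title = title; }

  @Override
  public ClonePageBean clone() {
    return new ClonePageBean(title); // independent copy of the current field values
  }
}

// Sketch of a store that keeps copies in memory until flush().
class BufferingStoreSketch<K> {
  private final Map<K, ClonePageBean> buffer = new LinkedHashMap<>();
  private final Map<K, ClonePageBean> backend = new LinkedHashMap<>(); // stands in for Cassandra

  public void put(K key, ClonePageBean value) {
    buffer.put(key, value.clone()); // the caller may keep mutating its own instance
  }

  public void flush() {
    backend.putAll(buffer);
    buffer.clear();
  }

  ClonePageBean get(K key) { return backend.get(key); }
  int pending() { return buffer.size(); }
}
```

If the store buffered `value` itself instead of `value.clone()`, changing `title` between `put` and `flush` would silently change what gets written.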
>>
>> Isn't this mainly for MapReduce access?
>
> I think it is not only for MapReduce; it is the way to save Persistent
> instances. Am I wrong? What is the other way? :S
> If there is another way, I still have much more to learn... :(

From what I've seen, the DatumReader classes are meant to be used inside
MapReduce jobs, but I might be wrong here. I think the best option is to
stick to the per-DataStore BeanFactory; this will help us engineer our
code, as you said ;) Thanks for pointing this out.


Renato M.

>>> Anyway, this does not seem to be the problem shown in NUTCH-1534.
>>> I think the problem in NUTCH-1534 is what you mentioned about multiple
>>> threads. CassandraClient is not reentrant because Mutator is not
>>> reentrant, so it must be used by only one thread. Could you please try
>>> this?
>>>
>>> * Update to gora-0.2.1
>>> * Modify CassandraStore:340 so the line reads:
>>>
>>>  private synchronized void addOrUpdateField(K key, Field field, Object value) {
>>>
>>> The same applies to gora-0.2 (at CassandraStore:301), but I prefer
>>> 0.2.1, and patches should preferably be made against /trunk.
>>>
>>> Maybe I am wrong, but please, give it a shot :)
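A sketch of what the one-word `synchronized` change does, assuming a mutator that, like the one discussed, must not be used by several threads at once. UnsafeMutatorSketch and SyncStoreSketch are hypothetical; an unsynchronized ArrayList plays the part of the mutator's internal batch, and the field/key types are simplified to strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a non-thread-safe batch mutator (NOT Hector's
// Mutator): it queues insertions in a plain, unsynchronized ArrayList.
class UnsafeMutatorSketch {
  private final List<String> pending = new ArrayList<>();

  void addInsertion(String key, String column, Object value) {
    pending.add(key + "/" + column + "=" + value);
  }

  int size() { return pending.size(); }
}

class SyncStoreSketch {
  private final UnsafeMutatorSketch mutator = new UnsafeMutatorSketch();

  // 'synchronized' makes every call hold this store's monitor, so only one
  // thread at a time touches the shared mutator.
  synchronized void addOrUpdateField(String key, String field, Object value) {
    mutator.addInsertion(key, field, value);
  }

  synchronized int pendingMutations() { return mutator.size(); }

  // Calls one store from several threads and returns the number of queued
  // mutations; with the method synchronized it equals threads * perThread.
  static int runConcurrently(int threads, int perThread) {
    SyncStoreSketch store = new SyncStoreSketch();
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    for (int t = 0; t < threads; t++) {
      final int id = t;
      pool.submit(() -> {
        for (int i = 0; i < perThread; i++) {
          store.addOrUpdateField("row" + id, "col" + i, i);
        }
      });
    }
    pool.shutdown();
    try {
      pool.awaitTermination(30, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return store.pendingMutations();
  }
}
```

`SyncStoreSketch.runConcurrently(4, 1000)` should return 4000; with `synchronized` removed from `addOrUpdateField`, concurrent `ArrayList.add` calls can lose entries or throw.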
>>
>> Yeah, there are several different approaches to accomplish this.
>> From what I recall from GORA-211, Roland suggested creating a lock
>> object and synchronizing only around the read/write/update operations,
>> but we would have to evaluate whether synchronizing the whole enclosing
>> method hurts performance.
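The lock-object alternative mentioned for GORA-211 could look roughly like the sketch below: a private lock guards only the access to the shared mutator state, while per-call work runs unlocked. This is an illustration under that assumption, not the code proposed in GORA-211; LockedStoreSketch and its fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "lock object" variant: only the access to the
// shared, non-thread-safe mutator state is serialized.
class LockedStoreSketch {
  // Private lock: code outside this class cannot synchronize on it by accident.
  private final Object mutatorLock = new Object();
  // Stands in for the shared mutator; ArrayList is not thread-safe by itself.
  private final List<String> pending = new ArrayList<>();

  void addOrUpdateField(String key, String field, Object value) {
    // Per-call work that touches no shared state runs outside the lock.
    String column = field + "=" + value;
    synchronized (mutatorLock) {
      pending.add(key + "/" + column);
    }
  }

  int pendingMutations() {
    synchronized (mutatorLock) {
      return pending.size();
    }
  }
}
```

Whether the narrower critical section measurably helps depends on how much work happens outside it, which is exactly the performance question still to be evaluated.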
>
> You are right. That was the quickest and shortest (only 1 word!) fix I
> found to test.
>
> The real fix is:
>
> * Modify HectorUtils.java:
> - change all "public static<K> void insertSubColumn..." to "public
> synchronized static<K>..." (3 changes)
> - change "public static<K> void deleteSubColumn" to "public
> synchronized static<K> void deleteSubColumn" (1 change)
>
> That way you are only synchronizing the calls to the mutator.
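The HectorUtils change relies on the fact that a `static synchronized` method locks on the class object. The sketch below uses a hypothetical HectorUtilsSketch whose "mutator" is a plain List; it only mirrors the shape of the insertSubColumn/deleteSubColumn change, not Hector's or Gora's real signatures.

```java
import java.util.List;

// Hypothetical sketch, NOT the real HectorUtils: the "mutator" is a plain List
// and the parameters are simplified.
final class HectorUtilsSketch {
  private HectorUtilsSketch() {}

  // static synchronized locks on HectorUtilsSketch.class: one monitor shared
  // by every caller in the JVM, regardless of which store it belongs to.
  public static synchronized <K> void insertSubColumn(List<String> mutator, K key,
      String superColumn, String subColumn, Object value) {
    mutator.add("INSERT " + key + ":" + superColumn + ":" + subColumn + "=" + value);
  }

  public static synchronized <K> void deleteSubColumn(List<String> mutator, K key,
      String superColumn, String subColumn) {
    mutator.add("DELETE " + key + ":" + superColumn + ":" + subColumn);
  }

  // Returns true: inside a static synchronized method the calling thread
  // holds the class-level monitor.
  static synchronized boolean holdsClassLock() {
    return Thread.holdsLock(HectorUtilsSketch.class);
  }
}
```

Because the monitor is the class object, calls from every store instance in the JVM are serialized against each other. That is safe, but coarser than a per-store lock.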
>
> Thanks for noticing it.
>
>> Renato M.
>
> Thank you for all your comments! :)
>
> Regards,
>
> Alfonso Nishikawa
