Re: Final Report

Sheriffo Ceesay Fri, 23 Aug 2019 07:25:29 -0700

Hi Renato,

See replies inline.


On Thu, Aug 22, 2019 at 5:52 PM Renato Marroquín Mogrovejo <
[email protected]> wrote:

> Hey Sheriffo,
>
> Thanks for the report and all the work!
> Gora performing worse when inserting data in the HBase case makes sense,
> I think, because Gora still needs to serialize every data bean through
> Avro (maybe some caching? but Sheriffo also deactivated this with
> gora.hbasestore.hbase.client.autoflush.enabled=true), so I guess the
> rest of the time is just Gora serialization.
>

I agree with you.


> Now for the reads in HBase-native and HBase-Gora, are we sure we are
> getting the same granularity of objects? I mean, because of the mapping
> Gora does (different column families per attribute), maybe we are
> fetching the attributes differently from native HBase: maybe Gora
> fetches only some column families whereas HBase fetches everything.
>

I have done some basic tests to verify this; see the testUpdate() method in
the GoraClientTest file. There, I insert some strings, retrieve them and
verify that they match the expected values.

> Did you run any correctness tests to verify that we are retrieving the
> correct results in both cases? Something like inserting an integer as
> one of the attributes, and then summing the values when retrieved to
> check that the sum is what we expect.
>

Thanks for this. I have added a new test case called testCorrectness() to
address the issue you raised. The results I got are consistent with what we
expect.
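For illustration, here is a minimal sketch of the sum-based check described
above. It is not the actual testCorrectness() from the benchmark module:
KeyValueStore and InMemoryStore are hypothetical stand-ins for a Gora
DataStore or the native HBase client, used only so the sketch runs without
a cluster.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of a sum-based correctness check: insert known integers,
 * read them back, and compare the sum with an independently computed value.
 * ASSUMPTION: KeyValueStore/InMemoryStore are hypothetical stand-ins for
 * the real backends (Gora DataStore or native HBase client).
 */
public class SumCorrectnessCheck {

    /** Hypothetical store abstraction: one integer attribute per key. */
    interface KeyValueStore {
        void put(String key, int value);
        Integer get(String key);
    }

    /** In-memory stand-in so the sketch runs without HBase or MongoDB. */
    static class InMemoryStore implements KeyValueStore {
        private final Map<String, Integer> data = new HashMap<>();
        public void put(String key, int value) { data.put(key, value); }
        public Integer get(String key) { return data.get(key); }
    }

    /** Inserts values 0..n-1, reads every key back and returns the sum read. */
    static long insertAndSum(KeyValueStore store, int n) {
        for (int i = 0; i < n; i++) {
            store.put("user" + i, i);
        }
        long sum = 0;
        for (int i = 0; i < n; i++) {
            Integer v = store.get("user" + i);
            if (v == null) {
                // A missing record is a correctness failure in its own right.
                throw new IllegalStateException("missing record user" + i);
            }
            sum += v;
        }
        return sum;
    }

    /** Expected sum of 0..n-1, computed without touching the store. */
    static long expectedSum(int n) {
        return (long) n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        int n = 1000;
        long actual = insertAndSum(new InMemoryStore(), n);
        System.out.println("expected=" + expectedSum(n) + " actual=" + actual);
        if (actual != expectedSum(n)) {
            throw new AssertionError("sum mismatch");
        }
    }
}
```

Running main with n = 1000 prints expected=499500 actual=499500. A sum
cheaply catches missing or corrupted values, but not values swapped between
keys; comparing each retrieved value with its expected value would catch
that case too.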

>
> Best,
>
> Renato M.
>
> On Thu, Aug 22, 2019 at 5:17, Sheriffo Ceesay
> (<[email protected]>) wrote:
> >
> > Hi Furkan,
> >
> > Yes, it baffled me as well. I haven't applied any specific performance
> > optimisation configuration to either of the setups, so these results
> > may not be final at this stage and need further investigation.
> >
> > The only HBase setting I configured for Apache Gora, in the
> > gora.properties file, is:
> >
> > gora.hbasestore.hbase.client.autoflush.enabled=true
> >
> > For the local HBase setup, I followed the recommendations in [1]
> > to avoid any performance issues.
> >
> > [1] https://github.com/brianfrankcooper/YCSB/tree/master/hbase098
> >
> > Basically, the setups are fresh, simple installations without any
> > major optimisation configuration.
> >
> > Thank you.
> >
> > *Sheriffo Ceesay*
> >
> >
> >
> > On Thu, Aug 22, 2019 at 12:45 PM Furkan KAMACI <[email protected]>
> > wrote:
> >>
> >> Hi Sheriffo,
> >>
> >> Thanks for the updates!
> >>
> >> By the way, I still wonder about the reason for the poor performance of
> >> the native HBase implementation.
> >>
> >> Kind Regards,
> >> Furkan KAMACI
> >>
> >> On Thu, Aug 22, 2019 at 2:37 PM Sheriffo Ceesay <[email protected]>
> >> wrote:
> >>
> >> > Hi Furkan,
> >> > Thanks for your feedback.
> >> >
> >> > Please find replies to your comments inline.
> >> >
> >> > On Wed, Aug 21, 2019 at 6:19 PM Furkan KAMACI <[email protected]>
> >> > wrote:
> >> >
> >> > > Hi Sheriffo,
> >> > >
> >> > > Thanks for your great effort!
> >> > >
> >> > > 1) Could you separate the charts for HBase and MongoDB? The HBase
> >> > > charts dwarf the MongoDB ones.
> >> > >
> >> > Yes, this is now done. Can you please have a look?
> >> >
> >> > >
> >> > > 2) Report says that:
> >> > >
> >> > > *"In this work, we have time to include only three gora data stores
> >> > > (MongoDB, HBase and CouchDB)"*
> >> > >
> >> > > However, as far as I know, you have not run this benchmark for
> >> > > CouchDB.
> >> > >
> >> >
> >> > Yes, you are right that it is not included in the benchmark results,
> >> > but I have included its implementation in the module. This includes
> >> > auto-generating the mapping and related files. Due to time
> >> > constraints, there was some discussion about which datastores to
> >> > include in the preliminary benchmarking, and we decided to include
> >> > HBase and MongoDB. In future, I will work on adding more datastores
> >> > and comparing their performance as well.
> >> >
> >> >
> >> > > 3) I don't think there is a need to add commit hashes and messages
> >> > > as an appendix, especially if we consider that the hashes will
> >> > > change once the PR is merged into the codebase.
> >> > >
> >> >
> >> > I had seen this as a good tip in the email sent by the GSoC team, but
> >> > I agree with you and have now removed it.
> >> >
> >> > >
> >> > > Kind Regards,
> >> > > Furkan KAMACI
> >> >
> >> >
> >> > Thank you.
> >> > Sheriffo.
> >> >
> >> > >
> >> >
> >> >
> >> > > On Wed, Aug 21, 2019 at 7:42 PM Sheriffo Ceesay <
> [email protected]>
> >> > > wrote:
> >> > >
> >> > > > All,
> >> > > >
> >> > > > My draft final report is available at
> >> > > >
> >> > > > https://cwiki.apache.org/confluence/display/GORA/Final+Report%3A+%5BGORA-532%5D+Benchmark+Module+For+Apache+Gora
> >> > > >
> >> > > > We have until the 26th of this month to submit the report. Please
> >> > > > let me know if you have any comments to improve it.
> >> > > >
> >> > > > Meanwhile, I will work on the documentation on how to run the
> >> > > > benchmark module and publish it on the Gora website.
> >> > > >
> >> > > > Thank you.
> >> > > >
> >> > > > **Sheriffo Ceesay**
> >> > > >
> >> > >
> >> >
>
