The code so far is available at the GitHub link below.

https://github.com/sneceesay77/gora/tree/GORA-532/gora-benchmark
**Sheriffo Ceesay**

On Sun, Jun 2, 2019 at 8:34 PM Sheriffo Ceesay <sneceesa...@gmail.com> wrote:
> Hi Renato,
>
> Thanks for the detailed reply. I agree with your recommendations on the
> way forward. I will go ahead and implement the rest of the functionality
> using reflection, and we can follow your recommendations in the next
> iterations.
>
> As for the backend, I am using both HBase and MongoDB, and all seems well
> at the moment.
>
> I will let you all know when I push my code to GitHub.
>
> Thank you.
>
> **Sheriffo Ceesay**
>
> On Sun, Jun 2, 2019 at 7:01 PM Renato Marroquín Mogrovejo <
> renatoj.marroq...@gmail.com> wrote:
>
>> Hi Sheriffo,
>>
>> Some opinions about your questions, but others are more than welcome
>> to suggest other things as well.
>>
>> Q1: Are we going to consider arbitrary field lengths? E.g., if we set
>> the fieldcount to 100, then we have to create the respective Avro and
>> mapping files. Currently, I don't think this process is automated, and
>> it may be tedious for large field counts.
>> I think for the first code iteration, we should use whatever
>> fieldcount you have generated for. Ideally, we should be able to
>> invoke the Gora bean generator and generate as many fields as required
>> by the benchmark configuration.
>>
>> Q2: The second problem has to do with the first one: if we
>> allow arbitrary field counts, then there has to be a mechanism to call
>> each of the set or get methods during CRUD operations. To avoid
>> this, I used Java reflection. See the sample code below.
>> We have some options to deal with having an arbitrary number of fields.
>> 1) Use reflection as you have, which might be OK for the first code
>> iteration, but if we want decent performance compared against
>> using the datastores natively (no Gora), we should move away from it.
>> 2) Do Gora class generation (and also generate the method used to
>> insert data through Gora) in a step before the benchmark starts.
>> Something like this:
>> # passing config parameters to generate Gora beans with the number of
>> # required fields; this should output the generated class and the
>> # method that does the insertion
>> $ gora_compiler.sh --benchmark --fields_required 4
>> The output path containing the result of this should then be included
>> (or passed) as a runtime dependency to the benchmark class.
>> 3) Because Gora uses Avro, we can use complex data types, e.g.,
>> arrays and maps. So we could represent the number of fields as the
>> number of elements inside an array. I would think that this option
>> gives us the best performance.
>> I think we should continue with option (1) until we have the entire
>> pipeline working and we understand how every piece fits together
>> (YCSB, Gora, the Gora compiler, benchmark setup steps). Then we
>> should do (2), which is the most general and the one that reflects how
>> people usually use Gora, and then we can test (3). I think all of
>> these steps are totally doable in our time frame, as we build upon
>> previous steps.
>> The other thing we should decide is which backend to use, as some
>> backends are more mature than others. I'd say to use the
>> HBase backend, as it is the most stable one and the one with the most
>> features, and if we feel brave we can try other backends (and fix them
>> if necessary!)
>>
>> Best,
>>
>> Renato M.
>>
>> On Sun, Jun 2, 2019 at 19:10, Sheriffo Ceesay
>> (<sneceesa...@gmail.com>) wrote:
>> >
>> > Dear Mentors,
>> >
>> > My week one report is available at
>> > https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
>> >
>> > I have also included a detailed question, and I will need your
>> > guidance on that.
>> >
>> > Please let me know what your thoughts are.
>> >
>> > Thank you.
>> >
>> > **Sheriffo Ceesay**
>> >
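For readers following the thread, option (1) above can be sketched as follows. This is a minimal, self-contained illustration of calling numbered setters/getters via `java.lang.reflect`; the `User` class and its `setField<i>`/`getField<i>` method names are hypothetical stand-ins for a Gora-generated persistent bean, not the actual benchmark code.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for a Gora-generated bean with numbered fields,
// as a YCSB-style workload with a configurable fieldcount would use.
class User {
    private String field0;
    private String field1;
    public void setField0(String v) { field0 = v; }
    public void setField1(String v) { field1 = v; }
    public String getField0() { return field0; }
    public String getField1() { return field1; }
}

public class ReflectionDemo {
    public static void main(String[] args) throws Exception {
        User user = new User();
        // In the benchmark this would come from the fieldcount property.
        int fieldCount = 2;
        for (int i = 0; i < fieldCount; i++) {
            // Resolve setField<i> at runtime instead of hard-coding each call.
            Method setter = User.class.getMethod("setField" + i, String.class);
            setter.invoke(user, "value" + i);
        }
        for (int i = 0; i < fieldCount; i++) {
            // Read the values back through the matching getters.
            Method getter = User.class.getMethod("getField" + i);
            System.out.println(getter.invoke(user));
        }
    }
}
```

As the thread notes, each `getMethod`/`invoke` pair adds per-operation overhead, which is why options (2) and (3) are proposed as follow-ups once the pipeline works end to end.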
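Option (3) could look something like the Avro schema below: instead of one named attribute per benchmark field, a single `map` field holds however many entries the configured fieldcount requires, so no regeneration is needed when the count changes. The record name, namespace, and field name here are illustrative assumptions, not taken from the GORA-532 branch.

```json
{
  "type": "record",
  "name": "User",
  "namespace": "org.apache.gora.benchmark.generated",
  "fields": [
    {
      "name": "fields",
      "type": {"type": "map", "values": "string"},
      "default": {}
    }
  ]
}
```

The trade-off is that a map loses per-field schema typing and per-column mapping control, which may matter for backends like HBase where the gora mapping file assigns individual fields to column families.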