Hi Renato,

Thanks for the detailed reply. I agree with your recommendations on the way
forward. I will go ahead and implement the rest of the functionality using
reflection, and we can follow your recommendations in subsequent iterations.
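For reference, here is a minimal sketch of the reflection approach I have in mind (the bean class and its setField0/getField0 accessors below are illustrative assumptions, not the actual generated Gora beans):

```java
import java.lang.reflect.Method;

// Stand-in for a generated Gora bean; the class and its
// setField0/getField0 accessors are hypothetical examples.
public class ReflectionSketch {
    public static class User {
        private String field0;
        public void setField0(String v) { this.field0 = v; }
        public String getField0() { return field0; }
    }

    public static void main(String[] args) throws Exception {
        User u = new User();
        int fieldCount = 1; // would come from the benchmark configuration
        for (int i = 0; i < fieldCount; i++) {
            // Look up setFieldN/getFieldN by name instead of hard-coding calls
            Method setter = User.class.getMethod("setField" + i, String.class);
            setter.invoke(u, "value" + i);
            Method getter = User.class.getMethod("getField" + i);
            System.out.println(getter.invoke(u)); // prints value0
        }
    }
}
```

The idea is that the same loop works for any fieldcount, at the cost of per-call reflection overhead.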

As for the backend, I am using both HBase and MongoDB and all seems well at
the moment.

I will let you all know when I push my code to GitHub.

Thank you.


**Sheriffo Ceesay**


On Sun, Jun 2, 2019 at 7:01 PM Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

> Hi Sheriffo,
>
> Here are some opinions on your questions, but others are more than welcome
> to make other suggestions as well.
>
> Q1: Are we going to consider arbitrary field lengths, e.g. if we set
> the fieldcount to 100, do we then have to create the respective Avro and
> mapping files? Currently, I don't think this process is automated, and it
> may be tedious for large field counts.
> I think for the first code iteration, we should use whichever
> fieldcount you have already generated for. Ideally, we should be able to
> invoke the Gora bean generator and generate as many fields as required
> by the benchmark configuration.
>
> Q2: The second problem is related to the first one: if we
> allow arbitrary field counts, then there has to be a mechanism to call
> each of the set or get methods during CRUD operations. To avoid
> this I used Java Reflection. See the sample code below.
> We have some options for dealing with an arbitrary number of fields.
> 1) Use reflection as you have done, which might be OK for the first code
> iteration; but if we want decent performance compared to
> using the datastores natively (without Gora), we should move away from it.
> 2) Do Gora class generation (and also generate the method used to
> insert data through Gora) in a step before the benchmark starts.
> Something like this:
> # passing config parameters to generate Gora Beans with number of
> required fields
> # this should output the generated class and the method that does the
> insertion
> $ gora_compiler.sh --benchmark --fields_required 4
> The output path containing the result of this should then be included
> (or passed) as a runtime dependency to the benchmark class.
> 3) Because Gora uses Avro, we can use complex data types, e.g.,
> arrays and maps. So we could represent the number of fields as the number
> of elements inside an array. I would think that this option gives us the
> best performance.
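To illustrate option (3): an Avro schema along these lines could collapse the N generated fields into a single array, so the field count becomes data rather than schema (the record name, namespace, and field names here are illustrative assumptions):

```json
{
  "type": "record",
  "name": "BenchmarkRecord",
  "namespace": "org.apache.gora.benchmark.generated",
  "fields": [
    {"name": "userkey", "type": "string"},
    {"name": "fields",  "type": {"type": "array", "items": "string"}}
  ]
}
```

With this shape the bean never needs regenerating when fieldcount changes; only the array length varies per record.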
> I think we should continue with option (1) until we have the entire
> pipeline working and we understand how every piece fits together
> (YCSB, Gora, the Gora compiler, benchmark setup steps). Then we
> should do (2), which is the most general approach and the one that reflects
> how people usually use Gora, and then we test with (3). I think all of
> these steps are entirely doable in our time frame as we build upon
> previous steps.
> The other thing we should decide is which backend to use, as some
> backends are more mature than others. I'd say use the
> HBase backend, as it is the most stable one and the one with the most
> features, and if we feel brave we can try other backends (and fix them
> if necessary!)
>
>
> Best,
>
> Renato M.
>
> El dom., 2 jun. 2019 a las 19:10, Sheriffo Ceesay
> (<sneceesa...@gmail.com>) escribió:
> >
> > Dear Mentors,
> >
> > My week one report is available at
> >
> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
> >
> > I have also included a detailed question, and I will need your guidance
> > on that.
> >
> > Please let me know what your thoughts are.
> >
> > Thank you.
> >
> > **Sheriffo Ceesay**
>
