Hi Renato,

Thanks for the detailed reply. I agree with your recommendations on the way forward. I will go ahead and implement the rest of the functionality using reflection, and we can follow your recommendations in the next iterations.
As for the backend, I am using both HBase and MongoDB, and all seems well at the moment. I will let you all know when I push my code to GitHub.

Thank you.

**Sheriffo Ceesay**

On Sun, Jun 2, 2019 at 7:01 PM Renato Marroquín Mogrovejo <renatoj.marroq...@gmail.com> wrote:
> Hi Sheriffo,
>
> Some opinions about your questions, but others are more than welcome
> to suggest other things as well.
>
> Q1: Are we going to consider arbitrary field lengths? E.g., if we set
> the fieldcount to 100, then we have to create the respective Avro and
> mapping files. Currently, this process is not automated and may be
> tedious for large field counts.
> I think for the first code iteration, we should use whatever
> fieldcount you have generated for. Ideally, we should be able to
> invoke the Gora bean generator and generate as many fields as required
> by the benchmark configuration.
>
> Q2: The second problem is related to the first one: if we allow
> arbitrary field counts, then there has to be a mechanism to call each
> of the set or get methods during CRUD operations. To avoid this, I
> used Java reflection. See the sample code below.
> We have some options for dealing with an arbitrary number of fields.
> 1) Use reflection as you have, which might be OK for the first code
> iteration; but if we want decent performance compared against using
> the datastores natively (no Gora), we should move away from it.
> 2) Do the Gora class generation (and also generate the method used to
> insert data through Gora) in a step before the benchmark starts.
> Something like this:
> # passing config parameters to generate Gora beans with the number of
> # required fields; this should output the generated class and the
> # method that does the insertion
> $ gora_compiler.sh --benchmark --fields_required 4
> The output path containing the result of this should then be included
> (or passed) as a runtime dependency to the benchmark class.
> 3) Because Gora uses Avro, we can use complex data types, e.g.,
> arrays and maps. So we could represent the number of fields as the
> number of elements inside an array. I would think that this option
> gives us the best performance.
> I think we should continue with option (1) until we have the entire
> pipeline working and we understand how every piece fits together
> (YCSB, Gora, the Gora compiler, the benchmark setup steps). Then we
> should do (2), which is the most general option and the one that
> reflects how people usually use Gora, and then we can test (3). I
> think all of these steps are totally doable in our time frame as we
> build upon previous steps.
> The other thing we should decide is which backend to use, as some
> backends are more mature than others. I'd say to use the HBase
> backend, as it is the most stable one and the one with the most
> features; if we feel brave, we can try other backends (and fix them
> if necessary!).
>
> Best,
>
> Renato M.
>
> On Sun, Jun 2, 2019 at 7:10 PM Sheriffo Ceesay
> (<sneceesa...@gmail.com>) wrote:
> >
> > Dear Mentors,
> >
> > My week one report is available at
> > https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
> >
> > I have also included a detailed question, and I will need your
> > guidance on that.
> >
> > Please let me know what your thoughts are.
> >
> > Thank you.
> >
> > **Sheriffo Ceesay**
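[Editor's note] A minimal, self-contained sketch of the reflection approach described as option (1) above. The `User` bean and its `setField0`/`getField0` naming are hypothetical stand-ins for a Gora-generated bean; only the lookup-and-invoke pattern from `java.lang.reflect` is the point.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for a Gora-generated bean; real beans come from
// the Gora compiler, and these member/setter names are assumptions.
class User {
    private final String[] fields = new String[3];
    public void setField0(String v) { fields[0] = v; }
    public void setField1(String v) { fields[1] = v; }
    public void setField2(String v) { fields[2] = v; }
    public String getField0() { return fields[0]; }
    public String getField1() { return fields[1]; }
    public String getField2() { return fields[2]; }
}

public class ReflectionDemo {
    public static void main(String[] args) throws Exception {
        User user = new User();
        int fieldCount = 3; // would come from the benchmark configuration
        for (int i = 0; i < fieldCount; i++) {
            // Look up each setter by name at runtime instead of
            // hard-coding one call per generated field.
            Method setter = User.class.getMethod("setField" + i, String.class);
            setter.invoke(user, "value" + i);
        }
        System.out.println(user.getField0()); // prints "value0"
    }
}
```

The per-call `getMethod`/`invoke` overhead is also why option (1) is only suggested for the first iteration.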
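[Editor's note] Option (3) can be illustrated without pulling in Avro itself: instead of one generated member per field, the record holds a single array-typed member, so the field count becomes a plain runtime value and neither code generation nor reflection is needed. This is a simplified, Avro-free sketch; a real Gora bean would declare an Avro `array` type in its schema, and the class and method names here are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a bean whose Avro schema declares one
// array-typed field; not Gora's actual API.
class ArrayBackedRecord {
    private final List<String> fields = new ArrayList<>();
    public void addField(String v) { fields.add(v); }
    public String getField(int i) { return fields.get(i); }
    public int fieldCount() { return fields.size(); }
}

public class ArrayFieldsDemo {
    public static void main(String[] args) {
        ArrayBackedRecord rec = new ArrayBackedRecord();
        int fieldCount = 100; // any count works without regenerating beans
        for (int i = 0; i < fieldCount; i++) {
            rec.addField("value" + i);
        }
        System.out.println(rec.fieldCount()); // prints 100
    }
}
```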