The code so far is available at the GitHub link below.

https://github.com/sneceesay77/gora/tree/GORA-532/gora-benchmark
**Sheriffo Ceesay**

On Sun, Jun 2, 2019 at 8:34 PM Sheriffo Ceesay <sneceesa...@gmail.com> wrote:
> Hi Renato,
>
> Thanks for the detailed reply. I agree with your recommendations on the
> way forward. I will go ahead and implement the rest of the functionality
> using reflection, and we can follow your recommendations in the next
> iterations.
>
> As for the backend, I am using both HBase and MongoDB, and all seems well
> at the moment.
>
> I will let you all know when I push my code to GitHub.
>
> Thank you.
>
> **Sheriffo Ceesay**
>
> On Sun, Jun 2, 2019 at 7:01 PM Renato Marroquín Mogrovejo <
> renatoj.marroq...@gmail.com> wrote:
>
>> Hi Sheriffo,
>>
>> Some opinions about your questions, but others are more than welcome
>> to suggest other things as well.
>>
>> Q1: Are we going to consider arbitrary field lengths? E.g., if we set
>> the fieldcount to 100, then we have to create the respective Avro and
>> mapping files. Currently, I don't think this process is automated, and
>> it may be tedious for large field counts.
>> I think for the first code iteration, we should use whatever
>> fieldcount you have generated for. Ideally, we should be able to
>> invoke the Gora bean generator and generate as many fields as required
>> by the benchmark configuration.
>>
>> Q2: The second problem has to do with the first one: if we
>> allow arbitrary field counts, then there has to be a mechanism to call
>> each of the set or get methods during CRUD operations. To avoid
>> this, I used Java reflection. See the sample code below.
>> We have some options to deal with having an arbitrary number of fields.
>> 1) Use reflection as you have, which might be OK for the first code
>> iteration, but if we want decent performance compared against
>> using the datastores natively (no Gora), we should move away from it.
>> 2) Do Gora class generation (and also generate the method used to
>> insert data through Gora) in a step before the benchmark starts.
>> Something like this:
>> # passing config parameters to generate Gora beans with the number of
>> # required fields; this should output the generated class and the
>> # method that does the insertion
>> $ gora_compiler.sh --benchmark --fields_required 4
>> The output path containing the result of this should then be included
>> (or passed) as a runtime dependency to the benchmark class.
>> 3) Because Gora uses Avro, we can use complex data types, e.g.,
>> arrays and maps. So we could represent the number of fields as the
>> number of elements inside an array. I would think that this option
>> gives us the best performance.
>> I think we should continue with option (1) until we have the entire
>> pipeline working and we understand how every piece fits together
>> (YCSB, Gora, the Gora compiler, benchmark setup steps). Then we
>> should do (2), which is the most general and the one that reflects how
>> people usually use Gora, and then we can test (3). I think all of
>> these steps are totally doable in our time frame, as we build upon
>> previous steps.
>> The other thing we should decide is which backend to use, as some
>> backends are more mature than others. I'd say to use the
>> HBase backend, as it is the most stable one and the one with the most
>> features, and if we feel brave we can try other backends (and fix them
>> if necessary!)
>>
>> Best,
>>
>> Renato M.
>>
>> On Sun, Jun 2, 2019 at 19:10, Sheriffo Ceesay
>> (<sneceesa...@gmail.com>) wrote:
>> >
>> > Dear Mentors,
>> >
>> > My week one report is available at
>> > https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
>> >
>> > I have also included a detailed question, and I will need your
>> > guidance on that.
>> >
>> > Please let me know what your thoughts are.
>> >
>> > Thank you.
>> >
>> > **Sheriffo Ceesay**
>> >
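For readers following the thread, option (1) above can be sketched as follows. This is a minimal, self-contained illustration of calling numbered setters/getters via `java.lang.reflect`; the `User` class and its `setField<i>`/`getField<i>` method names are hypothetical stand-ins for a Gora-generated persistent bean, not the actual benchmark code.

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for a Gora-generated bean with numbered fields,
// as a YCSB-style workload with a configurable fieldcount would use.
class User {
    private String field0;
    private String field1;
    public void setField0(String v) { field0 = v; }
    public void setField1(String v) { field1 = v; }
    public String getField0() { return field0; }
    public String getField1() { return field1; }
}

public class ReflectionDemo {
    public static void main(String[] args) throws Exception {
        User user = new User();
        // In the benchmark this would come from the fieldcount property.
        int fieldCount = 2;
        for (int i = 0; i < fieldCount; i++) {
            // Resolve setField<i> at runtime instead of hard-coding each call.
            Method setter = User.class.getMethod("setField" + i, String.class);
            setter.invoke(user, "value" + i);
        }
        for (int i = 0; i < fieldCount; i++) {
            // Read the values back through the matching getters.
            Method getter = User.class.getMethod("getField" + i);
            System.out.println(getter.invoke(user));
        }
    }
}
```

As the thread notes, each `getMethod`/`invoke` pair adds per-operation overhead, which is why options (2) and (3) are proposed as follow-ups once the pipeline works end to end.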
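Option (3) could look something like the Avro schema below: instead of one named attribute per benchmark field, a single `map` field holds however many entries the configured fieldcount requires, so no regeneration is needed when the count changes. The record name, namespace, and field name here are illustrative assumptions, not taken from the GORA-532 branch.

```json
{
  "type": "record",
  "name": "User",
  "namespace": "org.apache.gora.benchmark.generated",
  "fields": [
    {
      "name": "fields",
      "type": {"type": "map", "values": "string"},
      "default": {}
    }
  ]
}
```

The trade-off is that a map loses per-field schema typing and per-column mapping control, which may matter for backends like HBase where the gora mapping file assigns individual fields to column families.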