Hi Sheriffo,

Some opinions on your questions below; others are more than welcome to suggest other things as well.
Q1: Are we going to consider arbitrary field lengths? For example, if we set the fieldcount to 100, do we have to create the corresponding Avro and mapping files? Currently this process is not automated and may be tedious for large field counts.

I think for the first code iteration we should use whatever fieldcount you have already generated for. Ideally, we should be able to invoke the Gora bean generator and generate as many fields as the benchmark configuration requires.

Q2: The second problem is related to the first one: if we allow arbitrary field counts, then there has to be a mechanism to call each of the set or get methods during CRUD operations. To avoid this I used Java Reflection. See the sample code below.

We have a few options for dealing with an arbitrary number of fields:

1) Use reflection, as you have. This might be fine for the first code iteration, but if we want decent performance compared to using the datastores natively (without Gora), we should move away from it.

2) Do the Gora class generation (and also generate the method used to insert data through Gora) in a step before the benchmark starts. Something like this:

   # passing config parameters to generate Gora beans with the number of required fields
   # this should output the generated class and the method that does the insertion
   $ gora_compiler.sh --benchmark --fields_required 4

   The output path containing the result of this step should then be included (or passed) as a runtime dependency of the benchmark class.

3) Because Gora uses Avro, we can use complex data types, e.g., arrays and maps. So we could represent the number of fields as the number of elements inside an array. I would expect this option to give us the best performance.

I think we should continue with option (1) until we have the entire pipeline working and we understand how every piece fits together (YCSB, Gora, the Gora compiler, the benchmark setup steps).
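For reference, a minimal sketch of what the reflection-based approach in option (1) might look like. The class and method names here (BenchmarkRecord, fieldN, setFieldN/getFieldN) are hypothetical stand-ins for whatever the Gora compiler actually generates; the point is only that reflection lets the benchmark address field i without knowing the field count at compile time:

```java
import java.lang.reflect.Method;

// Hypothetical stand-in for a Gora-generated persistent bean with
// numbered fields, as a YCSB-style workload would populate them.
class BenchmarkRecord {
    private String field0;
    private String field1;
    public void setField0(String v) { field0 = v; }
    public String getField0()       { return field0; }
    public void setField1(String v) { field1 = v; }
    public String getField1()       { return field1; }
}

public class ReflectionDemo {
    // Invoke setFieldN without knowing N at compile time.
    static void setField(Object bean, int i, String value) throws Exception {
        Method setter = bean.getClass().getMethod("setField" + i, String.class);
        setter.invoke(bean, value);
    }

    // Invoke getFieldN without knowing N at compile time.
    static String getField(Object bean, int i) throws Exception {
        Method getter = bean.getClass().getMethod("getField" + i);
        return (String) getter.invoke(bean);
    }

    public static void main(String[] args) throws Exception {
        BenchmarkRecord rec = new BenchmarkRecord();
        // The loop bound would come from the benchmark's fieldcount setting.
        for (int i = 0; i < 2; i++) {
            setField(rec, i, "value" + i);
        }
        System.out.println(getField(rec, 0)); // prints "value0"
        System.out.println(getField(rec, 1)); // prints "value1"
    }
}
```

The per-call Method lookup and invoke overhead is exactly why this is fine for a first iteration but should eventually be replaced by generated code as in option (2).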
Then we should do (2), which is the most general approach and the one that reflects how people usually use Gora, and then we can test (3). I think all of these steps are doable in our time frame, as each builds upon the previous one.

The other thing we should decide is which backend to use, as some backends are more mature than others. I'd say use the HBase backend, since it is the most stable one and the one with the most features, and if we feel brave we can try other backends (and fix them if necessary!).

Best,
Renato M

On Sun., Jun. 2, 2019 at 19:10, Sheriffo Ceesay (<sneceesa...@gmail.com>) wrote:
>
> Dear Mentors,
>
> My week one report is available at
> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
>
> I have also included a detailed question, and I will need your guidance
> on that.
>
> Please let me know what your thoughts are.
>
> Thank you.
>
> **Sheriffo Ceesay**