Re: Week 2 Report and A Question

Sheriffo Ceesay Wed, 12 Jun 2019 03:50:45 -0700

Hi Renato,

I will follow Alfonso's recommendations about reusing objects as much as I
can. I will push those changes to the branch by the end of this week.
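
As a rough illustration of the object reuse discussed in this thread, the sketch below fills one long-lived record on every insert and looks fields up through names that are computed once. UserRecord and ReusingInserter are hypothetical stand-ins, not Gora or YCSB classes, and the approach assumes the data store copies or serializes the record during put rather than keeping a reference to it.

```java
import java.util.Map;

/** Hypothetical stand-in for the generated User bean; fields are addressed by index. */
final class UserRecord {
  private final String[] fields = new String[ReusingInserter.FIELD_COUNT];

  void put(int index, String value) {
    fields[index] = value;
  }

  String get(int index) {
    return fields[index];
  }
}

/** Hypothetical helper showing reuse of one record and precomputed field names. */
final class ReusingInserter {
  static final int FIELD_COUNT = 10;

  // Built once, so no "field" + i concatenation happens per insert call.
  private static final String[] FIELD_NAMES = new String[FIELD_COUNT];

  static {
    for (int i = 0; i < FIELD_COUNT; i++) {
      FIELD_NAMES[i] = "field" + i;
    }
  }

  // One instance reused for every insert instead of a new object per call.
  private final UserRecord user = new UserRecord();

  /** Copies the incoming values into the reused record and returns it. */
  UserRecord fill(Map<String, String> values) {
    for (int i = 0; i < FIELD_COUNT; i++) {
      user.put(i, values.get(FIELD_NAMES[i]));
    }
    return user;
  }
}
```

In the benchmark client, the returned record would then be handed to the store's put call; every field is overwritten on each call, so no value leaks from one insert to the next.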


To answer your questions:

Yes, you are right: I am using a clean, cold JVM. If necessary, I can also
look at warming up the JVM later on.
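
For reference, a warm-JVM measurement could look roughly like the sketch below: the operation is first run untimed so the JIT compiler has a chance to optimise it, and only the second loop is timed. WarmBenchmark and its parameters are hypothetical names for illustration; this is not how YCSB itself schedules its load and run phases.

```java
import java.util.function.IntConsumer;

/** Hypothetical helper: runs an untimed warm-up phase before the timed phase. */
final class WarmBenchmark {

  /**
   * Runs op warmupOps times without timing, then returns the elapsed
   * nanoseconds for measuredOps further calls.
   */
  static long timeAfterWarmup(IntConsumer op, int warmupOps, int measuredOps) {
    // Warm-up phase: results are discarded, only the JVM state matters.
    for (int i = 0; i < warmupOps; i++) {
      op.accept(i);
    }
    // Measured phase.
    long start = System.nanoTime();
    for (int i = 0; i < measuredOps; i++) {
      op.accept(i);
    }
    return System.nanoTime() - start;
  }
}
```

With a cold JVM the warm-up count would simply be zero, which makes the two setups easy to compare on the same workload.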

Yes, I have tried setting *gora.hbasestore.scanner.caching* to different
values, but there was no significant difference. Also, I may be wrong, but I
think this setting affects scan operations and not insert operations?

As for flushing, I tried it, but it quickly throws an error, so I commented
out that line of code. I think this is because the insert operation inserts
a single user object per call, so calling dataStore.flush() within that
method would mean calling flush on every object insertion. Is that not the
case? There should be a way to track the number of inserts so that flush can
be called after every N insert calls. For now I used
*gora.hbasestore.hbase.client.autoflush.enabled=true*, which would
automatically call flush at some point. However, as I mentioned in my
previous email, enabling autoflush decreases write performance [1].

[1] https://gora.apache.org/current/gora-hbase.html
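
One possible way to flush after every N insert calls, as discussed above, is a small counter kept next to the data store. The sketch below is only an illustration: BatchedFlusher is a hypothetical helper, the callback is assumed to wrap dataStore.flush(), and any checked exception the real flush call declares would have to be handled inside that callback.

```java
/** Hypothetical helper: triggers a flush callback after every batchSize inserts. */
final class BatchedFlusher {
  private final int batchSize;
  private final Runnable flush; // assumed to wrap dataStore.flush() in the client
  private int pending;

  BatchedFlusher(int batchSize, Runnable flush) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    this.batchSize = batchSize;
    this.flush = flush;
  }

  /** Call after each successful put; flushes once batchSize puts have accumulated. */
  void recordInsert() {
    pending++;
    if (pending >= batchSize) {
      flush.run();
      pending = 0;
    }
  }

  /** Call once at the end (for example from cleanup) to flush a partial batch. */
  void finish() {
    if (pending > 0) {
      flush.run();
      pending = 0;
    }
  }
}
```

With a batch size of 1000, loading 1M records would trigger 1000 flushes instead of one per insert or a single one at the very end. The counter is not synchronised, so this assumes one helper per client thread.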

Thank you.

**Sheriffo Ceesay**


On Tue, Jun 11, 2019 at 10:52 PM Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

> Hey Sheriffo,
>
> Cool to hear you are making progress! :) and great to see that we have
> some numbers already! :)
> Regarding optimization point (1), regardless of whether this was the
> cause of the issue or not, Alfonso's suggestions are something we should
> follow: many short-lived objects in Java might create a performance
> problem sooner or later. Also, about your comment:
>
> "Also, I may be wrong but the way I understand YCSB framework is, it
> will execute an insert operation for each user object, so I thought it
> was right to create a user object within the insert method."
>
> As you pointed out, YCSB is about inserting the objects, and NOT about
> creating them, so it doesn't matter if we reuse the objects, as long
> as the values that we insert are actually correct. We don't want to
> end up measuring object creation + GC. I think Alfonso's comment was
> hinting in that direction (please feel free to correct me @Alfonso if
> I am misunderstanding you) and I think his comments are spot on.
> I have some other questions regarding the numbers you sent around:
> - are you running YCSB for each data store with a warm JVM? Or are these
> numbers each from a clean, cold JVM? I suppose the latter, right?
> - did you try setting gora.hbasestore.scanner.caching to a lower value?
> - which is the command that you are using to run/start this code?
> - did you try flushing the commits more regularly in:
>
> https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L142
> let's say every 1000 elements? or something like that? I mean instead
> of at the end of the 1M elements?
>
> Thanks a lot for the report Sheriffo!
>
>
> Best,
>
> Renato M.
>
> On Tue, Jun 11, 2019 at 16:12, Sheriffo Ceesay
> (<sneceesa...@gmail.com>) wrote:
> >
> > Hello,
> >
> > I have taken a proper look at the recommendations from @Alfonso and
> > @Renato, and below are the outcomes.
> >
> > Failed Attempts
> > 1. Optimisation: for the insert operation, to avoid the concatenation
> > issue, I took the quickest route and called the methods directly,
> > without reflection. Below are those calls. Note: I have moved all
> > reusable code to the init method.
> >
> >>   public int insert(String table, String key, HashMap<String, ByteIterator> values) {
> >>     try {
> >>       user.setField0(values.get("field0").toString());
> >>       user.setField1(values.get("field1").toString());
> >>       user.setField2(values.get("field2").toString());
> >>       user.setField3(values.get("field3").toString());
> >>       user.setField4(values.get("field4").toString());
> >>       user.setField5(values.get("field5").toString());
> >>       user.setField6(values.get("field6").toString());
> >>       user.setField7(values.get("field7").toString());
> >>       user.setField8(values.get("field8").toString());
> >>       user.setField9(values.get("field9").toString());
> >>       dataStore.put(user.getUserId().toString(), user);
> >>     } catch (Exception e) {
> >>       return FAILED;
> >>     }
> >>     return SUCCESS;
> >>   }
> >
> >
> > If the above had worked, I would have changed the code as suggested by
> > Alfonso. Also, I may be wrong but the way I understand YCSB framework is,
> > it will execute an insert operation for each user object, so I thought it
> > was right to create a user object within the insert method.
> >
> >
> > 2. I used different config values for -Xmx (256MB, 512MB, 1GB, 2GB) and
> > even disabled the GC overhead limit check using -XX:-UseGCOverheadLimit,
> > but they all failed with the same GC error.
> >
> > Successful Attempt -- There may be room for improvement
> > Using the configurations below worked, but I think it is not the best
> > for write performance.
> >
> > First, I read in [1], related to [2], that the following one-liner
> > should be executed for better HBase performance when using YCSB. It
> > basically avoids overloading a single region server.
> >
> > hbase(main):001:0> n_splits = 200 # HBase recommends (10 * number of regionservers)
> > hbase(main):002:0> create 'users', 'info', {SPLITS => (1..n_splits).map {|i| "user#{1000+i*(9999-1000)/n_splits}"}}
> >
> > Second, as suggested by @Renato Marroquín Mogrovejo, it only works when
> > I set
> >
> > hbase.client.autoflush.default=true
> >
> > However, from [3], I found "HBase autoflushing. Enabling autoflush
> > decreases write performance. Available since Gora 0.2. Defaults to
> > disabled.". So I am of the opinion that the problem is not entirely
> > solved.
> >
> > I have done the following testing, inserting 1M records into MongoDB and
> > HBase, so I think this may not be bad after all, but more benchmarks may
> > be required to validate this. HBase in Gora has almost the same
> > performance as HBase benchmarked with vanilla YCSB.
> >
> > Backend          Avg. Time Taken (sec)
> > MongoDB          ~90
> > HBase in Gora    ~160
> > HBase YCSB       ~160
> >
> >
> > [1] https://github.com/brianfrankcooper/YCSB/tree/master/hbase098
> > [2] https://issues.apache.org/jira/browse/HBASE-4163
> > [3] https://gora.apache.org/current/gora-hbase.html
> >
> > Comments are welcomed.
> >
> > Thank you.
> > *Sheriffo Ceesay*
> >
> >
> >
> > On Tue, Jun 11, 2019 at 12:04 AM Sheriffo Ceesay <sneceesa...@gmail.com> wrote:
> >>
> >> Hello Alfonso and Renato,
> >>
> >> Thank you for getting in touch and thanks for the detailed replies.
> >>
> >> I will have a proper look at this tomorrow morning. I did some
> >> troubleshooting yesterday (mostly playing with Xmx and ZooKeeper timeout
> >> settings) that improved the conditions, but it did not entirely solve
> >> the problem. Preliminarily, it seems the problem has to do with
> >> configuration or how HBaseStore is implemented (this may not be entirely
> >> true).
> >>
> >> I will keep you all posted once I have had a thorough look at your
> >> suggestions.
> >>
> >> Thanks again.
> >>
> >> *Sheriffo Ceesay*
> >>
> >>
> >>
> >> On Mon, Jun 10, 2019 at 11:14 PM Alfonso Nishikawa <alfonso.nishik...@gmail.com> wrote:
> >>>
> >>> Hi!
> >>>
> >>> My hypothesis is that the difference between MongoDB and HBase is that
> >>> HBase puts more stress on serializing with Avro. It could also matter
> >>> that, if the HBase test is performed after the MongoDB ones, the GC
> >>> starts from a "bad" situation.
> >>>
> >>> From [A] linked by @Renato, if the error was OutOfMemoryException I
> >>> would have recommended lowering gora.hbasestore.scanner.caching to 100,
> >>> 10 or even 1, but with a GC error I am not so sure. In any case,
> >>> @Sheriffo: you can try this if it still doesn't work with the
> >>> optimizations :)
> >>>
> >>> @Renato: Thx for the links!
> >>>
> >>> Regards,
> >>>
> >>> Alfonso Nishikawa
> >>>
> >>>
> >>>
> >>> On Mon, Jun 10, 2019 at 22:02, Renato Marroquín Mogrovejo (<
> >>> renatoj.marroq...@gmail.com>) wrote:
> >>>
> >>> > @Alfonso,
> >>> > Thank you very much for the suggestions! you are totally right about
> >>> > all of your points! Sheriffo, please benefit from them ;)
> >>> >
> >>> > Also, what is strange (although it can be optimized as Alfonso
> >>> > pointed out) is that it works for the MongoDB backend. So I would
> >>> > also suspect the configuration of the Gora-HBase client. Have you
> >>> > taken a look at [A], for example? Or at other assumed Gora-HBase
> >>> > configurations [B]? Maybe there you can specify some Xmx / Xms config.
> >>> >
> >>> >
> >>> > Best,
> >>> >
> >>> > Renato M.
> >>> >
> >>> > [A] https://github.com/sneceesay77/gora/blob/master/gora-hbase/src/test/conf/gora.properties
> >>> > [B] https://github.com/sneceesay77/gora/blob/master/gora-hbase/src/test/conf/hbase-site.xml
> >>> >
> >>> > On Mon, Jun 10, 2019 at 23:39, Alfonso Nishikawa
> >>> > (<alfonso.nishik...@gmail.com>) wrote:
> >>> > >
> >>> > > Hi again, Sheriffo.
> >>> > >
> >>> > > More improvements to [1] over the last email:
> >>> > >
> >>> > > - fields.toArray() doesn't need a full-size array like in [6]. You
> >>> > > should just do fields.toArray(new String[0]), and better still if
> >>> > > you create the zero-length array once and reuse it. That call only
> >>> > > needs the type.
> >>> > > - I guess the class at [2] will always be the same, so you don't
> >>> > > need to set it on every insert call.
> >>> > > - The string concatenation at [3] is overkill for the JVM over the
> >>> > > 1M calls * N fields, and the same goes for [4]. Precalculate the
> >>> > > names in a list or array and reuse them for the 1M*N calls.
> >>> > > - Another optimization for [3]: given that PersistentBase [5]
> >>> > > extends SpecificRecordBase, you can access the fields by index with
> >>> > > SpecificRecordBase.get(int) and SpecificRecordBase.put(int, Object).
> >>> > >
> >>> > > [1] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L127
> >>> > > [2] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L134
> >>> > > [3] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L136
> >>> > > [4] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L139
> >>> > > [5] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-core/src/main/java/org/apache/gora/persistency/impl/PersistentBase.java#L3
> >>> > > [6] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L163
> >>> > >
> >>> > > Let's see if with those optimizations we free the JVM memory
> >>> > > management from much of the stress.
> >>> > >
> >>> > > Regards,
> >>> > >
> >>> > > Alfonso Nishikawa
> >>> > >
> >>> > > On Mon, Jun 10, 2019 at 21:18, Alfonso Nishikawa (<
> >>> > > alfonso.nishik...@gmail.com>) wrote:
> >>> > >
> >>> > > > Hi, Sheriffo.
> >>> > > >
> >>> > > > You can try reusing the Persistent instances [1] to insert the
> >>> > > > data. I don't know all the backends, but they should be reusable,
> >>> > > > at least in MongoDB and HBase.
> >>> > > >
> >>> > > > [1] - https://github.com/sneceesay77/gora/blob/GORA-532/gora-benchmark/src/main/java/org/apache/gora/benchmark/GoraBenchmarkClient.java#L130
> >>> > > >
> >>> > > > Regards,
> >>> > > >
> >>> > > > Alfonso Nishikawa
> >>> > > >
> >>> > > > On Mon, Jun 10, 2019 at 21:14, Alfonso Nishikawa (<
> >>> > > > alfonso.nishik...@gmail.com>) wrote:
> >>> > > >
> >>> > > >> Hi, Sheriffo.
> >>> > > >>
> >>> > > >> I really don't know how to solve it, but are you setting any
> >>> > > >> Xmx / Xms configuration values?
> >>> > > >>
> >>> > > >> Regards,
> >>> > > >>
> >>> > > >> Alfonso Nishikawa
> >>> > > >>
> >>> > > >>
> >>> > > >> On Sat, Jun 8, 2019 at 16:02, Sheriffo Ceesay (<sneceesa...@gmail.com>) wrote:
> >>> > > >>
> >>> > > >>> Hi All,
> >>> > > >>>
> >>> > > >>> Week 2 progress update is available at
> >>> > > >>>
> >>> > > >>> https://cwiki.apache.org/confluence/display/GORA/%5BGORA-532%5D+Apache+Gora+Benchmark+Module+Weekly+Report
> >>> > > >>>
> >>> > > >>> I have one question that I would like my mentors to advise on. I
> >>> > > >>> am still working on it, but thought it would be good to report it
> >>> > > >>> because it is HBase specific.
> >>> > > >>>
> >>> > > >>> The problem has to do with an OutOfMemory error when inserting
> >>> > > >>> 1M+ records into HBase. This happens when I try to run the actual
> >>> > > >>> benchmark by first loading HBase with 1 million plus records. It
> >>> > > >>> works perfectly for MongoDB but not for HBase.
> >>> > > >>>
> >>> > > >>> So I am assuming this problem is specific to HBase. The stack
> >>> > > >>> trace is given below.
> >>> > > >>>
> >>> > > >>> Exception in thread "Thread-1" java.lang.OutOfMemoryError: GC overhead limit exceeded
> >>> > > >>>         at java.lang.StringCoding$StringEncoder.encode(StringCoding.java:300)
> >>> > > >>>         at java.lang.StringCoding.encode(StringCoding.java:344)
> >>> > > >>>         at java.lang.String.getBytes(String.java:918)
> >>> > > >>>         at org.apache.hadoop.hbase.util.Bytes.toBytes(Bytes.java:733)
> >>> > > >>>         at org.apache.gora.hbase.util.HBaseByteInterface.toBytes(HBaseByteInterface.java:225)
> >>> > > >>>         at org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:383)
> >>> > > >>>         at org.apache.gora.hbase.store.HBaseStore.addPutsAndDeletes(HBaseStore.java:348)
> >>> > > >>>         at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:319)
> >>> > > >>>         at org.apache.gora.hbase.store.HBaseStore.put(HBaseStore.java:84)
> >>> > > >>>         at org.apache.gora.benchmark.GoraBenchmarkClient.insert(GoraBenchmarkClient.java:141)
> >>> > > >>>         at com.yahoo.ycsb.DBWrapper.insert(DBWrapper.java:148)
> >>> > > >>>         at com.yahoo.ycsb.workloads.CoreWorkload.doInsert(CoreWorkload.java:461)
> >>> > > >>>         at com.yahoo.ycsb.ClientThread.run(Client.java:269)
> >>> > > >>>
> >>> > > >>> The insert implementation of the module, available at
> >>> > > >>> https://github.com/sneceesay77/gora/tree/GORA-532/gora-benchmark in
> >>> > > >>> GoraBenchmarkClient.java, is very straightforward. I have had a
> >>> > > >>> brief look at the HBaseStore.java put() implementation but could
> >>> > > >>> not find an issue with it.
> >>> > > >>>
> >>> > > >>> If I solve this problem, then I will run more workloads to verify
> >>> > > >>> that the module is stable for the basic implementation. Then I
> >>> > > >>> will go ahead and work on the suggestions made by Renato last week.
> >>> > > >>>
> >>> > > >>> Please let me know what your thoughts are.
> >>> > > >>>
> >>> > > >>>
> >>> > > >>> Thank you.
> >>> > > >>>
> >>> > > >>>
> >>> > > >>>
> >>> > > >>> **Sheriffo Ceesay**
> >>> > > >>>
> >>> > > >>
> >>> >
>
