[
https://issues.apache.org/jira/browse/HBASE-18086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16092265#comment-16092265
]
Enis Soztutar commented on HBASE-18086:
---------------------------------------
bq. Updated patch v12 where random number generation is lifted outside the loop
(it was observed that write performance suffered with random number generation
inside the loop).
It does not make sense to me that random number generation would be this
costly. I've looked at the folly code and found nothing that would explain it.
Can you please verify the total number of columns written in each case? You can
also test by generating 1M or so random numbers in a loop and measuring the
total end-to-end time. We want each row to come with a different number of
columns.
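A minimal sketch of such a standalone measurement. It assumes the standard library's {{std::mt19937_64}} rather than folly's generator, and the names {{RngTiming}} and {{TimeRandomColumnCounts}} are illustrative, not from the patch. It returns the total of the generated column counts alongside the elapsed time, so the two runs can also be compared on how many columns they would have written.

```cpp
#include <chrono>
#include <cstdint>
#include <random>

// Result of one timing run (illustrative type, not part of the patch).
struct RngTiming {
  uint64_t total_columns;  // sum of all generated column counts
  int64_t elapsed_micros;  // wall-clock time spent in the loop
};

// Draws `count` column counts uniformly from [1, max_cols] and times the loop.
// Summing the values keeps the compiler from discarding the loop.
RngTiming TimeRandomColumnCounts(uint32_t count, uint32_t max_cols, uint64_t seed) {
  std::mt19937_64 gen(seed);
  std::uniform_int_distribution<uint32_t> dist(1, max_cols);
  uint64_t total = 0;
  auto start = std::chrono::steady_clock::now();
  for (uint32_t i = 0; i < count; i++) {
    total += dist(gen);
  }
  auto end = std::chrono::steady_clock::now();
  auto micros = std::chrono::duration_cast<std::chrono::microseconds>(end - start);
  return RngTiming{total, static_cast<int64_t>(micros.count())};
}
```

Calling {{TimeRandomColumnCounts(1000000, 10, 42)}} and printing both fields should show whether the generator alone accounts for the slowdown.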
- No use of {{new}} or {{delete}}. Always use smart pointers.
{code}
+ std::thread *writer_threads = new std::thread[FLAGS_threads];
{code}
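One possible way to avoid the raw {{new[]}}, sketched under the assumption that a {{std::vector<std::thread>}} is acceptable here (a {{std::unique_ptr<std::thread[]>}} would be another option). {{RunWorkers}} and the placeholder worker body are hypothetical; the real per-thread logic would go in the lambda.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Starts `num_threads` workers, joins them all, and returns how many ran.
// The vector owns the threads, so no new/delete is needed.
int RunWorkers(int num_threads) {
  std::atomic<int> finished{0};
  std::vector<std::thread> writer_threads;
  writer_threads.reserve(num_threads);
  for (int i = 0; i < num_threads; i++) {
    // Placeholder body; the actual put/get work would be called here.
    writer_threads.emplace_back([&finished] { finished.fetch_add(1); });
  }
  for (auto &t : writer_threads) {
    t.join();
  }
  return finished.load();
}
```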
- These flags should have the same names as the ones in simple-client.cc:
{code}
+DEFINE_int32(multi_get_size, 1, "number of gets in one multi-get");
+DEFINE_bool(skip_get, false, "skip get / scan");
+DEFINE_bool(skip_put, false, "skip put's");
{code}
There are also report_num_rows, scans, multigets and conf flags that you
should implement.
- These should be return values instead of pointers passed to the methods:
{code}
bool *succeeded
{code}
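A rough sketch of the return-value style, assuming the per-thread functions can be launched with {{std::async}} so that each result comes back through a {{std::future<bool>}}. {{LoadOnce}} is a hypothetical stand-in for the patch's per-thread function, and {{RunAll}} is likewise illustrative.

```cpp
#include <future>
#include <vector>

// Hypothetical stand-in for one thread's work; returns true on success.
bool LoadOnce(int thread_id) { return thread_id >= 0; }

// Launches one task per thread and reports whether all of them succeeded.
bool RunAll(int num_threads) {
  std::vector<std::future<bool>> results;
  results.reserve(num_threads);
  for (int i = 0; i < num_threads; i++) {
    results.push_back(std::async(std::launch::async, LoadOnce, i));
  }
  bool all_ok = true;
  for (auto &r : results) {
    // Call get() on every future so each task is waited for.
    all_ok = r.get() && all_ok;
  }
  return all_ok;
}
```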
- Instead of executing every Cell as a separate Put via Table::Put(), you
should construct one Put object, add all the Cells to it, then call
Table::Put() once:
{code}
for (uint64_t j = 0; j < rows; j++) {
+ std::string row = PrefixZero(width, iteration * rows + j);
+ for (auto family : families) {
+ table->Put(Put{row}.AddColumn(family, kNumColumn,
std::to_string(n_cols)));
+ for (unsigned int k = 1; k <= n_cols; k++) {
+ table->Put(Put{row}.AddColumn(family, std::to_string(k), row));
+ }
+ }
{code}
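To illustrate the shape I mean, here is a self-contained sketch. {{FakePut}} and {{FakeTable}} are hypothetical stand-ins that only mirror the call shapes used in the snippet above; they are not the native client classes. The value of {{kNumColumn}} is likewise an assumption made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; not the real hbase::Put / hbase::Table.
struct FakeCell { std::string family, qualifier, value; };

class FakePut {
 public:
  explicit FakePut(std::string row) : row_(std::move(row)) {}
  FakePut &AddColumn(const std::string &family, const std::string &qualifier,
                     const std::string &value) {
    cells_.push_back({family, qualifier, value});
    return *this;
  }
  std::size_t NumCells() const { return cells_.size(); }
 private:
  std::string row_;
  std::vector<FakeCell> cells_;
};

class FakeTable {
 public:
  void Put(const FakePut &put) { num_calls_++; num_cells_ += put.NumCells(); }
  std::size_t num_calls() const { return num_calls_; }
  std::size_t num_cells() const { return num_cells_; }
 private:
  std::size_t num_calls_ = 0;
  std::size_t num_cells_ = 0;
};

const std::string kNumColumn = "num_columns";  // assumed value

// Builds a single Put carrying every cell of the row, then issues one call.
void WriteRow(FakeTable &table, const std::string &row,
              const std::vector<std::string> &families, uint32_t n_cols) {
  FakePut put{row};
  for (const auto &family : families) {
    put.AddColumn(family, kNumColumn, std::to_string(n_cols));
    for (uint32_t k = 1; k <= n_cols; k++) {
      put.AddColumn(family, std::to_string(k), row);
    }
  }
  table.Put(put);
}
```

With two families and three columns, this issues one Put carrying eight cells instead of eight separate Puts.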
- Instead of this method:
{code}
+std::string PrefixZero(int total_width, int num) {
{code}
you can probably do something like this (from scanner-test.cc):
{code}
std::string Row(uint32_t i, int width) {
  std::ostringstream s;
  s.fill('0');
  s.width(width);
  s << i;
  return "row" + s.str();
}
{code}
- Scans and gets should validate the obtained Result using the same logic, no?
I think you should extract that into a function and use it from both.
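A sketch of what such a shared validator could look like. It assumes the cells of one family have already been copied into a qualifier-to-value map ({{FakeFamilyCells}} is a hypothetical stand-in, not the client's Result type), and it checks the layout written by the put loop above: a column-count cell plus columns 1..n whose value is the row key. The {{"num_columns"}} qualifier is an assumed value for {{kNumColumn}}.

```cpp
#include <cstdint>
#include <exception>
#include <map>
#include <string>

// Hypothetical stand-in: qualifier -> value for a single column family.
using FakeFamilyCells = std::map<std::string, std::string>;

// Returns true if `cells` matches what the writer is expected to have stored
// for `row`. Intended to be called from both the get and the scan paths.
bool ValidateRow(const std::string &row, const FakeFamilyCells &cells) {
  auto count_cell = cells.find("num_columns");  // assumed kNumColumn value
  if (count_cell == cells.end()) return false;
  uint64_t n_cols = 0;
  try {
    n_cols = std::stoull(count_cell->second);
  } catch (const std::exception &) {
    return false;  // count cell did not hold a number
  }
  // Expect exactly the count cell plus n_cols data cells.
  if (cells.size() != n_cols + 1) return false;
  for (uint64_t k = 1; k <= n_cols; k++) {
    auto c = cells.find(std::to_string(k));
    if (c == cells.end() || c->second != row) return false;
  }
  return true;
}
```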
- The way we do multi-gets will result in all of the multi-get requests going
to the same region. Instead, I think it is better to have the multi-gets
scattered across most of the regions, so that we have a high likelihood of
testing server failure handling, etc. when chaos monkey is run with this. I
argued the same in my comments above. I think we can do something like a
hash-like striping across the row key space among threads, rather than
range-based striping. That should give us the ability to do multi-gets across
all the regions in one {{Table::Get(std::vector)}} call.
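One way to read the striping idea, as a sketch only: thread {{t}} owns row indices {{t, t + T, t + 2T, ...}} (modulo striping across {{T}} threads), and each multi-get batch takes every {{num_batches}}-th row of that thread's list, so the keys in one batch are spread over the whole key space. {{MultiGetBatch}} and its parameters are illustrative names, not from the patch.

```cpp
#include <cstdint>
#include <vector>

// Returns the row indices for one multi-get issued by `thread_id`.
// Rows are striped across threads by modulo; within a thread, batch `b`
// takes rows b, b + num_batches, ... of that thread's list, so each batch
// samples the full key range instead of one contiguous slice.
std::vector<uint64_t> MultiGetBatch(uint64_t total_rows, uint32_t num_threads,
                                    uint32_t thread_id, uint32_t batch_size,
                                    uint64_t batch_index) {
  std::vector<uint64_t> batch;
  if (num_threads == 0 || batch_size == 0 || thread_id >= total_rows) {
    return batch;
  }
  // Rows owned by this thread: thread_id, thread_id + num_threads, ...
  uint64_t per_thread = (total_rows - thread_id + num_threads - 1) / num_threads;
  uint64_t num_batches = (per_thread + batch_size - 1) / batch_size;
  for (uint64_t k = batch_index; k < per_thread; k += num_batches) {
    batch.push_back(thread_id + k * num_threads);
  }
  return batch;
}
```

For 100 rows, 4 threads and a batch size of 5, batch 0 of thread 1 covers rows 1, 21, 41, 61 and 81, which would land in different regions if the table is pre-split over that key range.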
- We don't have multi-put functionality right now, but when that is added, we
should do a follow-up patch to add multi-put support to this tool.
- These should default to {{load_test_table}} and {{f}} respectively.
{code}
+DEFINE_string(table, "t", "What table to do the reads and writes with");
+DEFINE_string(families, "d", "comma separated list of column family names");
{code}
> Create native client which creates load on selected cluster
> -----------------------------------------------------------
>
> Key: HBASE-18086
> URL: https://issues.apache.org/jira/browse/HBASE-18086
> Project: HBase
> Issue Type: Sub-task
> Reporter: Ted Yu
> Assignee: Ted Yu
> Attachments: 18086.v11.txt, 18086.v12.txt, 18086.v14.txt,
> 18086.v1.txt, 18086.v3.txt, 18086.v4.txt, 18086.v5.txt, 18086.v6.txt,
> 18086.v7.txt, 18086.v8.txt
>
>
> This task is to create a client which uses multiple threads to conduct Puts
> followed by Gets against selected cluster.
> Default is to run the tool against local cluster.
> This would give us some idea on the characteristics of native client in terms
> of handling high load.