Junegunn Choi created HBASE-29365:
-------------------------------------

             Summary: Improve option parser of PerformanceEvaluation
                 Key: HBASE-29365
                 URL: https://issues.apache.org/jira/browse/HBASE-29365
             Project: HBase
          Issue Type: Bug
          Components: PE
            Reporter: Junegunn Choi


h2. Summary

{{PerformanceEvaluation.parseOpts}} spans over 300 lines long with significant 
code duplication, and has several usability issues. It needs to be refactored 
and improved.
 * 
[https://github.com/apache/hbase/blob/787f5459883211b6017cee6a9bcb109f3cb66ddb/hbase-diagnostics/src/main/java/org/apache/hadoop/hbase/PerformanceEvaluation.java#L2845-L3158]

h2. Problems
h3. Code duplication

The same code pattern is repeated for each option.
{code:java}
final String rows = "--rows=";
if (cmd.startsWith(rows)) {
  opts.perClientRunRows = Long.parseLong(cmd.substring(rows.length()));
  continue;
}
{code}
 * Define the option prefix as a constant
 * Check if the argument starts with the prefix
 * Use {{substring}} to extract the value
 * Move on to the next argument

We can easily refactor this pattern and reduce the code duplication.
h3. Usability issues

Malformed arguments are not handled gracefully, often resulting in confusing 
error messages. Here are some examples.
h4. Confusing error messages

The {{--rows}} option expects a value, but if omitted, it prints a 
{{ClassNotFoundException}} exception, followed by an error message that doesn't 
clearly indicate the problem.
{noformat}
$ bin/hbase pe --rows
INFO  [main {}] hbase.PerformanceEvaluation: No class found for command: --rows
java.lang.ClassNotFoundException: --rows
        at 
jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641) 
~[?:?]
        at 
jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188)
 ~[?:?]
        at java.lang.ClassLoader.loadClass(ClassLoader.java:525) ~[?:?]
        at 
org.apache.hadoop.hbase.PerformanceEvaluation.isCustomTestClass(PerformanceEvaluation.java:3258)
 ~[hbase-diagnostics-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
        at 
org.apache.hadoop.hbase.PerformanceEvaluation.isCommandClass(PerformanceEvaluation.java:3251)
 ~[hbase-diagnostics-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
        at 
org.apache.hadoop.hbase.PerformanceEvaluation.parseOpts(PerformanceEvaluation.java:3139)
 ~[hbase-diagnostics-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
        at 
org.apache.hadoop.hbase.PerformanceEvaluation.run(PerformanceEvaluation.java:3217)
 ~[hbase-diagnostics-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82) 
~[hadoop-common-3.4.1.jar:?]
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:97) 
~[hadoop-common-3.4.1.jar:?]
        at 
org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:3273)
 ~[hbase-diagnostics-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT]
ERROR: Unrecognized option/command: --rows
Usage: hbase pe  <OPTIONS> [-D<property=value>]* <command|class> <nclients>
...
{noformat}
The same holds true for other options that expect a value.

—

The error message for invalid values can also be improved.
{noformat}
$ bin/hbase pe --rows= --table= --compress=
java.lang.NumberFormatException: For input string: ""
        at 
java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
        at java.base/java.lang.Long.parseLong(Long.java:721)
        at java.base/java.lang.Long.parseLong(Long.java:836)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation.parseOpts(PerformanceEvaluation.java:2864)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation.run(PerformanceEvaluation.java:3217)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:97)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:3273)
{noformat}
It doesn't show which option caused the error, so you would have to guess it or 
follow the stack trace to find out.
h4. Inconsistent handling of boolean options

PE supports many boolean options, but their handling is inconsistent. Some 
options require {{--flag=true/false}} syntax, while others just accept 
{{{}--flag{}}}.
 * In this case, this fails because {{autoFlush}} requires a value, while 
{{valueRandom}} doesn't.
{code:java}
$ bin/hbase pe --valueRandom --autoFlush sequentialWrite 1
...
ERROR: Unrecognized option/command: --autoFlush
{code}

 * And in this case, {{valueRandom}} is set to true, because {{=false}} after 
{{--valueRandom}} is simply ignored.
{code:java}
$ bin/hbase pe --valueRandom=false --autoFlush=true sequentialWrite 1

SequentialWriteTest test run options={ ... ,"autoFlush":true, ... 
"valueRandom":true, ... }
{code}

 * Another problem is that they silently ignore illegal values and just treat 
them as {{{}false{}}}. This can lead to confusion, and we should be more strict 
about it.
{code:java}
$ bin/hbase pe --autoFlush=yes sequentialWrite 1

SequentialWriteTest test run options={ ..., "autoFlush":false, ... }
{code}

h2. Suggestion

Refactor the code, make the behavior more consistent, and improve the error 
messages.
h2. Result
 * Shaved off over 180 lines of code
 * Fixed hidden bugs
 * Better error messages

h3. Examples
{code:java}
$ bin/hbase pe --rows
java.lang.IllegalArgumentException: Invalid option: --rows
        at 
org.apache.hadoop.hbase.PerformanceEvaluation.parseOpts(PerformanceEvaluation.java:2943)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation.run(PerformanceEvaluation.java:3033)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:97)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:3089)
Caused by: java.lang.IllegalArgumentException: --rows requires a value
        at 
org.apache.hadoop.hbase.PerformanceEvaluation.parseOpts(PerformanceEvaluation.java:2937)
        ... 4 more
{code}
{code:java}
$ bin/hbase pe --rows= --table= --compress=
java.lang.IllegalArgumentException: Invalid option: --rows=
        at 
org.apache.hadoop.hbase.PerformanceEvaluation.parseOpts(PerformanceEvaluation.java:2943)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation.run(PerformanceEvaluation.java:3033)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:97)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:3089)
Caused by: java.lang.NumberFormatException: For input string: ""
        at 
java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
        at java.base/java.lang.Long.parseLong(Long.java:721)
        at java.base/java.lang.Long.parseLong(Long.java:836)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation.lambda$parseOpts$14(PerformanceEvaluation.java:2866)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation.parseOpts(PerformanceEvaluation.java:2939)
        ... 4 more
{code}
{code:java}
$ bin/hbase pe sequentialWrite 1
SequentialWriteTest test run options={ ..., "autoFlush":false, ... 
"valueRandom":false, ... }
{code}
{code:java}
$ bin/hbase pe --autoFlush --valueRandom sequentialWrite 1
SequentialWriteTest test run options={ ..., "autoFlush":true, ... 
"valueRandom":true, ...}
...
{code}
{code:java}
$ bin/hbase pe --autoFlush=true --valueRandom=false sequentialWrite 1
SequentialWriteTest test run options={ ..., "autoFlush":true, ... 
"valueRandom":false, ... }
...
{code}
{code:java}
$ bin/hbase pe --autoFlush=yes sequentialWrite 1
java.lang.IllegalArgumentException: Invalid option: --autoFlush=yes
        at 
org.apache.hadoop.hbase.PerformanceEvaluation.parseOpts(PerformanceEvaluation.java:2943)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation.run(PerformanceEvaluation.java:3033)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:97)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:3089)
Caused by: java.lang.IllegalArgumentException: Invalid boolean value: yes
        at 
org.apache.hadoop.hbase.PerformanceEvaluation.parseBoolean(PerformanceEvaluation.java:2973)
        at 
org.apache.hadoop.hbase.PerformanceEvaluation.parseOpts(PerformanceEvaluation.java:2929)
        ... 4 more
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to