Junegunn Choi created HBASE-29365: ------------------------------------- Summary: Improve option parser of PerformanceEvaluation Key: HBASE-29365 URL: https://issues.apache.org/jira/browse/HBASE-29365 Project: HBase Issue Type: Bug Components: PE Reporter: Junegunn Choi
h2. Summary {{PerformanceEvaluation.parseOpts}} spans over 300 lines long with significant code duplication, and has several usability issues. It needs to be refactored and improved. * [https://github.com/apache/hbase/blob/787f5459883211b6017cee6a9bcb109f3cb66ddb/hbase-diagnostics/src/main/java/org/apache/hadoop/hbase/PerformanceEvaluation.java#L2845-L3158] h2. Problems h3. Code duplication The same code pattern is repeated for each option. {code:java} final String rows = "--rows="; if (cmd.startsWith(rows)) { opts.perClientRunRows = Long.parseLong(cmd.substring(rows.length())); continue; } {code} * Define the option prefix as a constant * Check if the argument starts with the prefix * Use {{substring}} to extract the value * Move on to the next argument We can easily refactor this pattern and reduce the code duplication. h3. Usability issues Malformed arguments are not handled gracefully, often resulting in confusing error messages. Here are some examples. h4. Confusing error messages The {{--rows}} option expects a value, but if omitted, it prints a {{ClassNotFoundException}} exception, followed by an error message that doesn't clearly indicate the problem. {noformat} $ bin/hbase pe --rows INFO [main {}] hbase.PerformanceEvaluation: No class found for command: --rows java.lang.ClassNotFoundException: --rows at jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:641) ~[?:?] at jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:188) ~[?:?] at java.lang.ClassLoader.loadClass(ClassLoader.java:525) ~[?:?] at org.apache.hadoop.hbase.PerformanceEvaluation.isCustomTestClass(PerformanceEvaluation.java:3258) ~[hbase-diagnostics-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT] at org.apache.hadoop.hbase.PerformanceEvaluation.isCommandClass(PerformanceEvaluation.java:3251) ~[hbase-diagnostics-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT] at org.apache.hadoop.hbase.PerformanceEvaluation.parseOpts(PerformanceEvaluation.java:3139) ~[hbase-diagnostics-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT] at org.apache.hadoop.hbase.PerformanceEvaluation.run(PerformanceEvaluation.java:3217) ~[hbase-diagnostics-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT] at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82) ~[hadoop-common-3.4.1.jar:?] at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:97) ~[hadoop-common-3.4.1.jar:?] at org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:3273) ~[hbase-diagnostics-4.0.0-alpha-1-SNAPSHOT.jar:4.0.0-alpha-1-SNAPSHOT] ERROR: Unrecognized option/command: --rows Usage: hbase pe <OPTIONS> [-D<property=value>]* <command|class> <nclients> ... {noformat} The same holds true for other options that expect a value. — The error message for invalid values can also be improved. {noformat} $ bin/hbase pe --rows= --table= --compress= java.lang.NumberFormatException: For input string: "" at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67) at java.base/java.lang.Long.parseLong(Long.java:721) at java.base/java.lang.Long.parseLong(Long.java:836) at org.apache.hadoop.hbase.PerformanceEvaluation.parseOpts(PerformanceEvaluation.java:2864) at org.apache.hadoop.hbase.PerformanceEvaluation.run(PerformanceEvaluation.java:3217) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:97) at org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:3273) {noformat} It doesn't show which option caused the error, so you would have to guess it or follow the stack trace to find out. h4. Inconsistent handling of boolean options PE supports many boolean options, but their handling is inconsistent. Some options require {{--flag=true/false}} syntax, while others just accept {{{}--flag{}}}. * In this case, this fails because {{autoFlush}} requires a value, while {{valueRandom}} doesn't. {code:java} $ bin/hbase pe --valueRandom --autoFlush sequentialWrite 1 ... ERROR: Unrecognized option/command: --autoFlush {code} * And in this case, {{valueRandom}} is set to true, because {{=false}} after {{--valueRandom}} is simply ignored. {code:java} $ bin/hbase pe --valueRandom=false --autoFlush=true sequentialWrite 1 SequentialWriteTest test run options={ ... ,"autoFlush":true, ... "valueRandom":true, ... } {code} * Another problem is that they silently ignore illegal values and just treat them as {{{}false{}}}. This can lead to confusion, and we should be more strict about it. {code:java} $ bin/hbase pe --autoFlush=yes sequentialWrite 1 SequentialWriteTest test run options={ ..., "autoFlush":false, ... } {code} h2. Suggestion Refactor the code, make the behavior more consistent, and improve the error messages. h2. Result * Shaved off over 180 lines of code * Fixed hidden bugs * Better error messages h3. Examples {code:java} $ bin/hbase pe --rows java.lang.IllegalArgumentException: Invalid option: --rows at org.apache.hadoop.hbase.PerformanceEvaluation.parseOpts(PerformanceEvaluation.java:2943) at org.apache.hadoop.hbase.PerformanceEvaluation.run(PerformanceEvaluation.java:3033) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:97) at org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:3089) Caused by: java.lang.IllegalArgumentException: --rows requires a value at org.apache.hadoop.hbase.PerformanceEvaluation.parseOpts(PerformanceEvaluation.java:2937) ... 4 more {code} {code:java} $ bin/hbase pe --rows= --table= --compress= java.lang.IllegalArgumentException: Invalid option: --rows= at org.apache.hadoop.hbase.PerformanceEvaluation.parseOpts(PerformanceEvaluation.java:2943) at org.apache.hadoop.hbase.PerformanceEvaluation.run(PerformanceEvaluation.java:3033) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:97) at org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:3089) Caused by: java.lang.NumberFormatException: For input string: "" at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67) at java.base/java.lang.Long.parseLong(Long.java:721) at java.base/java.lang.Long.parseLong(Long.java:836) at org.apache.hadoop.hbase.PerformanceEvaluation.lambda$parseOpts$14(PerformanceEvaluation.java:2866) at org.apache.hadoop.hbase.PerformanceEvaluation.parseOpts(PerformanceEvaluation.java:2939) ... 4 more {code} {code:java} $ bin/hbase pe sequentialWrite 1 SequentialWriteTest test run options={ ..., "autoFlush":false, ... "valueRandom":false, ... } {code} {code:java} $ bin/hbase pe --autoFlush --valueRandom sequentialWrite 1 SequentialWriteTest test run options={ ..., "autoFlush":true, ... "valueRandom":true, ...} ... {code} {code:java} $ bin/hbase pe --autoFlush=true --valueRandom=false sequentialWrite 1 SequentialWriteTest test run options={ ..., "autoFlush":true, ... "valueRandom":false, ... } ... {code} {code:java} $ bin/hbase pe --autoFlush=yes sequentialWrite 1 java.lang.IllegalArgumentException: Invalid option: --autoFlush=yes at org.apache.hadoop.hbase.PerformanceEvaluation.parseOpts(PerformanceEvaluation.java:2943) at org.apache.hadoop.hbase.PerformanceEvaluation.run(PerformanceEvaluation.java:3033) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:82) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:97) at org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:3089) Caused by: java.lang.IllegalArgumentException: Invalid boolean value: yes at org.apache.hadoop.hbase.PerformanceEvaluation.parseBoolean(PerformanceEvaluation.java:2973) at org.apache.hadoop.hbase.PerformanceEvaluation.parseOpts(PerformanceEvaluation.java:2929) ... 4 more {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)