Github user revans2 commented on a diff in the pull request:
https://github.com/apache/storm/pull/1094#discussion_r52473785
--- Diff: storm-core/src/jvm/org/apache/storm/utils/Utils.java ---
@@ -330,7 +330,11 @@ public static Map readCommandLineOpts() {
Map ret = new HashMap();
String commandOptions = System.getProperty("storm.options");
if (commandOptions != null) {
- String[] configs = commandOptions.split(",");
+ /*
+ below regex uses negative lookahead to not split in the
middle of
+ json objects '{}' or json arrays '[]'
+ */
+ String[] configs =
commandOptions.split(",(?![^\\[\\]{}]*(]|}))");
--- End diff --
So first of all this is a really complicated REGEX and even if it worked
correctly I would be nervous moving from something simple to something so
complex. To break down the REGEX (Just so I am sure I understand what it is
doing) `,(?![^\\[\\]{}]*(]|}))` will split on a `,` so long as `(?!` it is not
followed by anything that matches `[^\\[\\]{}]*(]|})` which matches any
character that is not one of `[`, `]`, `{`, or `}` zero or more times followed
by either a `]` or `}` character.
In my testing that does not maintain backwards compatibility with what we
had before nor what we currently have. The python code will URL encode the
key/value so no `,` appears in it. That lets us provide backwards
compatibility where I could type odd values that didn't parse to JSON, or other
odd things like having a string in the JSON that had JSON like characters in
it. This code does nothing to handle quoting of strings, and in some cases
with bad JSON it will split things very differently from before, which is at
lest an incompatibility between the windows code and the python code.
On Windows `-c 'foo=["A", "B", "C"' -c bar=B` I forgot to close the array
for foo
```
(clojure.string/split "foo=[\"A\", \"B\", \"C\",bar=B"
#",(?![^\\[\\]{}]*(]|}))")
["foo=[\"A\"" " \"B\"" " \"C\"" "bar=B"]
```
Because we silently skip things that don't make since to us (in keeping
with what happened before but is probably not ideal. we would end up with foo
set to `[\"A\"` and bar set to `B`.
They python code works just fine here, but if for some reason I wanted bar
to be an array or a map It would combine the two options together.
`-c 'foo=["A", "B", "C"' -c 'bar=["B"]'` I forgot to close the array for
foo again
```
(clojure.string/split "foo=[\"A\"%2C \"B\" %2C \"C\",bar=[\"B\"]"
#",(?![^\\[\\]{}]*(]|}))")
["foo=[\"A\"%2C \"B\" %2C \"C\",bar=[\"B\"]"]
```
Now they are all just one parameter and foo will be set to a really ugly
complex string value, and bar will just disappear.
This code cannot go in as is, and making the REGEX more complicated is not
going to solve it. Not to mention the case where for some reason I may want a
value that is URL encoded, and if windows does not URL encode it before sending
it over, we will URL decode it, and ...
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---