[
https://issues.apache.org/jira/browse/NIFI-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193025#comment-15193025
]
Michal Klempa commented on NIFI-1562:
-------------------------------------
[~JPercivall] Well, the Mac set of command line tools is not the one you get
from GNU. Please do the testing on some Linux distro.
Regarding the apostrophes: no, the command supplied to NiFi should not contain
them. The apostrophes appear at the Bash prompt because that is how you tell
Bash: hey, Bash, I would like to execute this command with literally this
argument. If you do not put the argument in apostrophes, Bash splits the
command line on whitespace, so surrounding whitespace is lost and an empty
argument disappears entirely. So when you want to pass the argument ' = ' to
your command, you have to put it in apostrophes. The value '' is the idiom for
telling Bash: pass an empty string here (the ['\0'] array in C notation, if
you like). When the command is executed through some API such as
ProcessBuilder, one has to supply an empty string as an element of the
arguments array. And that is simply not happening in NiFi, because it trims
too much (just like Bash) and gives you no option to stop this behavior, so
you cannot send the string "" (C array ['\0']) as a command argument.
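To illustrate the point about the arguments array, here is a minimal Java
sketch. It is not NiFi code, and the class name EmptyArgDemo is made up for
this example. It only constructs the command (nothing is started) and shows
that ProcessBuilder accepts an empty string as an ordinary list element:

```java
import java.util.List;

// Hypothetical demo class, not part of NiFi.
class EmptyArgDemo {
    // Builds the equivalent of: cut -f 1,2,3,4 --output-delimiter ''
    static ProcessBuilder buildCut() {
        // The last element is a real, zero-length argument. No apostrophes are
        // needed here because no shell is involved in parsing the command.
        return new ProcessBuilder("cut", "-f", "1,2,3,4", "--output-delimiter", "");
    }

    public static void main(String[] args) {
        List<String> command = buildCut().command();
        System.out.println(command.size());           // number of argv entries
        System.out.println(command.get(4).isEmpty()); // whether the last one is empty
    }
}
```

The empty string survives as the fifth element, which is what the trimming in
NiFi currently prevents.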
Passing an empty string to command line utilities is nothing unusual
(http://serverfault.com/questions/446652/passing-an-empty-string-as-a-commandline-argument-using-a-bash-variable-to-a-com),
every utility that has an argument with 'separator' semantics will trigger
this requirement; sometimes you just do not want to separate the output :)
> ExecuteStreamCommand and ExecuteProcess do not support empty command line
> arguments
> -----------------------------------------------------------------------------------
>
> Key: NIFI-1562
> URL: https://issues.apache.org/jira/browse/NIFI-1562
> Project: Apache NiFi
> Issue Type: Bug
> Components: Extensions
> Affects Versions: 0.5.0, 0.4.1
> Reporter: Michal Klempa
> Labels: patch-available
>
> Argument splitting trims whitespace both around the whole argument list and
> around each individual argument.
> This causes wrong behavior when a DataFlow Manager needs to pass an empty
> string as an argument to a command run by ExecuteStreamCommand or
> ExecuteProcess.
> Let's start with what the DataFlow Manager needs to achieve (steps to
> reproduce):
> 1. Create a file "test.tsv" with *TAB* separated content:
> {code}
> one two three
> this is one string
> {code}
> 2. Put a GetFile Processor into the DataFlow to pick up this file.
> 3. Connect GetFile to ExecuteStreamCommand.
> 4. ExecuteStreamCommand configuration:
> - Command Path: cut
> - Command Arguments: {code}-f;1,2,3,4;--output-delimiter;{code}
> - auto terminate: original
> 5. Put LogAttribute (Log Payload: true, autoterminate: success) and connect
> ExecuteStreamCommand to LogAttribute to see the output.
> 6. Run this Flow.
> Expected output:
> {code}
> onetwothree
> thisisonestring
> {code}
> Because the --output-delimiter argument to the cut command is an empty string
> (notice the last semicolon in the argument list), cut effectively joins the
> columns.
> The same output can be obtained by issuing this command from within bash:
> {code}
> $ cut -f 1,2,3,4 --output-delimiter '' test.tsv
> {code}
> Those are apostrophes (to tell bash it is an empty argument).
> Actual output:
> ExecuteStreamCommand informs Bulletin of cut command error:
> {code}
> 06:14:27 UTC
> ERROR
> fb12bb69-37e0-4e23-927c-a8aba40f360d
> ExecuteStreamCommand[id=fb12bb69-37e0-4e23-927c-a8aba40f360d] Transferring
> flow file
> StandardFlowFileRecord[uuid=d94c9e62-1005-4a2d-815d-bdb4c02ebd85,claim=StandardContentClaim
> [resourceClaim=StandardResourceClaim[id=1456380578601-1, container=default,
> section=1], offset=231, length=0],offset=0,name=test.tsv,size=0] to output
> stream. Executable command cut ended in an error: cut: option
> '--output-delimiter' requires an argument
> Try 'cut --help' for more information.
> {code}
> This is due to {{org.apache.nifi.processors.standard.util.ArgumentUtils}}:
> 1. Line 41: unwanted string trimming. Imagine we had used {{' '}} (space)
> as the argument separator in the previous example; the property would then
> look like this: Command Arguments:
> {code}
> "-f 1,2,3,4 --output-delimiter "
> {code}
> (there is a space at the end of the string: the last separator, just as
> with the semicolon). The trimming on this line would ruin our last argument
> even before we get to splitting the argument string into a list.
> 2. Line 52: if our output delimiter were {{" = "}} (space equals space),
> for example to create some kind of .ini file, this trimming would defeat
> the attempt by passing only {{"="}} to the cut command.
> 3. Line 53: if we try to pass the cut command an empty string as an
> argument (to join columns), this line drops it.
> There is also a JUnit test,
> {{org.apache.nifi.processors.standard.TestExecuteProcess:testSplitArgs}},
> which tests exactly this wrong behavior.
> 4. Lines 69, 71: trimming once again.
> While trying to fix this bug, I also noticed an obscure QUOTE system. It is
> not limited to quoting the delimiter character (which would otherwise be
> treated as a delimiter): QUOTES are also removed when they do not enclose
> the delimiter. This quoting should be rethought and documented.
> Let's at least fix the first bug reported here.
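The difference between the trimming behavior described in the report and a
split that keeps empty arguments can be sketched as follows. This is a
simplified illustration, not the actual ArgumentUtils code: the class and
method names are hypothetical, quote handling is left out entirely, and
treating an empty property as "no arguments" is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; this is not the actual ArgumentUtils implementation.
class SplitSketch {
    // Splits without trimming and keeps empty arguments, including a trailing one.
    static List<String> splitKeepingEmpty(String input, char delimiter) {
        if (input == null || input.isEmpty()) {
            return new ArrayList<>(); // assumption: empty property means no arguments
        }
        // A limit of -1 makes String.split keep trailing empty strings.
        return Arrays.asList(input.split(Pattern.quote(String.valueOf(delimiter)), -1));
    }

    // Mimics the behavior described in the report: trim everything, drop empties.
    static List<String> splitTrimming(String input, char delimiter) {
        List<String> result = new ArrayList<>();
        for (String arg : input.trim().split(Pattern.quote(String.valueOf(delimiter)), -1)) {
            String trimmed = arg.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String property = "-f;1,2,3,4;--output-delimiter;";
        System.out.println(splitKeepingEmpty(property, ';').size()); // 4
        System.out.println(splitTrimming(property, ';').size());     // 3
    }
}
```

With the trimming variant the trailing empty argument is lost, which matches
the "requires an argument" error from cut shown above.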