Michal Klempa created NIFI-1562:
-----------------------------------

             Summary: ExecuteStreamCommand and ExecuteProcess do not support 
empty command line arguments
                 Key: NIFI-1562
                 URL: https://issues.apache.org/jira/browse/NIFI-1562
             Project: Apache NiFi
          Issue Type: Bug
          Components: Extensions
    Affects Versions: 0.4.1, 0.5.0
            Reporter: Michal Klempa


Argument splitting trims whitespace around the whole argument list and again 
around each individual argument.
This causes wrong behavior when a DataFlow Manager needs to pass an empty 
string as an argument to a command run by ExecuteStreamCommand or 
ExecuteProcess.

Let's start with what the DataFlow Manager needs to achieve (steps to reproduce):
1. Create a file "test.tsv" with *TAB* separated content:
{code}
one     two     three
this    is      one     string
{code}
2. Add a GetFile processor to the DataFlow to pick up this file.
3. Connect GetFile to ExecuteStreamCommand.
4. ExecuteStreamCommand configuration: 
 - Command Path: cut
 - Command Arguments: -f;1,2,3,4;--output-delimiter;
 - auto terminate: original
5. Add LogAttribute (Log Payload: true, auto terminate: success) and connect 
ExecuteStreamCommand to it to see the output.
6. Run this Flow.

Expected output:
{code}
onetwothree
thisisonestring
{code}
As the --output-delimiter argument to the cut command is an empty string 
(notice the last semicolon in the argument list), cut effectively joins the 
columns.
This output can be obtained by issuing this command in bash:
{code}
$ cut -f 1,2,3,4  --output-delimiter '' test.tsv
{code}
The '' is a pair of apostrophes, which tells bash to pass an empty argument.

Actual output:
ExecuteStreamCommand reports a cut command error in a Bulletin:
{code}
06:14:27 UTC
ERROR
fb12bb69-37e0-4e23-927c-a8aba40f360d

ExecuteStreamCommand[id=fb12bb69-37e0-4e23-927c-a8aba40f360d] Transferring flow 
file 
StandardFlowFileRecord[uuid=d94c9e62-1005-4a2d-815d-bdb4c02ebd85,claim=StandardContentClaim
 [resourceClaim=StandardResourceClaim[id=1456380578601-1, container=default, 
section=1], offset=231, length=0],offset=0,name=test.tsv,size=0] to output 
stream. Executable command cut ended in an error: cut: option 
'--output-delimiter' requires an argument
Try 'cut --help' for more information.
{code}

This is due to org.apache.nifi.processors.standard.util.ArgumentUtils:
1. Line 41: unwanted string trimming. Imagine we had used ' ' (space) as the 
argument separator in the previous example; the property would then look like 
this: Command Arguments: "-f 1,2,3,4 --output-delimiter " (there is a space at 
the end of the string, the last separator, just as with the semicolon). 
Trimming on this line removes our last argument before the argument string is 
even split into a list.
2. Line 52: if our output delimiter were " = " (space, equals, space), for 
example to create some kind of .ini file, this trimming would defeat the 
attempt by passing only "=" to the cut command.
3. Line 53: if we try to pass an empty string to the cut command as an 
argument (to join columns), this line discards it.
There is also a JUnit test, 
org.apache.nifi.processors.standard.TestExecuteProcess:testSplitArgs, which 
asserts this wrong behavior.
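
To make the desired behavior concrete, here is a minimal sketch of a splitter 
that neither trims nor drops empty arguments. The class ArgSplitSketch and the 
method splitPreserving are hypothetical names used only for illustration; this 
is not the code in ArgumentUtils, and it deliberately ignores any quoting 
rules the real splitter may support.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of the NiFi code base.
final class ArgSplitSketch {

    // Splits input on the delimiter without trimming anything and without
    // dropping empty arguments. Quoting is intentionally not handled here.
    static List<String> splitPreserving(String input, char delimiter) {
        if (input == null || input.isEmpty()) {
            // No property value at all means no arguments.
            return new ArrayList<>();
        }
        // A negative limit makes String.split keep trailing empty strings,
        // so a trailing delimiter yields a final empty argument.
        String[] parts = input.split(Pattern.quote(String.valueOf(delimiter)), -1);
        return new ArrayList<>(Arrays.asList(parts));
    }

    public static void main(String[] args) {
        // The property value from the steps to reproduce.
        System.out.println(splitPreserving("-f;1,2,3,4;--output-delimiter;", ';'));
        // A delimiter argument with significant surrounding whitespace.
        System.out.println(splitPreserving("-f;1,2,3,4;--output-delimiter; = ", ';'));
    }
}
```

With this approach the trailing semicolon in the reproduction steps produces a 
fourth, empty argument, and " = " reaches the command with its spaces intact. 
Whether a trailing separator should always imply an empty argument is a design 
decision for the actual fix.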



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
