[
https://issues.apache.org/jira/browse/NIFI-1856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15272684#comment-15272684
]
Alan Jackoway commented on NIFI-1856:
-------------------------------------
That isn't quite right. The data only prints once I kill the process, which is
consistent with the theory that we need to consume standard error:
{noformat}
2016-05-05 13:29:16,690 ERROR [Timer-Driven Process Thread-1]
o.a.n.p.standard.ExecuteStreamCommand
ExecuteStreamCommand[id=406f8294-e9ad-42dc-8a32-0884311b559b] Transferring flow
file
StandardFlowFileRecord[uuid=5775a527-82ed-4abe-8c00-948ccc034640,claim=StandardContentClaim
[resourceClaim=StandardResourceClaim[id=1462469324777-2, container=default,
section=2], offset=0, length=0],offset=0,name=146481921779377,size=0] to output
stream. Executable command python ended in an error: ERROR 0
ERROR 1
...
{noformat}
> ExecuteStreamCommand Needs to Consume Standard Error
> ----------------------------------------------------
>
> Key: NIFI-1856
> URL: https://issues.apache.org/jira/browse/NIFI-1856
> Project: Apache NiFi
> Issue Type: Bug
> Reporter: Alan Jackoway
>
> I was using ExecuteStreamCommand to run certain hdfs commands that are tricky
> to write in nifi but easy in bash (e.g. {{hadoop fs -rm -r
> /data/*/2014/05/05}}).
> However, my larger commands kept hanging, even though they finish quickly
> when I run them from the command line.
> Based on
> http://www.javaworld.com/article/2071275/core-java/when-runtime-exec---won-t.html
> I believe that ExecuteStreamCommand and possibly other processors need to
> consume the standard error stream to prevent the processes from blocking when
> standard error gets filled.
> To reproduce, create this as ~/write.py:
> {code:python}
> import sys
> count = int(sys.argv[1])
> for x in range(count):
>     sys.stderr.write("ERROR %d\n" % x)
>     sys.stdout.write("OUTPUT %d\n" % x)
> {code}
> Create a flow that goes:
> # GenerateFlowFile - 5 minutes schedule, 0 bytes size
> # ExecuteStreamCommand - Command Arguments /Users/alanj/write.py;100,
> Command Path python
> # PutFile - /tmp/write/
> Route the output stream of ExecuteStreamCommand to PutFile.
> When you turn everything on, you get 100 lines (not 200) of just the standard
> output in /tmp/write.
> Next, change the command arguments to /Users/alanj/write.py;100000 and turn
> everything on again. The command will hang.
> I believe that whenever you execute a process the way ExecuteStreamCommand
> does, you need to consume the standard error stream to keep the process from
> blocking. This may also affect processors such as ExecuteProcess and
> ExecuteScript.
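The blocking described above is not specific to NiFi: any parent process that
reads only standard output will stall once the child fills the standard error
pipe. The following is a minimal Python sketch of the usual remedy, draining
standard error on a separate thread while the main thread reads standard
output. It is an analogue for illustration, not the ExecuteStreamCommand code;
the names CHILD and run_draining_both are hypothetical, and the child script
is an inline copy of write.py from the description.

```python
import subprocess
import sys
import threading

# Inline equivalent of write.py from the issue description (assumption:
# the same interpreter that runs this sketch is used for the child).
CHILD = (
    "import sys\n"
    "for x in range(int(sys.argv[1])):\n"
    "    sys.stderr.write('ERROR %d\\n' % x)\n"
    "    sys.stdout.write('OUTPUT %d\\n' % x)\n"
)


def run_draining_both(count):
    """Run the child and consume stdout and stderr concurrently."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD, str(count)],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    err_lines = []

    def drain_stderr():
        # Keep reading so the child never blocks on a full stderr pipe.
        for line in proc.stderr:
            err_lines.append(line)

    drainer = threading.Thread(target=drain_stderr, daemon=True)
    drainer.start()

    out = proc.stdout.read()  # main thread consumes standard output
    drainer.join()            # wait until standard error reaches EOF
    proc.wait()
    return out.splitlines(), err_lines, proc.returncode
```

With this structure, a call such as run_draining_both(100000) should return
rather than hang, because neither pipe is left unread. In Python the same
effect is available through Popen.communicate(), which reads both streams for
you; the explicit thread is shown only because it mirrors what a Java caller
of Runtime.exec or ProcessBuilder has to arrange by hand.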