[ https://issues.apache.org/jira/browse/STORM-738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

DashengJu updated STORM-738:
----------------------------
    Description: 
Hi all,

We have a topology with three components (spout -> parser -> saver), where the 
parser is a multilang bolt written in Python. We do not use the ACK mechanism.

We found two problems with the multilang Python script:
1) the parser's Python script may hold too many tuples and consume too much 
memory;
2) with the multilang heartbeat mechanism described in 
https://issues.apache.org/jira/browse/STORM-513, the Python script keeps timing 
out on heartbeats even when the parser bolt is working normally, which causes 
the supervisor to restart it.

!storm_multilang.png!

ShellBolt process === Father-Process
PythonScript process === Child-Process

The reasons are as follows:
1) when the topology does not use the ACK mechanism, the spout has no 
overflow-control ability, so if too many tuples arrive on the stream, the spout 
sends all of them to the parser's ShellBolt process (Father-Process);
2) the parser's ShellBolt process simply puts the tuples into its _pendingWrites 
queue, which has no size limit;
3) the parser's PythonScript process (Child-Process) calls readMsg() to read a 
tuple from STDIN, handles the tuple, emits a new tuple to its father process 
through STDOUT, and then calls readTaskIds() to read the task IDs from STDIN. 
Because the Father-Process's queue already holds many other tuples ahead of the 
task IDs, the Child-Process reads all of those tuples into pending_commands 
until it finally receives the task IDs;
4) as a result, the Child-Process's pending_commands may contain too many tuples 
and consume too much memory.
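To make steps 3) and 4) concrete, here is a minimal Python sketch under simplifying assumptions: STDIN is modeled as a deque, regular tuples as dicts and the task-ID reply as a list. The function name and message shapes are hypothetical; this is not the actual storm.py or ShellBolt code, only a model of the readTaskIds() behavior described above.

```python
from collections import deque


def buffered_after_one_emit(backlog):
    """Simplified model of steps 3) and 4); not the real storm.py code.

    `backlog` tuples are already waiting on the child's STDIN because the
    no-ACK spout has no flow control and _pendingWrites is unbounded.
    Returns how many tuples end up buffered in pending_commands.
    """
    stdin = deque({"tuple": [i]} for i in range(backlog))
    pending_commands = deque()

    # readMsg(): take the first tuple off STDIN and process it (omitted).
    stdin.popleft()

    # emit(): the father process queues its task-ID reply (a list)
    # behind every tuple that is already waiting for the child.
    stdin.append([1])

    # readTaskIds(): read until a list of task IDs arrives, buffering
    # every non-list message in pending_commands along the way.
    msg = stdin.popleft()
    while not isinstance(msg, list):
        pending_commands.append(msg)
        msg = stdin.popleft()
    return len(pending_commands)


print(buffered_after_one_emit(10_000))  # → 9999
```

In this model a single emit is enough to move the entire backlog from the father's queue into the child's memory, and nothing in the model bounds how large that backlog can get.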

As for the heartbeat: because the Child-Process has so many pending_commands to 
handle, and every emit by the Child-Process triggers more reads from STDIN, 
handling a single tuple may take as long as 10 seconds. The heartbeat tuple is 
therefore not handled in time, and the heartbeat times out.

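Even if the Father-Process's _pendingWrites queue had a limit, for example 1000, 
the Child-Process might need about 1000 x 1000 read operations before it could 
handle the heartbeat tuple: each of the ~1000 tuples queued ahead of the 
heartbeat triggers an emit whose readTaskIds() call first has to read up to 
~1000 newly queued tuples.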
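The 1000 x 1000 estimate can be checked with a small Python simulation. It is a sketch under stated assumptions, not the real ShellBolt or storm.py code: the capped _pendingWrites is modeled as a deque that the no-ACK spout keeps full, the heartbeat is queued behind `capacity` tuples, and each emit's task-ID reply is queued behind `capacity` fresh tuples. The names `reads_before_heartbeat`, `TUPLE_MSG` and `HEARTBEAT` are hypothetical.

```python
from collections import deque

# Hypothetical message shapes: every regular tuple shares one dict object
# (so the simulation itself stays cheap), and the heartbeat is a distinct dict.
TUPLE_MSG = {"tuple": ["some line"]}
HEARTBEAT = {"command": "heartbeat"}


def reads_before_heartbeat(capacity):
    """Count STDIN reads the Child-Process makes before handling a heartbeat.

    Simplified model (an assumption, not the real ShellBolt/storm.py code):
    - _pendingWrites is capped at `capacity` and the no-ACK spout keeps it full;
    - the heartbeat is queued behind `capacity` tuples;
    - each emit's task-ID reply (a list) lands behind `capacity` fresh tuples.
    """
    stdin = deque([TUPLE_MSG] * capacity + [HEARTBEAT])
    pending_commands = deque()
    reads = 0

    def read_msg():
        nonlocal reads
        reads += 1
        return stdin.popleft()

    while True:
        # Next command: buffered pending_commands first, then STDIN.
        cmd = pending_commands.popleft() if pending_commands else read_msg()
        if cmd is HEARTBEAT:
            return reads
        # emit(): meanwhile the spout has refilled the queue, so the father's
        # task-ID reply is written behind `capacity` new tuples ...
        stdin.extend([TUPLE_MSG] * capacity)
        stdin.append([1])
        # ... and readTaskIds() buffers all of them before it sees the list.
        msg = read_msg()
        while not isinstance(msg, list):
            pending_commands.append(msg)
            msg = read_msg()


for cap in (10, 100, 1000):
    print(cap, reads_before_heartbeat(cap))
# 10 121
# 100 10201
# 1000 1002001
```

Under this model the count is exactly (capacity + 1)^2, i.e. 1,002,001 reads for a limit of 1000, so capping the queue alone does not keep the heartbeat latency small.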

[~revans2] [~kabhwan] this is related to multilang and the heartbeat; could you 
please help confirm these two problems?


  was:
Same text as the description above, except that the diagram was embedded as:

!http://yun.baidu.com/share/link?shareid=3956686758&uk=1124463074!



> Multilang needs Overflow-Control and HeartBeat bug
> --------------------------------------------------
>
>                 Key: STORM-738
>                 URL: https://issues.apache.org/jira/browse/STORM-738
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 0.10.0, 0.9.3-rc2, 0.9.4, 0.11.0
>            Reporter: DashengJu
>            Priority: Critical
>         Attachments: storm_multilang.png



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
