All,
Currently Apex assumes that an operator can emit on any defined output
port and all streams defined by a DAG are active. I'd like to propose an
ability for an operator to open and close output ports. By default all
ports defined by an operator will be open. In the case an operator for
any reason decides that it will not emit tuples on the output port, it
may close it. This will make the stream inactive and the application
master may undeploy the downstream (for that input stream) operators. If
this leads to containers that don't have any active operators, those
containers may be undeployed as well leading to better cluster resource
utilization and better Apex elasticity. Later, the operator may be in a
state where it needs to emit tuples on the closed port. In this case, it
needs to re-open the port and wait till the stream becomes active again
before emitting tuples on that port. Making inactive stream active
again, requires the application master to re-allocate containers and
re-deploy the downstream operators.
It should be also possible for an application designer to mark streams
as inactive when an application starts. This will allow the application
master avoid reserving all containers when the application starts.
Later, the port can be open and inactive stream become active.
Thank you,
Vlad
- open/close ports and active/inactive streams Vlad Rozov
-