Github user revans2 commented on a diff in the pull request:
https://github.com/apache/storm/pull/877#discussion_r44786316
--- Diff: STORM_JSTORM_FEATURE_DIFF.md ---
@@ -0,0 +1,30 @@
+ feature | Storm-0.11.0-SNAPSHOT (and expected pull requests) | JStorm-2.1.0 | Notes | JIRA |
+----------|-----------------------|--------------|-------|----|
+ Security | Basic Authentication/Authorization | N/A | The migration is not high risk, but JStorm daemons/connections need to be evaluated. |
+Scheduler | Resource aware scheduling (Work In Progress) | <ol><li>Distribute the tasks of the same component evenly across the nodes in the cluster.<li>Keep the number of tasks in each worker balanced.<li>Try to assign two tasks that transfer messages directly to each other into the same worker to reduce network cost.<li>Support user-defined assignment and reuse the result of the last assignment.</ol> This is a different solution from Storm's. JStorm previously used CPU/memory/disk/network demand for scheduling, but we faced the problem of low cluster utilization: because topology configurations and the hardware resources in a cluster differ, after running for a period it is possible that no topology can be assigned to a node that has enough of every other resource but lacks just one. | The scheduler interface is pluggable, so we should be able to support both schedulers for the time being |
+Nimbus HA | Support configuring more than one spare Nimbus. When the master Nimbus goes down, the most appropriate spare Nimbus (the one whose topologies on disk most closely match the records in ZooKeeper) is chosen for promotion. | Different solution. | |
+Topology structure | worker/executor/task | worker/task | This might be the biggest difference between Storm and JStorm, so it is not clear whether some features related to the "executor" concept can be merged. (Rebalancing allows us to combine multiple tasks on a single executor.) This may go away in the future once RAS is stable. |
+Topology Master | (A heartbeat server with a similar scaling goal is in a pull request) | A new system bolt, the "topology master", was added. It is responsible for collecting the heartbeat info of all tasks and reporting it to Nimbus. Besides task heartbeat info, it can also be used to dispatch control messages within the topology. The topology master significantly reduces the amount of reads/writes to ZooKeeper; before this change, ZooKeeper was the bottleneck for deploying a big cluster and topology.
+Backpressure | Store backpressure status in ZooKeeper, which triggers the source spout to start flow control. Under flow control, the relevant source spouts stop sending tuples. | <ol><li>Implement backpressure based on the topology master. The TM is responsible for processing the trigger message and sending flow-control requests to the relevant spouts. Under flow control, a spout does not stop sending tuples, it just slows down.<li>Enable users to update the backpressure configuration dynamically without restarting the topology, e.g. enable/disable backpressure, high/low water marks...</ol> | These two solutions are totally different. We need to evaluate which one is better. |
+Monitor of task execute thread | N/A | Monitor the status of each task's execute thread. This is effective for finding slow bolts in a topology and potential deadlocks. |
+Message processing | Configurable number of receiving threads per worker (but likely to change so that deserialization happens in Netty) | <ol><li>Add a receiving and transferring queue/thread for each task so that deserialization and serialization happen asynchronously.<li>Remove the worker-level receiving and transferring threads to avoid unnecessary locks and to shorten the message-processing phase.</ol> | Low risk for merging to JStorm (both implementations are similar) |
+Batch tuples | Batching in DisruptorQueue | Batch before sending tuples to the transfer queue, and support adjusting the batch size dynamically according to samples of the actual batch sizes sent out over past intervals. | We should evaluate both implementations and see which is better, or whether we can pull some of the dynamic batching down to the Disruptor level. |
+Grouping | Load-aware balancing shuffle grouping | <ol><li>Add a local-first grouping in which tuples are sent to the tasks in the same worker first; if the load of all local tasks is high, the tuples are sent out to remote tasks.<li>Improve localOrShuffle grouping so that tuples are only sent to tasks in the same worker or on the same node. JStorm's shuffle checks the load of the local worker's network connections.</ol> | The risk of merging the remote-load checking implemented in "load-aware balancing shuffle grouping" is low.
+Rest API | YES | N/A | The porting of the REST API to JStorm is in progress
+web-UI | | <ol><li>Redesigned Web-UI; the new Web-UI code is clear and clean.<li>Data display is much nicer, with a better user experience, especially the topology graph and the 30-minute metric trend. URL: http://storm.taobao.org/</ol>
+metric system | | <ol><li>All levels of metrics, including stream metrics, task metrics, component metrics, topology metrics, and even cluster metrics, are sampled and calculated. Some metrics, e.g. "tuple life cycle", are very useful for debugging and finding hot spots in a topology.<li>Support full metrics data: the previous metric system could only display the mean value of meters/histograms, while the new metric system can display the m1, m5, and m15 rates of meters and the common percentiles of histograms.<li>Use new metric windows; the minimum metric window is 1 minute, so we can see the metrics data every single minute.<li>Provide a metric uploader interface so that third-party companies can easily build their own metric systems based on all historic metric data.</ol> | The JStorm metric system is completely redesigned.
+jar command | Supports a progress bar when submitting a topology | Does not support a progress bar | Low risk for merging to JStorm
+rebalance command | Basic rebalance functionality | Besides rebalance, supports scale-out/in by dynamically updating the number of workers, ackers, spouts & bolts without stopping the topology | How is grouping consistency maintained when changing the number of bolts/spouts? |
--- End diff --
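
As background on the scheduler row above, which notes that the scheduler interface is pluggable: below is a minimal sketch (not from this PR) of how a JStorm-style scheduler could be plugged in. It assumes the 0.11-era `backtype.storm.scheduler` API; the package and class name `JStormStyleScheduler` are hypothetical.

```java
// Minimal sketch, not from this PR: a custom scheduler plugged in through the
// pluggable IScheduler interface (0.11-era backtype.storm.scheduler package).
// The package and class name are hypothetical.
package com.example;

import java.util.Map;

import backtype.storm.scheduler.Cluster;
import backtype.storm.scheduler.EvenScheduler;
import backtype.storm.scheduler.IScheduler;
import backtype.storm.scheduler.Topologies;
import backtype.storm.scheduler.TopologyDetails;

public class JStormStyleScheduler implements IScheduler {

    @Override
    public void prepare(Map conf) {
        // Read any scheduler-specific configuration here.
    }

    @Override
    public void schedule(Topologies topologies, Cluster cluster) {
        for (TopologyDetails topology : cluster.needsSchedulingTopologies(topologies)) {
            // JStorm-style placement (spread tasks of a component evenly, co-locate
            // directly communicating tasks, honor user-defined assignments) would go
            // here, ending in cluster.assign(slot, topology.getId(), executors).
        }
        // Fall back to Storm's even scheduler for anything left unscheduled.
        new EvenScheduler().schedule(topologies, cluster);
    }
}
```

Nimbus would then select it via `storm.scheduler: "com.example.JStormStyleScheduler"` in storm.yaml, which is how either scheduling strategy could be swapped in for the time being.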
OK, we will need some really good documentation explaining how this works, especially with state. Also, if we ever want to move towards Flink- or Apex-like state checkpointing for exactly-once processing, it will likely not be compatible with this feature, unless we allow everyone to checkpoint state in a lazy key/value-like way, which sounds a lot more complex. This is why I like the task/executor split: you can set the task count high while keeping the executor count low, and if you need to scale up, the state does not need to be split or combined.
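
To make that last point concrete, here is a minimal sketch (not from this PR) using the stock `TopologyBuilder` API and the test components that ship with Storm; the topology and component names are just illustrative.

```java
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.testing.TestWordCounter;
import backtype.storm.testing.TestWordSpout;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

public class TaskVsExecutorExample {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("words", new TestWordSpout(), 2);

        // Start with 2 executors but 8 tasks: any per-task state is partitioned
        // 8 ways from day one, so adding executors later never has to split or
        // merge state.
        builder.setBolt("counter", new TestWordCounter(), 2)
               .setNumTasks(8)
               .fieldsGrouping("words", new Fields("word"));

        StormSubmitter.submitTopology("word-count", new Config(), builder.createTopology());
    }
}
```

Later, `storm rebalance word-count -e counter=8` can raise the "counter" bolt to 8 executors (up to its task count) without repartitioning whatever state the tasks hold, which is exactly what a worker/task-only model cannot do.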