Github user revans2 commented on a diff in the pull request:
https://github.com/apache/storm/pull/877#discussion_r44786316
--- Diff: STORM_JSTORM_FEATURE_DIFF.md ---
@@ -0,0 +1,30 @@
+ feature | Storm-0.11.0-SNAPSHOT (and expected pull requests) | JStorm-2.1.0 | Notes | JIRA |
+----------|-----------------------|--------------|-------|----|
+ Security | Basic Authentication/Authorization | N/A | The migration is not high risk, but JStorm daemons/connections need to be evaluated. |
+Scheduler | Resource aware scheduling (Work In Progress) | <ol><li>Distribute the tasks of the same component evenly across the nodes in the cluster.<li>Keep the number of tasks in each worker balanced.<li>Try to assign two tasks that transfer messages directly to each other into the same worker to reduce network cost.<li>Support user-defined assignment and reuse the result of the last assignment.</ol> This is a different solution from Storm's. JStorm previously used CPU/memory/disk/network demand for scheduling, but we faced the problem of low cluster utilization: because topology configurations and the hardware resources in a cluster differ, after running for a period it is possible that no topology can be assigned to a node that has enough of every other resource but lacks just one. | The scheduler interface is pluggable, so we should be able to support both schedulers for the time being |
+Nimbus HA | Support configuring more than one spare Nimbus. When the master Nimbus goes down, the most appropriate spare Nimbus (the one whose topologies on disk most closely match the records in ZooKeeper) is chosen for promotion. | Different solution. | |
+Topology structure | worker/executor/task | worker/task | This might be the biggest difference between Storm and JStorm, so it is not clear whether some features related to the "executor" concept can be merged. (Rebalancing allows us to combine multiple tasks on a single executor.) This may go away in the future once RAS is stable. |
+Topology Master | (A heartbeat server with a similar scaling goal is in a pull request) | A new system bolt, the "topology master", was added. It is responsible for collecting the heartbeat info of all tasks and reporting it to Nimbus. Besides task heartbeat info, it can also be used to dispatch control messages within the topology. The topology master significantly reduces the amount of reads/writes to ZooKeeper; before this change, ZooKeeper was the bottleneck for deploying a big cluster and topology.
+Backpressure | Store backpressure status in ZooKeeper, which triggers the source spout to start flow control. Under flow control, the relevant source spouts stop sending tuples. | <ol><li>Implement backpressure based on the topology master. The TM is responsible for processing the trigger message and sending flow-control requests to the relevant spouts. Under flow control, a spout does not stop sending tuples, it just slows down.<li>Enable users to update the backpressure configuration dynamically without restarting the topology, e.g. enable/disable backpressure, high/low water marks...</ol> | These two solutions are totally different. We need to evaluate which one is better. |
+Monitor of task execute thread | N/A | Monitor the status of each task's execute thread. This is effective for finding slow bolts in a topology and potential deadlocks. |
+Message processing | Configurable number of receiving threads per worker (but likely to change so that deserialization happens in Netty) | <ol><li>Add a receiving and transferring queue/thread for each task so that deserialization and serialization happen asynchronously.<li>Remove the worker-level receiving and transferring threads to avoid unnecessary locks and to shorten the message-processing phase.</ol> | Low risk for merging to JStorm (both implementations are similar) |
+Batch tuples | Batching in DisruptorQueue | Batch before sending tuples to the transfer queue, and support adjusting the batch size dynamically according to samples of the actual batch sizes sent out over past intervals. | We should evaluate both implementations and see which is better, or whether we can pull some of the dynamic batching down to the Disruptor level. |
+Grouping | Load-aware balancing shuffle grouping | <ol><li>Add a local-first grouping in which tuples are sent to the tasks in the same worker first; if the load of all local tasks is high, the tuples are sent out to remote tasks.<li>Improve localOrShuffle grouping so that tuples are only sent to tasks in the same worker or on the same node. JStorm's shuffle checks the load of the local worker's network connections.</ol> | The risk of merging the remote-load checking implemented in "load-aware balancing shuffle grouping" is low.
+Rest API | YES | N/A | The porting of the REST API to JStorm is in progress
+web-UI | | <ol><li>Redesigned Web-UI; the new Web-UI code is clear and clean.<li>Data display is much nicer, with a better user experience, especially the topology graph and the 30-minute metric trend. URL: http://storm.taobao.org/</ol>
+metric system | | <ol><li>All levels of metrics, including stream metrics, task metrics, component metrics, topology metrics, and even cluster metrics, are sampled and calculated. Some metrics, e.g. "tuple life cycle", are very useful for debugging and finding hot spots in a topology.<li>Support full metrics data: the previous metric system could only display the mean value of meters/histograms, while the new metric system can display the m1, m5, and m15 rates of meters and the common percentiles of histograms.<li>Use new metric windows; the minimum metric window is 1 minute, so we can see the metrics data every single minute.<li>Provide a metric uploader interface so that third-party companies can easily build their own metric systems based on all historic metric data.</ol> | The JStorm metric system is completely redesigned.
+jar command | Supports a progress bar when submitting a topology | Does not support a progress bar | Low risk for merging to JStorm
+rebalance command | Basic rebalance functionality | Besides rebalance, supports scale-out/in by dynamically updating the number of workers, ackers, spouts & bolts without stopping the topology | How is grouping consistency maintained when changing the number of bolts/spouts? |
--- End diff --
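
As background on the scheduler row above, which notes that the scheduler interface is pluggable: below is a minimal sketch (not from this PR) of how a JStorm-style scheduler could be plugged in. It assumes the 0.11-era `backtype.storm.scheduler` API; the package and class name `JStormStyleScheduler` are hypothetical.

```java
// Minimal sketch, not from this PR: a custom scheduler plugged in through the
// pluggable IScheduler interface (0.11-era backtype.storm.scheduler package).
// The package and class name are hypothetical.
package com.example;

import java.util.Map;

import backtype.storm.scheduler.Cluster;
import backtype.storm.scheduler.EvenScheduler;
import backtype.storm.scheduler.IScheduler;
import backtype.storm.scheduler.Topologies;
import backtype.storm.scheduler.TopologyDetails;

public class JStormStyleScheduler implements IScheduler {

    @Override
    public void prepare(Map conf) {
        // Read any scheduler-specific configuration here.
    }

    @Override
    public void schedule(Topologies topologies, Cluster cluster) {
        for (TopologyDetails topology : cluster.needsSchedulingTopologies(topologies)) {
            // JStorm-style placement (spread tasks of a component evenly, co-locate
            // directly communicating tasks, honor user-defined assignments) would go
            // here, ending in cluster.assign(slot, topology.getId(), executors).
        }
        // Fall back to Storm's even scheduler for anything left unscheduled.
        new EvenScheduler().schedule(topologies, cluster);
    }
}
```

Nimbus would then select it via `storm.scheduler: "com.example.JStormStyleScheduler"` in storm.yaml, which is how either scheduling strategy could be swapped in for the time being.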
OK, we will need some really good documentation explaining how this works, especially with state. Also, if we ever want to move towards Flink- or Apex-like state checkpointing for exactly-once processing, it will likely not be compatible with this feature, unless we allow everyone to checkpoint state in a lazy key/value-like way, which sounds a lot more complex. This is why I like the task/executor split: you can set the task count high while keeping the executor count low, and if you need to scale up, the state does not need to be split or combined.
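
To make that last point concrete, here is a minimal sketch (not from this PR) using the stock `TopologyBuilder` API and the test components that ship with Storm; the topology and component names are just illustrative.

```java
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.testing.TestWordCounter;
import backtype.storm.testing.TestWordSpout;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

public class TaskVsExecutorExample {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("words", new TestWordSpout(), 2);

        // Start with 2 executors but 8 tasks: any per-task state is partitioned
        // 8 ways from day one, so adding executors later never has to split or
        // merge state.
        builder.setBolt("counter", new TestWordCounter(), 2)
               .setNumTasks(8)
               .fieldsGrouping("words", new Fields("word"));

        StormSubmitter.submitTopology("word-count", new Config(), builder.createTopology());
    }
}
```

Later, `storm rebalance word-count -e counter=8` can raise the "counter" bolt to 8 executors (up to its task count) without repartitioning whatever state the tasks hold, which is exactly what a worker/task-only model cannot do.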