I'd like to share some thoughts on streaming:
1. Flow control.
Currently, Storm uses max.spout.pending for flow control. It has
two limitations: 1) an unacked topology can easily run out of memory; 2) with a
large max.spout.pending, latency grows while throughput does not increase.
I experimented with a new flow control method based on a sliding window.
For each edge (an upstream task paired with a downstream task) there is a
sliding window that controls the message rate. The window slides via ask and
ack messages, which are sent once for every 100 user messages. In my experiment
this reaches 2 million msg/s throughput (on 4 machines) with an average latency
of 30ms and no OOM, while the previous benchmark on Storm 0.9.2 only reaches
0.7 million msg/s for an acked topology, and for an unacked topology Storm
0.9.2 OOMs. With this flow control method, max.spout.pending is used only for
fault tolerance; it no longer controls the flow speed.
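To make the idea concrete, here is a minimal sketch of one edge's window in plain Java. The class, the constants and the use of a Semaphore are made up for illustration (in a real topology the ack would travel as a control message between workers, not through a shared semaphore): the sender consumes a credit per message and blocks when the window is full, and the receiver returns credits in batches of 100.

    import java.util.concurrent.Semaphore;

    // Sketch of a per-edge sliding window (illustrative, not Storm code).
    public class EdgeWindow {
        static final int WINDOW = 10_000;   // max in-flight messages on this edge
        static final int ACK_BATCH = 100;   // one ack per 100 user messages

        private final Semaphore credits = new Semaphore(WINDOW);
        private int unacked = 0;            // receiver-side counter

        // Upstream task calls this before emitting on the edge.
        public void beforeSend() throws InterruptedException {
            credits.acquire();              // blocks once the window is exhausted
        }

        // Downstream task calls this after processing a message.
        public void onReceive() {
            if (++unacked >= ACK_BATCH) {
                credits.release(unacked);   // batched ack slides the window forward
                unacked = 0;
            }
        }
    }

Because the ack only travels once per 100 messages, the control traffic stays small, and memory per edge is bounded by the window size rather than by max.spout.pending.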
2. Worker output transfer queue.
Currently there is a single worker transfer queue. Might it be better to
have a dedicated queue per target worker instead of sharing a single queue?
With a single queue there is more thread contention, and also higher latency,
because the messages in the queue have to be processed one by one.
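Roughly what I have in mind, as a sketch (the names are hypothetical, not the actual worker transfer code): one bounded queue per target worker, each drained by its own sender thread, so contention and head-of-line blocking are limited to traffic that really goes to the same worker.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Consumer;

    // Sketch: a dedicated outbound queue per target worker.
    public class PerTargetTransfer {
        private final Map<String, BlockingQueue<byte[]>> queues = new ConcurrentHashMap<>();

        private BlockingQueue<byte[]> queueFor(String targetWorker) {
            return queues.computeIfAbsent(targetWorker, w -> new ArrayBlockingQueue<>(1024));
        }

        // Executors enqueue here; only messages for the same target contend.
        public void transfer(String targetWorker, byte[] tuple) throws InterruptedException {
            queueFor(targetWorker).put(tuple);
        }

        // One sender thread per target drains its own queue, so a slow
        // connection no longer delays messages bound for other workers.
        public void startSender(String targetWorker, Consumer<List<byte[]>> connection) {
            Thread sender = new Thread(() -> {
                BlockingQueue<byte[]> q = queueFor(targetWorker);
                List<byte[]> batch = new ArrayList<>();
                try {
                    while (true) {
                        batch.add(q.take());   // wait for at least one message
                        q.drainTo(batch);      // then take whatever else is ready
                        connection.accept(batch);
                        batch.clear();
                    }
                } catch (InterruptedException e) {
                    // worker shutting down
                }
            });
            sender.setDaemon(true);
            sender.start();
        }
    }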
3. Acking tuning
Currently in Storm, each spout message generates a root id, and every
downstream processing step acks against that message id. The good point is that
if one message is lost, we only need to replay that message. However, we have
observed that the ack traffic is significant! I was thinking about the real
pattern of message loss. For example, when there is a network issue, there are
usually multiple messages lost. If we are going to recover a batch of messages
anyway, why not let spout messages share the same root id? For example, the
spout could generate a new root id every 10ms, and all messages within that
10ms would anchor to the same root id. On each worker, we could do a local
merge of acks for the same root id before sending them to the acker. That
should save a lot of traffic.
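A sketch of what that could look like (illustrative only, names made up): the spout hands out one root id per 10ms window, and each worker XOR-merges ack values per root id locally before flushing to the acker, in the same spirit as the XOR the acker already does per tuple tree.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Random;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch: one root id per 10ms of spout output, plus local ack merging.
    public class BatchedRootId {
        private final Random rand = new Random();
        private long windowStart = 0;
        private long currentRootId = 0;

        // Spout side: all messages emitted within the same 10ms share a root id.
        public synchronized long rootIdForNow() {
            long now = System.currentTimeMillis();
            if (now - windowStart >= 10) {
                windowStart = now;
                currentRootId = rand.nextLong();
            }
            return currentRootId;
        }

        // Worker side: XOR-merge ack values per root id, then flush the merged
        // values to the acker periodically instead of one ack message per tuple.
        private final Map<Long, Long> pendingAcks = new ConcurrentHashMap<>();

        public void localAck(long rootId, long ackVal) {
            pendingAcks.merge(rootId, ackVal, (a, b) -> a ^ b);
        }

        public Map<Long, Long> flushToAcker() {
            Map<Long, Long> out = new HashMap<>();
            for (Long rootId : pendingAcks.keySet()) {
                Long merged = pendingAcks.remove(rootId);
                if (merged != null) {
                    out.put(rootId, merged);
                }
            }
            return out;   // send these (far fewer) entries to the acker
        }
    }

The trade-off is granularity: if one message in the 10ms window fails, the whole window is replayed, which matches the observation that losses tend to come in bursts anyway.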
4. Message loss during load balancing.
When a task is migrated to a new worker, will the messages targeted for
this task get lost during the transition? For example, task t is located on
worker A, then t migrates to worker B. There will be a time window in which
some tasks in the topology still send messages for t to worker A, and those
messages will be lost. I am wondering whether it is possible to do something
on A, so that A no longer drops messages whose task cannot be resolved locally
but instead relays them to B. That would make the transition smoother.
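On worker A that could be as simple as a small forwarding table for tasks that recently moved away. The sketch below is purely illustrative (hypothetical names, not worker code): relay instead of drop until the rest of the topology has picked up the new assignment.

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Sketch: relay tuples for recently-migrated tasks instead of dropping them.
    public class MigrationRelay {
        // taskId -> new worker endpoint, filled in when the task moves away.
        private final Map<Integer, String> movedTasks = new ConcurrentHashMap<>();

        public void taskMovedTo(int taskId, String newWorkerEndpoint) {
            movedTasks.put(taskId, newWorkerEndpoint);
        }

        // Called when a tuple arrives for a task that is not hosted locally.
        // Returns true if the tuple was relayed to the new worker.
        public boolean tryRelay(int taskId, byte[] tuple, Transport transport) {
            String target = movedTasks.get(taskId);
            if (target == null) {
                return false;              // genuinely unroutable: old behaviour
            }
            transport.send(target, tuple); // forward to worker B instead of dropping
            return true;
        }

        // Drop the entry once all upstream workers have the new assignment.
        public void assignmentPropagated(int taskId) {
            movedTasks.remove(taskId);
        }

        public interface Transport {
            void send(String workerEndpoint, byte[] serializedTuple);
        }
    }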
5. At least once delivery.
Without Kafka, Storm currently cannot do strict at-least-once delivery,
because once the spout worker dies, all messages are lost. If we allow the use
of Kafka, is it still necessary to cache a lot of messages in spout memory?
Why not just replay from Kafka?
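By "just replay from Kafka" I mean something like the sketch below. It uses the Kafka consumer API purely for illustration and is not the existing storm-kafka spout; it also assumes acks arrive in offset order so the bookkeeping stays a single number.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    // Sketch: keep only offsets in spout memory and replay from Kafka on failure.
    public class ReplayingSpoutSketch {
        private final KafkaConsumer<byte[], byte[]> consumer;
        private final TopicPartition partition;
        private long lastAckedOffset = -1;   // everything <= this is fully processed

        public ReplayingSpoutSketch(Properties props, String topic, int part) {
            // props needs bootstrap.servers and byte-array deserializers
            consumer = new KafkaConsumer<>(props);
            partition = new TopicPartition(topic, part);
            consumer.assign(Collections.singletonList(partition));
        }

        // nextTuple(): read forward; the messages stay in Kafka, not in RAM.
        public ConsumerRecords<byte[], byte[]> poll() {
            return consumer.poll(Duration.ofMillis(10));
        }

        // ack(): advance the low-water mark (in-order acking assumed for brevity).
        public void ack(long offset) {
            if (offset == lastAckedOffset + 1) {
                lastAckedOffset = offset;
            }
        }

        // fail() or spout restart: seek back and let Kafka replay; nothing cached.
        public void replayFromLastAcked() {
            consumer.seek(partition, lastAckedOffset + 1);
        }
    }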
6. Trident and exactly once.
I think the current Trident checkpointing implementation is not fast enough:
1) it checkpoints on every batch id; 2) the checkpointing is coordinated
globally, which may add latency. Could we make the checkpointing interval
tunable, for example checkpointing once every 10 batches? Regarding the global
coordination of checkpointing, there was an idea to use the Kafka sequence id
as a vector clock (Hailstorm: Distributed Stream Processing with Exactly Once
Semantics) to avoid batching and coordinated checkpointing.
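On making the interval tunable, the change I picture is small. The sketch below is illustrative only (IntervalCheckpointer and its StateStore are made-up names, not Trident classes): persist state every N batches, keyed off the transaction id, and replay from the last checkpointed transaction after a failure.

    // Sketch: checkpoint once every N batches instead of on every batch.
    public class IntervalCheckpointer {
        private final int interval;            // e.g. 10 -> checkpoint every 10 batches
        private final StateStore store;

        public IntervalCheckpointer(int intervalBatches, StateStore store) {
            this.interval = intervalBatches;
            this.store = store;
        }

        // Called when a batch commits; only every Nth transaction id writes state.
        public void onBatchCommit(long txId, byte[] state) {
            if (txId % interval == 0) {
                store.persist(txId, state);
            }
        }

        // After a failure, re-run the batches since the last checkpoint.
        public long replayFromTxId() {
            return store.latestCheckpointedTxId() + 1;
        }

        public interface StateStore {
            void persist(long txId, byte[] state);
            long latestCheckpointedTxId();
        }
    }

The trade-off is that after a failure up to N-1 batches have to be re-run, so the per-batch updates between checkpoints still need to be replayable for exactly-once to hold.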
7. Thread model of tasks.
Would making tasks Runnables instead of dedicated threads help with scaling
up? Actor frameworks use the Runnable model and can host millions of actors in
a single JVM.
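What I mean is multiplexing many tasks over a fixed pool, roughly as below (illustrative only; the synchronized block is a crude stand-in for the per-actor mailbox scheduling a real framework would use to keep each task on one thread at a time):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Sketch: tasks as Runnables multiplexed on a small thread pool, instead of
    // one dedicated thread per task.
    public class PooledTask implements Runnable {
        private final BlockingQueue<Object> inbox = new ArrayBlockingQueue<>(1024);
        private final ExecutorService pool;

        public PooledTask(ExecutorService pool) {
            this.pool = pool;
        }

        // Deliver a message and schedule the task; no thread sits parked per task.
        public void send(Object msg) {
            if (inbox.offer(msg)) {
                pool.execute(this);
            }
        }

        @Override
        public void run() {
            // at most one pool thread runs this task's logic at a time
            synchronized (this) {
                Object msg = inbox.poll();
                if (msg != null) {
                    process(msg);
                }
            }
        }

        protected void process(Object msg) {
            System.out.println("task processed: " + msg);   // bolt/spout logic here
        }

        public static void main(String[] args) {
            ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
            PooledTask task = new PooledTask(pool);
            task.send("hello");
            pool.shutdown();
        }
    }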
8. Connect with Spark.
It would be very interesting to see how to connect to and co-operate with
Spark, so that Spark could directly manipulate the Storm DAG and read the data
generated by Storm.
Sean