Hello Thomas,
Thank you very much for the response. Disabling the feature allowed me to move
from 1.2.3 => 2.2.0.
I understand the intent of the feature; however, in my case the end result was
one node getting all the load and eventual Netty heap-space exceptions.
Perhaps I should look at that... or perhaps I'll just leave the feature
disabled.
Again - thanks for the reply - VERY helpful.
On Saturday, November 14, 2020, 01:05:03 PM EST, Thomas L. Redman
<[email protected]> wrote:
I have seen this same thing. I sent a query on this list and, after some
time, got a response. The issue is reportedly the result of a new feature. I
would assume this feature is CLEARLY broken, as I had built a test topology
that was clearly compute-bound, not IO-bound, and there were adjustments to
change the ratio of compute to IO. I have not tested this fix; I am too close to
release, so I just rolled back to version 1.2.3.
From version 2.1.0 forward, no matter how I changed this compute/net IO
ratio, tuples were not distributed across nodes. Now, I could only reproduce
this with anchored (acked) tuples. But if you anchored tuples, you could never
span more than one node, which defeats the purpose of using Storm. Following is
that email from Kishor Patil identifying the issue (and a way to disable that
feature!!!):
From: Kishor Patil <[email protected]>
Subject: Re: Significant Bug
Date: October 29, 2020 at 8:07:18 AM CDT
To: <[email protected]>
Reply-To: [email protected]
Hello Thomas,
Apologies for the delay in responding here. I tested the topology code provided in
the storm-issue repo.
*only one machine gets pegged*: Although it appears to be, this is not a bug. This is
related to Locality Awareness. Please refer to
https://github.com/apache/storm/blob/master/docs/LocalityAwareness.md
It appears the spout-to-bolt ratio is 200, so if there are enough bolts on a single
node to handle the events generated by the spout, it won't send events out to
another node unless it runs out of capacity on that node. If you do not like
this and want to distribute events evenly, you can try disabling this feature.
You can turn off LoadAwareShuffleGrouping by setting
topology.disable.loadaware.messaging to true.
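For example, in the topology configuration (a minimal sketch; the topology name
and builder here are placeholders, and the same key can also go in storm.yaml):

    import org.apache.storm.Config;
    import org.apache.storm.StormSubmitter;
    import org.apache.storm.topology.TopologyBuilder;

    TopologyBuilder builder = new TopologyBuilder();
    // ... declare spouts and bolts as usual ...

    Config conf = new Config();
    // Disable LoadAwareShuffleGrouping so shuffle grouping spreads tuples
    // evenly across executors instead of preferring local ones.
    conf.put("topology.disable.loadaware.messaging", true);

    StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());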
-Kishor
On 2020/10/28 15:21:54, "Thomas L. Redman" <[email protected]> wrote:
What’s the word on this? I sent this out some time ago, including a GitHub
project that clearly demonstrates the brokenness, yet I have not heard a word.
Is there anybody supporting Storm?
On Sep 30, 2020, at 9:03 AM, Thomas L. Redman <[email protected]> wrote:
I believe I have encountered a significant bug. It seems topologies employing
anchored tuples do not distribute across multiple nodes, regardless of the
computation demands of the bolts. It works fine on a single node, but when
throwing multiple nodes into the mix, only one machine gets pegged. When we
disable anchoring, it will distribute across all nodes just fine, pegging each
machine appropriately.
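(For context, by "anchored" I mean the bolts emit with the input tuple as the
anchor so acks and failures are tracked; roughly like the sketch below, where
the field handling is just a placeholder:)

    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;

    // Inside a bolt, with the OutputCollector saved from prepare():
    private OutputCollector collector;

    public void execute(Tuple input) {
        String result = input.getString(0);  // placeholder processing
        // Anchored emit: the output tuple is tied to the input, so acks/fails propagate.
        collector.emit(input, new Values(result));
        collector.ack(input);
        // Unanchored alternative (no tuple-tree tracking):
        // collector.emit(new Values(result));
    }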
This bug manifests from version 2.1 forward. I first encountered this issue
with my own production cluster on an app that does significant NLP computation
across hundreds of millions of documents. This topology is fairly complex, so I
developed a very simple exemplar that demonstrates the issue with only one
spout and bolt. I pushed this demonstration up to GitHub to provide the
developers with a mechanism to easily isolate the bug, and maybe provide some
workaround. I used Gradle to build and package this simple topology. The code
is well documented, so it should be fairly simple to reproduce the issue. I
first encountered this issue on three 32-core nodes, but when I started
experimenting, I set up a test cluster with 8-core nodes, then increased each
node to 16 cores, with plenty of memory in every case.
The topology can be accessed from GitHub at
https://github.com/cowchipkid/storm-issue.git. Please feel free to respond to
me directly if you have any questions that are beyond the scope of this mailing
list.
Hope this helps. Please let me know how this goes; I will upgrade to 2.2.0
again for my next release.
On Nov 13, 2020, at 12:53 PM, Michael Giroux <[email protected]> wrote:
Hello, all,
I have a topology with 16 workers running across 4 nodes. This topology has a
bolt "transform" with executors=1 producing a stream that is consumed by a bolt
"ontology" with executors=160. Everything is configured as shuffleGrouping.
With Storm 1.2.3 all of the "ontology" bolts get their fair share of tuples.
When I run Storm 2.2.0, only the "ontology" bolts that are on the same node as
the single "transform" bolt get tuples.
Same cluster - same baseline code - the only difference is binding in the new
Maven artifact.
No errors in the logs.
Any thoughts would be welcome. Thanks!