I think somebody on the Dev team needs to look into this. The topology I wrote is on GitHub and should reproduce the problem consistently. This feature seems broken.
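In case it saves someone else the digging: the workaround Kishor points out further down in the thread is just a topology-level config flag. Roughly what that looks like at submit time is sketched below. This is a minimal illustration only; the class name, topology name, and worker count are made up, and only the topology.disable.loadaware.messaging key is quoted from his reply.

    import org.apache.storm.Config;
    import org.apache.storm.StormSubmitter;
    import org.apache.storm.generated.StormTopology;

    public class SubmitWithoutLoadAwareMessaging {
        // Submits an already-built topology with LoadAwareShuffleGrouping turned off,
        // which is the workaround described further down in this thread.
        public static void submit(String name, StormTopology topology) throws Exception {
            Config conf = new Config();
            conf.setNumWorkers(16); // illustrative; match your own cluster
            // Key name quoted from Kishor's reply below. With this set, shuffle
            // grouping distributes tuples across all nodes instead of preferring
            // executors on the upstream component's node.
            conf.put("topology.disable.loadaware.messaging", true);
            StormSubmitter.submitTopology(name, conf, topology);
        }
    }

However you already submit your topology, the only line that matters here is the conf.put of that key.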
> On Nov 16, 2020, at 4:21 AM, Michael Giroux <michael_a_gir...@yahoo.com> wrote:
>
> Hello Thomas,
>
> Thank you very much for the response. Disabling the feature allowed me to move from 1.2.3 => 2.2.0.
>
> I understand the intent of the feature; however, in my case the end result was one node getting all the load and eventual Netty heap space exceptions. Perhaps I should look at that... or perhaps I'll just leave the feature disabled.
>
> Again - thanks for the reply - VERY helpful.
>
> On Saturday, November 14, 2020, 01:05:03 PM EST, Thomas L. Redman <tomred...@mchsi.com> wrote:
>
> I have seen this same thing. I sent a query on this list and, after some time, got a response. The issue is reportedly the result of a new feature. I would assume this feature is CLEARLY broken, as I had built a test topology that was clearly compute-bound, not IO-bound, and there were adjustments to change the compute/IO ratio. I have not tested this fix; I am too close to release, so I just rolled back to version 1.2.3.
>
> From version 2.1.0 forward, no matter how I changed this compute/net-IO ratio, tuples were not distributed across nodes. Now, I could only reproduce this with anchored (acked) tuples. But if you anchor tuples, you can never span more than one node, which defeats the purpose of using Storm. Following is that email from Kishor Patil identifying the issue (and a way to disable that feature!!!):
>
>> From: Kishor Patil <kishorvpa...@apache.org>
>> Subject: Re: Significant Bug
>> Date: October 29, 2020 at 8:07:18 AM CDT
>> To: <d...@storm.apache.org>
>> Reply-To: d...@storm.apache.org
>>
>> Hello Thomas,
>>
>> Apologies for the delay in responding here. I tested the topology code provided in the storm-issue repo.
>>
>> *only one machine gets pegged*: Although it appears that way, this is not a bug. It is related to Locality Awareness. Please refer to https://github.com/apache/storm/blob/master/docs/LocalityAwareness.md
>>
>> It appears the spout-to-bolt ratio is 200, so if there are enough bolts on a single node to handle the events generated by the spout, Storm won't send events out to another node unless it runs out of capacity on that node. If you do not like this and want to distribute events evenly, you can try disabling this feature. You can turn off LoadAwareShuffleGrouping by setting topology.disable.loadaware.messaging to true.
>>
>> -Kishor
>>
>> On 2020/10/28 15:21:54, "Thomas L. Redman" <tomred...@mchsi.com> wrote:
>>> What's the word on this? I sent this out some time ago, including a GitHub project that clearly demonstrates the brokenness, yet I have not heard a word. Is there anybody supporting Storm?
>>>
>>>> On Sep 30, 2020, at 9:03 AM, Thomas L. Redman <tomred...@mchsi.com> wrote:
>>>>
>>>> I believe I have encountered a significant bug. It seems topologies employing anchored tuples do not distribute across multiple nodes, regardless of the computational demands of the bolts. It works fine on a single node, but when throwing multiple nodes into the mix, only one machine gets pegged. When we disable anchoring, it distributes across all nodes just fine, pegging each machine appropriately.
>>>>
>>>> This bug manifests from version 2.1 forward. I first encountered the issue on my own production cluster, in an app that does significant NLP computation across hundreds of millions of documents. That topology is fairly complex, so I developed a very simple exemplar that demonstrates the issue with only one spout and one bolt. I pushed this demonstration up to GitHub to give the developers a mechanism to easily isolate the bug, and maybe provide some workaround. I used Gradle to build this simple topology and package the results. The code is well documented, so it should be fairly simple to reproduce the issue. I first hit the issue on three 32-core nodes, but when I started experimenting I set up a test cluster with 8 cores per node, then increased each node to 16 cores, with plenty of memory in every case.
>>>>
>>>> The topology can be accessed on GitHub at https://github.com/cowchipkid/storm-issue.git. Please feel free to respond to me directly if you have any questions that are beyond the scope of this mailing list.
>>>
>
> Hope this helps. Please let me know how this goes; I will upgrade to 2.2.0 again for my next release.
>
>> On Nov 13, 2020, at 12:53 PM, Michael Giroux <michael_a_gir...@yahoo.com> wrote:
>>
>> Hello, all,
>>
>> I have a topology with 16 workers running across 4 nodes. This topology has a bolt "transform" with executors=1 producing a stream that is consumed by a bolt "ontology" with executors=160. Everything is configured as shuffle grouping.
>>
>> With Storm 1.2.3, all of the "ontology" bolts get their fair share of tuples. With Storm 2.2.0, only the "ontology" bolts that are on the same node as the single "transform" bolt get tuples.
>>
>> Same cluster - same baseline code - the only difference is binding in the new Maven artifact.
>>
>> No errors in the logs.
>>
>> Any thoughts would be welcome. Thanks!
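P.S. For anyone who lands on this thread and does not want to clone the storm-issue repo right away, the shape of the reproduction is roughly the sketch below. This is written from scratch for illustration, not the actual repo code: the class, component, and topology names are made up, and the busy-loop stands in for the real NLP work. Only the anchored emit/ack pattern, the shuffle grouping, and the commented-out workaround flag come from the discussion above.

    import java.util.Map;
    import java.util.UUID;

    import org.apache.storm.Config;
    import org.apache.storm.StormSubmitter;
    import org.apache.storm.spout.SpoutOutputCollector;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.topology.base.BaseRichSpout;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;

    public class AnchoredShuffleRepro {

        // Spout that emits tuples with a message id, so downstream acking is tracked.
        public static class NumberSpout extends BaseRichSpout {
            private SpoutOutputCollector collector;
            private long seq = 0;

            @Override
            public void open(Map<String, Object> conf, TopologyContext ctx, SpoutOutputCollector collector) {
                this.collector = collector;
            }

            @Override
            public void nextTuple() {
                // The message id makes this an anchored (acked) tuple, the case where
                // the uneven distribution was observed.
                collector.emit(new Values(seq++), UUID.randomUUID().toString());
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("n"));
            }
        }

        // Compute-heavy bolt that anchors its output to the input tuple and acks it.
        public static class BusyBolt extends BaseRichBolt {
            private OutputCollector collector;

            @Override
            public void prepare(Map<String, Object> conf, TopologyContext ctx, OutputCollector collector) {
                this.collector = collector;
            }

            @Override
            public void execute(Tuple input) {
                long n = input.getLongByField("n");
                double acc = 0;
                for (int i = 0; i < 200_000; i++) { // stand-in for real CPU-bound work
                    acc += Math.sqrt(i + n);
                }
                collector.emit(input, new Values(acc)); // anchored emit
                collector.ack(input);
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("result"));
            }
        }

        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("source", new NumberSpout(), 1);
            builder.setBolt("busy", new BusyBolt(), 160).shuffleGrouping("source");

            Config conf = new Config();
            conf.setNumWorkers(16);
            // Workaround from the thread: uncomment to spread tuples across all nodes.
            // conf.put("topology.disable.loadaware.messaging", true);

            StormSubmitter.submitTopology("anchored-shuffle-repro", conf, builder.createTopology());
        }
    }

On 2.1+ with default settings, what the thread above describes is that only the "busy" executors on the same node as the spout stay loaded; enabling the commented-out flag makes the shuffle grouping distribute tuples evenly again, as it did on 1.2.3.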