I think somebody on the Dev team needs to look into this. The topology I wrote is on GitHub and should reproduce the problem consistently. This feature seems broken.
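In case it saves someone else the digging: the workaround Kishor points out further down in the thread is just a topology-level config flag. Roughly what that looks like at submit time is sketched below. This is a minimal illustration only; the class name, topology name, and worker count are made up, and only the topology.disable.loadaware.messaging key is quoted from his reply.

    import org.apache.storm.Config;
    import org.apache.storm.StormSubmitter;
    import org.apache.storm.generated.StormTopology;

    public class SubmitWithoutLoadAwareMessaging {
        // Submits an already-built topology with LoadAwareShuffleGrouping turned off,
        // which is the workaround described further down in this thread.
        public static void submit(String name, StormTopology topology) throws Exception {
            Config conf = new Config();
            conf.setNumWorkers(16); // illustrative; match your own cluster
            // Key name quoted from Kishor's reply below. With this set, shuffle
            // grouping distributes tuples across all nodes instead of preferring
            // executors on the upstream component's node.
            conf.put("topology.disable.loadaware.messaging", true);
            StormSubmitter.submitTopology(name, conf, topology);
        }
    }

However you already submit your topology, the only line that matters here is the conf.put of that key.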
> On Nov 16, 2020, at 4:21 AM, Michael Giroux <michael_a_gir...@yahoo.com> wrote:
>
> Hello Thomas,
>
> Thank you very much for the response. Disabling the feature allowed me to move from 1.2.3 => 2.2.0.
>
> I understand the intent of the feature; however, in my case the end result was one node getting all the load and eventual Netty heap space exceptions. Perhaps I should look at that... or perhaps I'll just leave the feature disabled.
>
> Again - thanks for the reply - VERY helpful.
>
> On Saturday, November 14, 2020, 01:05:03 PM EST, Thomas L. Redman <tomred...@mchsi.com> wrote:
>
> I have seen this same thing. I sent a query on this list and, after some time, got a response. The issue is reportedly the result of a new feature. I would assume this feature is CLEARLY broken, as I had built a test topology that was clearly compute-bound, not IO-bound, and there were adjustments to change the compute/IO ratio. I have not tested this fix; I am too close to release, so I just rolled back to version 1.2.3.
>
> From version 2.1.0 forward, no matter how I changed this compute/net-IO ratio, tuples were not distributed across nodes. Now, I could only reproduce this with anchored (acked) tuples. But if you anchor tuples, you can never span more than one node, which defeats the purpose of using Storm. Following is that email from Kishor Patil identifying the issue (and a way to disable that feature!!!):
>
>> From: Kishor Patil <kishorvpa...@apache.org>
>> Subject: Re: Significant Bug
>> Date: October 29, 2020 at 8:07:18 AM CDT
>> To: <d...@storm.apache.org>
>> Reply-To: d...@storm.apache.org
>>
>> Hello Thomas,
>>
>> Apologies for the delay in responding here. I tested the topology code provided in the storm-issue repo.
>>
>> *only one machine gets pegged*: Although it appears that way, this is not a bug. It is related to Locality Awareness. Please refer to https://github.com/apache/storm/blob/master/docs/LocalityAwareness.md
>>
>> It appears the spout-to-bolt ratio is 200, so if there are enough bolts on a single node to handle the events generated by the spout, Storm won't send events out to another node unless it runs out of capacity on that node. If you do not like this and want to distribute events evenly, you can try disabling this feature. You can turn off LoadAwareShuffleGrouping by setting topology.disable.loadaware.messaging to true.
>>
>> -Kishor
>>
>> On 2020/10/28 15:21:54, "Thomas L. Redman" <tomred...@mchsi.com> wrote:
>>> What's the word on this? I sent this out some time ago, including a GitHub project that clearly demonstrates the brokenness, yet I have not heard a word. Is there anybody supporting Storm?
>>>
>>>> On Sep 30, 2020, at 9:03 AM, Thomas L. Redman <tomred...@mchsi.com> wrote:
>>>>
>>>> I believe I have encountered a significant bug. It seems topologies employing anchored tuples do not distribute across multiple nodes, regardless of the computational demands of the bolts. It works fine on a single node, but when throwing multiple nodes into the mix, only one machine gets pegged. When we disable anchoring, it distributes across all nodes just fine, pegging each machine appropriately.
>>>>
>>>> This bug manifests from version 2.1 forward. I first encountered the issue on my own production cluster, in an app that does significant NLP computation across hundreds of millions of documents. That topology is fairly complex, so I developed a very simple exemplar that demonstrates the issue with only one spout and one bolt. I pushed this demonstration up to GitHub to give the developers a mechanism to easily isolate the bug, and maybe provide some workaround. I used Gradle to build this simple topology and package the results. The code is well documented, so it should be fairly simple to reproduce the issue. I first hit the issue on three 32-core nodes, but when I started experimenting I set up a test cluster with 8 cores per node, then increased each node to 16 cores, with plenty of memory in every case.
>>>>
>>>> The topology can be accessed on GitHub at https://github.com/cowchipkid/storm-issue.git. Please feel free to respond to me directly if you have any questions that are beyond the scope of this mailing list.
>>>
>
> Hope this helps. Please let me know how this goes; I will upgrade to 2.2.0 again for my next release.
>
>> On Nov 13, 2020, at 12:53 PM, Michael Giroux <michael_a_gir...@yahoo.com> wrote:
>>
>> Hello, all,
>>
>> I have a topology with 16 workers running across 4 nodes. This topology has a bolt "transform" with executors=1 producing a stream that is consumed by a bolt "ontology" with executors=160. Everything is configured as shuffle grouping.
>>
>> With Storm 1.2.3, all of the "ontology" bolts get their fair share of tuples. With Storm 2.2.0, only the "ontology" bolts that are on the same node as the single "transform" bolt get tuples.
>>
>> Same cluster - same baseline code - the only difference is binding in the new Maven artifact.
>>
>> No errors in the logs.
>>
>> Any thoughts would be welcome. Thanks!
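P.S. For anyone who lands on this thread and does not want to clone the storm-issue repo right away, the shape of the reproduction is roughly the sketch below. This is written from scratch for illustration, not the actual repo code: the class, component, and topology names are made up, and the busy-loop stands in for the real NLP work. Only the anchored emit/ack pattern, the shuffle grouping, and the commented-out workaround flag come from the discussion above.

    import java.util.Map;
    import java.util.UUID;

    import org.apache.storm.Config;
    import org.apache.storm.StormSubmitter;
    import org.apache.storm.spout.SpoutOutputCollector;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.topology.base.BaseRichSpout;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;

    public class AnchoredShuffleRepro {

        // Spout that emits tuples with a message id, so downstream acking is tracked.
        public static class NumberSpout extends BaseRichSpout {
            private SpoutOutputCollector collector;
            private long seq = 0;

            @Override
            public void open(Map<String, Object> conf, TopologyContext ctx, SpoutOutputCollector collector) {
                this.collector = collector;
            }

            @Override
            public void nextTuple() {
                // The message id makes this an anchored (acked) tuple, the case where
                // the uneven distribution was observed.
                collector.emit(new Values(seq++), UUID.randomUUID().toString());
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("n"));
            }
        }

        // Compute-heavy bolt that anchors its output to the input tuple and acks it.
        public static class BusyBolt extends BaseRichBolt {
            private OutputCollector collector;

            @Override
            public void prepare(Map<String, Object> conf, TopologyContext ctx, OutputCollector collector) {
                this.collector = collector;
            }

            @Override
            public void execute(Tuple input) {
                long n = input.getLongByField("n");
                double acc = 0;
                for (int i = 0; i < 200_000; i++) { // stand-in for real CPU-bound work
                    acc += Math.sqrt(i + n);
                }
                collector.emit(input, new Values(acc)); // anchored emit
                collector.ack(input);
            }

            @Override
            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("result"));
            }
        }

        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("source", new NumberSpout(), 1);
            builder.setBolt("busy", new BusyBolt(), 160).shuffleGrouping("source");

            Config conf = new Config();
            conf.setNumWorkers(16);
            // Workaround from the thread: uncomment to spread tuples across all nodes.
            // conf.put("topology.disable.loadaware.messaging", true);

            StormSubmitter.submitTopology("anchored-shuffle-repro", conf, builder.createTopology());
        }
    }

On 2.1+ with default settings, what the thread above describes is that only the "busy" executors on the same node as the spout stay loaded; enabling the commented-out flag makes the shuffle grouping distribute tuples evenly again, as it did on 1.2.3.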