James Xu created STORM-157:
------------------------------

             Summary: ack/fail should not require keeping input values in scope.
                 Key: STORM-157
                 URL: https://issues.apache.org/jira/browse/STORM-157
             Project: Apache Storm (Incubating)
          Issue Type: Improvement
            Reporter: James Xu
            Priority: Minor


https://github.com/nathanmarz/storm/issues/752

ack/fail takes a Tuple, but it appears the values are not needed to ack. If we 
aggregate many things locally before we commit, we keep refs to many Tuples. We 
think this could be keeping more in memory than we need and pushing some 
topologies to the breaking point.

We are not 100% sure that this is the issue, but it would be good to have something 
like getToken on Tuple. ack/fail could then take that Token, which is an opaque 
object holding only the minimal refs needed to ack.
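A minimal sketch of the proposed shape, with illustrative stand-in classes (Token, getToken, and the collector here are hypothetical names from this proposal, not Storm's actual API): the bolt keeps only lightweight tokens while aggregating, so the tuples and their large values can be garbage-collected before the commit.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenSketch {
    // Opaque handle holding the minimal refs needed to ack: here, just an id.
    static final class Token {
        final long messageId;
        Token(long messageId) { this.messageId = messageId; }
    }

    // Stand-in for a Storm Tuple carrying potentially large values.
    static final class FakeTuple {
        final long messageId;
        List<Object> values;
        FakeTuple(long messageId, List<Object> values) {
            this.messageId = messageId;
            this.values = values;
        }
        // The proposed accessor: extract a token, then let the Tuple go.
        Token getToken() { return new Token(messageId); }
    }

    // Collector that acks by token, never holding the Tuple itself.
    static final class FakeCollector {
        final List<Long> acked = new ArrayList<>();
        void ack(Token t) { acked.add(t.messageId); }
    }

    public static void main(String[] args) {
        FakeCollector collector = new FakeCollector();
        List<Token> pending = new ArrayList<>();

        // Aggregate many tuples: retain only tokens, not the (large) values.
        for (long id = 0; id < 3; id++) {
            FakeTuple tuple =
                new FakeTuple(id, new ArrayList<>(List.of(new byte[1024])));
            pending.add(tuple.getToken());
            // tuple goes out of scope here; its values become collectable.
        }

        // Later, on commit, ack everything via the retained tokens.
        for (Token t : pending) collector.ack(t);
        System.out.println("acked=" + collector.acked.size()); // prints "acked=3"
    }
}
```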

----------
nathanmarz: +1

This should be pretty easy to implement. Tuples already have a MessageID which 
is the primary object used for acking. There's also the "ack val" (the xors of 
tuples anchored onto this tuple) which should be moved into the MessageID. Then 
we can use the MessageID as the token, and update the APIs to accept the 
MessageID for acking and MessageID for anchoring as part of execute. 
IOutputCollector should be changed to only accept MessageID for 
acking/failing/anchoring, and then OutputCollector can add the convenience 
methods for accepting Tuple acking/anchoring/failing.
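The split described above could look roughly like this (a simplified sketch with stand-in types, not Storm's actual IOutputCollector/OutputCollector signatures): the low-level interface deals only in message ids, and the wrapper adds Tuple-based convenience overloads that just forward the tuple's id.

```java
import java.util.ArrayList;
import java.util.List;

public class CollectorSketch {
    static final class MessageId {
        final long id;
        MessageId(long id) { this.id = id; }
    }

    static final class SimpleTuple {
        final MessageId messageId;
        SimpleTuple(MessageId m) { this.messageId = m; }
        MessageId getMessageId() { return messageId; }
    }

    // Analogue of IOutputCollector: accepts only MessageId for ack/fail.
    interface LowLevelCollector {
        void ack(MessageId id);
        void fail(MessageId id);
    }

    // Analogue of OutputCollector: adds Tuple convenience overloads that
    // forward the tuple's MessageId, so callers can drop the Tuple early.
    static final class Collector implements LowLevelCollector {
        final List<Long> acked = new ArrayList<>();
        final List<Long> failed = new ArrayList<>();
        public void ack(MessageId id) { acked.add(id.id); }
        public void fail(MessageId id) { failed.add(id.id); }
        void ack(SimpleTuple t) { ack(t.getMessageId()); }
        void fail(SimpleTuple t) { fail(t.getMessageId()); }
    }

    public static void main(String[] args) {
        Collector c = new Collector();
        c.ack(new SimpleTuple(new MessageId(1))); // convenience path
        c.fail(new MessageId(2));                 // minimal-footprint path
        System.out.println("acked=" + c.acked + " failed=" + c.failed);
    }
}
```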

----------
jmlogan: We had a similar issue when we had very large tuples... at the time 
I "worked around it" by using Reflection to get hold of the internal list 
storing the values, and clearing it.

I ended up mitigating this problem for good by having smaller tuples, and 
passing the large payloads through Redis.
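The reflection workaround might look roughly like the sketch below, against a stand-in tuple class (the field name "values" is an assumption about the tuple implementation, not a documented Storm detail): clear the internal list so the large payload becomes garbage-collectable while the tuple object itself stays alive for acking.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class ClearValuesSketch {
    // Stand-in for a tuple implementation with a private values field.
    static final class FakeTupleImpl {
        private final List<Object> values = new ArrayList<>();
        FakeTupleImpl(Object payload) { values.add(payload); }
        int size() { return values.size(); }
    }

    // Clear the internal list via reflection, dropping refs to the payload.
    static void clearValues(Object tuple) throws ReflectiveOperationException {
        Field f = tuple.getClass().getDeclaredField("values");
        f.setAccessible(true);
        ((List<?>) f.get(tuple)).clear();
    }

    public static void main(String[] args) throws Exception {
        FakeTupleImpl t = new FakeTupleImpl(new byte[1 << 20]); // 1 MB payload
        clearValues(t); // payload is now unreachable through the tuple
        System.out.println("values after clear: " + t.size());
    }
}
```

This is brittle (it depends on a private field name), which is why moving to smaller tuples with payloads stored out-of-band was the durable fix.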



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)