If concurrent access can be reproduced in a dev environment, it is possible
- with temporary code changes - to determine what two threads were
accessing the message at the same time.

An effective means to track this down is to record Exception objects (and
hence stack traces) at each point accessing the fields of concern.  Then,
when the problem is detected, dump the stack traces of those exceptions.

Also - on the "saving space" topic - is the text field actually storing a
copy of the bytes, or is it backed by the same memory used by the byte
array?  I'd be very curious to see that.  I'm also curious about the "isn't
tracked" assertion.

Hope this helps.

Art

On Thu, May 31, 2018 at 4:56 PM, Christopher Shannon <
christopher.l.shan...@gmail.com> wrote:

> Art,
>
> Concurrent store and dispatch should be off by default for topics (which I
> believe the test case is using).  Also, it wouldn't hit the store since
> it's a topic and not a durable subscription so should just be in memory.
> That's why this issue is weird as the messages shouldn't be concurrently
> touched by any thread since they get copied.  (but there still seems to be
> something going on that i am missing off the top of my head involving
> multiple threads touching the message)
>
> Clearing the text field is important because it prevents memory waste.  The
> explanation is explained more in https://issues.apache.org/
> jira/browse/AMQ-5857 If you don't clear it then extra memory is used that
> isn't tracked and can lead to OOM errors (it's why the
> reduceMemoryFootprintFlag exists in the broker).  So I think that change is
> probably ok but a side effect is causing this other issue to pop up.
>
> When I was doing some testing today, I tried a couple variations of
> synchronization (using volatile, adding mutexes in places, etc) and some
> worked and some didn't.  One thing that was interesting was that when I
> synchronized on the copy() method of ActiveMQText message the issue was
> fixed.  That is weird however because we shouldn't need to sync on copy.
> So either there is a problem with copy() itself or adding the
> synchronization is throwing off the timing just enough to hide the real
> issue.
>
> Ultimately I think adding synchronization in a test (like what you did)
> will be useful to track down the problem since I think it's a race
> condition issue but as a permanently fix I would avoid it.  The reason is
> that it shouldn't be needed...the objects are designed to only be accessed
> by one thread at a time and copied if necessary which makes sense to me.
> However if adding synchronization does fix the issue then hopefully it will
> at least be a clue as to what the real issue is.
>
> Chris
>
> On Thu, May 31, 2018 at 5:41 PM, Arthur Naseef <a...@amlinv.com> wrote:
>
> > Ahh, I bet that is a result of using the VM transport.  The broker is
> > likely doing the "beforeMarshal()" call that clears the text just before
> > the client is calling "getText()" on the same message - or just before
> > "copy()" is called to make a copy of the message.
> >
> > Yeah, looking at the code carefully - the storeContent() method is
> handling
> > "text != null" with "content = null" by converting text to a byte
> sequence
> > (
> > org.apache.activemq.util.ByteSequence) and stuffing that into "content".
> >
> > If all this happens concurrent with a copy() or a call to getText(), then
> > it's entirely possible to get this race condition.  So, disabling
> > "concurrent store and dispatch" could solve this problem.  Also, using
> > critical sections in "storeContentAndClear()" as well as "getText()"
> might
> > eliminate the race condition.
> >
> > With all that said, I don't see the big win of clearing the text field in
> > storeContentAndClear().
> >
> > @Chris - can you try the patch here and see if it eliminates the problem?
> > https://gist.github.com/artnaseef/d518c0431a34cd3d49e8d6e045a719e2
> >
> > I may have missed some paths that access both "text" and "content", but I
> > think I got the important ones.  If that patch works, that increases
> > confidence that you found the right race condition.
> >
> > Art
> >
> >
> >
> > On Thu, May 31, 2018 at 12:55 PM, Christopher Shannon <
> > christopher.l.shan...@gmail.com> wrote:
> >
> > > Found the culprit: Seems to be related to
> > > https://issues.apache.org/jira/browse/AMQ-5857
> > >
> > > Specifically, this commit:
> > > https://git-wip-us.apache.org/repos/asf?p=activemq.git;a=
> > > blobdiff;f=activemq-client/src/main/java/org/apache/activemq/command/
> > > ActiveMQTextMessage.java;h=cca09be21bfb877213a4a506415929
> f1f81a1902;hp=
> > > 347db78da559bf57b049574c96cd3a8e08681dae;hb=84ec047;hpb=
> > > de86f473f776fe3a8ca32d7f929ecec7993c458b
> > >
> > > Changing the storeContentAndClear() call back to storeContent() fixes
> it,
> > > however it was changed for a reason to prevent memory waste so need to
> > see
> > > if/what a permanent fix is
> > >
> > >
> > > On Thu, May 31, 2018 at 3:54 PM, codingismy11to7 <
> ste...@codemettle.com>
> > > wrote:
> > >
> > > > Ok, thank you...my thought also was that performance impact of using
> > tcp
> > > > wouldn't be enough to hurt us in any substantial way, even though it
> > > > *feels*
> > > > really wasteful to be marshaling the message and using TCP when we're
> > in
> > > > the
> > > > same JVM.
> > > >
> > > > In response to your first message, I changed the transport from
> nio://
> > to
> > > > tcp:// on all the non-vm threads and the behavior still reproduces.
> > Using
> > > > Java and raw AMQ libs is a much tougher problem...which I can
> do...but
> > I
> > > > probably won't be able to get to that for a few days.
> > > >
> > > >
> > > >
> > > > --
> > > > Sent from: http://activemq.2283324.n4.nabble.com/ActiveMQ-Dev-
> > > > f2368404.html
> > > >
> > >
> >
>

Reply via email to