> On June 27, 2014, 8:27 a.m., Gordon Sim wrote:
> > The client shouldn't have to do a full execution-sync. It should be
> > sufficient to set the sync flag on the method. Skimming the broker code, I
> > don't see where async handling for a commit would happen. It looks to me
> > like perhaps the broker is incorrectly relying on an execution-sync in this
> > case? Do you have a protocol traces comparing a java client (without this
> > fix) and a python or c++ client also doing a commit?
Ignore the above... I do now see the path for async completion of commit, and
on the face of it a sync flag on the tx-commit command should be sufficient.
Unless we have already done so, I would check from the protocol trace whether
the sync flag is indeed being set on the commit (without this patch) as that
will clarify which side is not behaving as expected.
However I agree that for a commit at least, there is no reason why the broker
shouldn't just send a completion regardless. I *think* that might be as simple
as overriding the requiresSync flag in the async context, e.g.:
Index: cpp/src/qpid/broker/SemanticState.cpp
===================================================================
--- cpp/src/qpid/broker/SemanticState.cpp (revision 1605116)
+++ cpp/src/qpid/broker/SemanticState.cpp (working copy)
@@ -199,6 +199,7 @@
AsyncCommandCallback callback(
session,
boost::bind(&TxBuffer::endCommit, txBuffer, store));
+ callback.alwaysSendCompletion();
txBuffer->end(callback);
}
Index: cpp/src/qpid/broker/SessionState.h
===================================================================
--- cpp/src/qpid/broker/SessionState.h (revision 1605116)
+++ cpp/src/qpid/broker/SessionState.h (working copy)
@@ -260,6 +260,7 @@
requiresSync(ss.getCurrentCommand().isSyncRequired()),
completerContext(ss.getAsyncCommandCompleter())
{}
+ void alwaysSendCompletion() { requiresSync = true; }
virtual ~AsyncCommandContext() {}
- Gordon
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23096/#review46845
-----------------------------------------------------------
On June 26, 2014, 10:27 p.m., Alan Conway wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23096/
> -----------------------------------------------------------
>
> (Updated June 26, 2014, 10:27 p.m.)
>
>
> Review request for qpid, Gordon Sim, rajith attapattu, and Robbie Gemmell.
>
>
> Bugs: QPID-5855
> https://issues.apache.org/jira/browse/QPID-5855
>
>
> Repository: qpid
>
>
> Description
> -------
>
> Reproducer and tentative fix.
>
> To reproduce the bug, apply the changes to Spout and Drain but not the changes
> to Session. Run the script below. It should fail after several hundered
> messages.
>
> The problem appears to be that the java client sends a tx-commit and then
> times
> out in Session.sync() waiting for completion. When HA is enabled the tx-commit
> command is completed asynchronously, it seems like there is a race condition
> where
> the broker is not sending the completion.
>
> The fix on Session (part of this patch) is to force the client to always send
> an
> execution.sync if the command is a tx-commit. With that fix, the reproducer
> runs
> to completion after 100,000 messages.
>
> I'm not completely convinced that the fix is correct.
>
> - is it reasonable to special-case tx-commit like this? It is reasonable to
> say
> that you can't start a new transaction till you've finished the old one but
> there may be better places to enforce that. It does seem like the client
> is *trying* to sync. The stack trace has a lot of functions called sync(),
> but
> it seems none of them is actually sending a sync to the broker.
>
> - the tx-commit command *is* marked synchronous. Why isn't the client sending
> a
> sync (assuming it really isn't)? To me needSync = !m.isSync() suggests that
> the reason we don't send a sync here is because the method is marked
> synchronous and therefore we believe we've sent it somewhere else in the
> code.
>
> - If we can, it would be better to fix this on the broker rather than depend
> on
> the client getting it right. Why isn't the broker eventually sending a
> completion anyway, even without an explicit sync?
>
> ================ Reproducer shell script
> #!/bin/sh
> # Reproducer for Bug 1095849 - JAVA Client Can not recieve message with qpid
> ha cluster "Session exception occured while trying to commit"
>
> # Modify these variables to suit yourself:
> CHECKOUT=$HOME/qpid # qpid checkout root (contains qpid/cpp etc.)
> BUILD=$HOME/qpid/debug # C++ build
> QPIDD_DIR=/tmp
> QPIDD_CONF=$QPIDD_DIR/qpidd.conf
>
> START_QPIDD=1
> BUILD_JAVA=1
> while getopts "qj" opt; do
> case $opt in
> q) unset START_QPIDD ;;
> j) unset BUILD_JAVA ;;
> *) echo "Bad option: $opt. By default do everything. -q don't start
> qpidd, -j don't build java."; exit 1 ;;
> esac
> done
>
> source $BUILD/src/tests/test_env.sh
>
> wait_status() {
> until qpid-ha --config $QPIDD_CONF -b $1 status --expect $2 &> /dev/null;
> do
> echo wait_status $*
> sleep .5
> done
> }
>
> start_cluster() {
> while pkill qpidd; do
> echo "Killing old qpidd"
> done
>
> cat <<EOF > $QPIDD_CONF
> load-module=$BUILD/src/ha.so
> auth=no
> #log-enable=trace+:Protocol
> log-enable=debug+:HA
> ha-cluster=yes
> ha-replicate=all
> ha-brokers-url=localhost:6000,localhost:6001,localhost:6002
> EOF
> for p in 6000 6001 6002; do
> DIR=$QPIDD_DIR/qpidd-$p
> rm -rf $DIR
> mkdir -p $DIR
> qpidd -d --config=$QPIDD_CONF -p $p --log-to-file $DIR/log --data-dir
> $DIR
> done
> QPID_HA="qpid-ha --config $QPIDD_CONF -b localhost:6000"
> $QPID_HA promote
> wait_status localhost:6000 active
> wait_status localhost:6001 ready
> wait_status localhost:6002 ready
> $QPID_HA status --all
> }
>
> test "$START_QPIDD" && start_cluster
>
> pushd $CHECKOUT/qpid/java
>
> test "$BUILD_JAVA" && mvn package dependency:copy-dependencies -DskipTests
>
> EXAMPLE_TARGET=$CHECKOUT/qpid/java/client/example/target
> export CLASSPATH="$EXAMPLE_TARGET/classes/:$EXAMPLE_TARGET/dependency/*"
> java org.apache.qpid.example.Spout -b x:y@localhost:6000 -c 100000
> 'q;{create:always}'
>
> popd
>
>
> Diffs
> -----
>
>
> trunk/qpid/java/client/example/src/main/java/org/apache/qpid/example/Drain.java
> 1605863
>
> trunk/qpid/java/client/example/src/main/java/org/apache/qpid/example/Spout.java
> 1605863
> trunk/qpid/java/common/src/main/java/org/apache/qpid/transport/Session.java
> 1605863
>
> Diff: https://reviews.apache.org/r/23096/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Alan Conway
>
>