----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/23096/#review46845 -----------------------------------------------------------
The client shouldn't have to do a full execution-sync. It should be sufficient to set the sync flag on the method. Skimming the broker code, I don't see where async handling for a commit would happen. It looks to me like perhaps the broker is incorrectly relying on an execution-sync in this case? Do you have a protocol traces comparing a java client (without this fix) and a python or c++ client also doing a commit? - Gordon Sim On June 26, 2014, 10:27 p.m., Alan Conway wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/23096/ > ----------------------------------------------------------- > > (Updated June 26, 2014, 10:27 p.m.) > > > Review request for qpid, Gordon Sim, rajith attapattu, and Robbie Gemmell. > > > Bugs: QPID-5855 > https://issues.apache.org/jira/browse/QPID-5855 > > > Repository: qpid > > > Description > ------- > > Reproducer and tentative fix. > > To reproduce the bug, apply the changes to Spout and Drain but not the changes > to Session. Run the script below. It should fail after several hundered > messages. > > The problem appears to be that the java client sends a tx-commit and then > times > out in Session.sync() waiting for completion. When HA is enabled the tx-commit > command is completed asynchronously, it seems like there is a race condition > where > the broker is not sending the completion. > > The fix on Session (part of this patch) is to force the client to always send > an > execution.sync if the command is a tx-commit. With that fix, the reproducer > runs > to completion after 100,000 messages. > > I'm not completely convinced that the fix is correct. > > - is it reasonable to special-case tx-commit like this? It is reasonable to > say > that you can't start a new transaction till you've finished the old one but > there may be better places to enforce that. It does seem like the client > is *trying* to sync. The stack trace has a lot of functions called sync(), > but > it seems none of them is actually sending a sync to the broker. > > - the tx-commit command *is* marked synchronous. Why isn't the client sending > a > sync (assuming it really isn't)? To me needSync = !m.isSync() suggests that > the reason we don't send a sync here is because the method is marked > synchronous and therefore we believe we've sent it somewhere else in the > code. > > - If we can, it would be better to fix this on the broker rather than depend > on > the client getting it right. Why isn't the broker eventually sending a > completion anyway, even without an explicit sync? > > ================ Reproducer shell script > #!/bin/sh > # Reproducer for Bug 1095849 - JAVA Client Can not recieve message with qpid > ha cluster "Session exception occured while trying to commit" > > # Modify these variables to suit yourself: > CHECKOUT=$HOME/qpid # qpid checkout root (contains qpid/cpp etc.) > BUILD=$HOME/qpid/debug # C++ build > QPIDD_DIR=/tmp > QPIDD_CONF=$QPIDD_DIR/qpidd.conf > > START_QPIDD=1 > BUILD_JAVA=1 > while getopts "qj" opt; do > case $opt in > q) unset START_QPIDD ;; > j) unset BUILD_JAVA ;; > *) echo "Bad option: $opt. By default do everything. -q don't start > qpidd, -j don't build java."; exit 1 ;; > esac > done > > source $BUILD/src/tests/test_env.sh > > wait_status() { > until qpid-ha --config $QPIDD_CONF -b $1 status --expect $2 &> /dev/null; > do > echo wait_status $* > sleep .5 > done > } > > start_cluster() { > while pkill qpidd; do > echo "Killing old qpidd" > done > > cat <<EOF > $QPIDD_CONF > load-module=$BUILD/src/ha.so > auth=no > #log-enable=trace+:Protocol > log-enable=debug+:HA > ha-cluster=yes > ha-replicate=all > ha-brokers-url=localhost:6000,localhost:6001,localhost:6002 > EOF > for p in 6000 6001 6002; do > DIR=$QPIDD_DIR/qpidd-$p > rm -rf $DIR > mkdir -p $DIR > qpidd -d --config=$QPIDD_CONF -p $p --log-to-file $DIR/log --data-dir > $DIR > done > QPID_HA="qpid-ha --config $QPIDD_CONF -b localhost:6000" > $QPID_HA promote > wait_status localhost:6000 active > wait_status localhost:6001 ready > wait_status localhost:6002 ready > $QPID_HA status --all > } > > test "$START_QPIDD" && start_cluster > > pushd $CHECKOUT/qpid/java > > test "$BUILD_JAVA" && mvn package dependency:copy-dependencies -DskipTests > > EXAMPLE_TARGET=$CHECKOUT/qpid/java/client/example/target > export CLASSPATH="$EXAMPLE_TARGET/classes/:$EXAMPLE_TARGET/dependency/*" > java org.apache.qpid.example.Spout -b x:y@localhost:6000 -c 100000 > 'q;{create:always}' > > popd > > > Diffs > ----- > > > trunk/qpid/java/client/example/src/main/java/org/apache/qpid/example/Drain.java > 1605863 > > trunk/qpid/java/client/example/src/main/java/org/apache/qpid/example/Spout.java > 1605863 > trunk/qpid/java/common/src/main/java/org/apache/qpid/transport/Session.java > 1605863 > > Diff: https://reviews.apache.org/r/23096/diff/ > > > Testing > ------- > > > Thanks, > > Alan Conway > >
