-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/23096/#review46847
-----------------------------------------------------------


The code you highlighted around 'needSync' and the whole 'sync method' piece is 
certainly not clearly named/structured, but it doesnt really seem to be doing 
the wrong thing to me. The tx-commit is marked sync, so the client presumably 
isn't issuing an explicit execution-sync command due to thinking that it has 
already done enough to get a response implicitly and doesn't need to repeat 
explicitly.

I had a look at the Java broker and the last thing it does when executing a 
command is check if it is marked sync, wait for it to complete any async work, 
and then send out a session-completed. Adding the explicit execution-sync for 
tx-commit to the client would result in it needing to do redundant work, so 
combined with the client seemingly not doing anything unreasonable I'd be 
inclined to pursue a fix in the C++ broker.



- Robbie Gemmell


On June 26, 2014, 10:27 p.m., Alan Conway wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/23096/
> -----------------------------------------------------------
> 
> (Updated June 26, 2014, 10:27 p.m.)
> 
> 
> Review request for qpid, Gordon Sim, rajith attapattu, and Robbie Gemmell.
> 
> 
> Bugs: QPID-5855
>     https://issues.apache.org/jira/browse/QPID-5855
> 
> 
> Repository: qpid
> 
> 
> Description
> -------
> 
> Reproducer and tentative fix.
> 
> To reproduce the bug, apply the changes to Spout and Drain but not the changes
> to Session.  Run the script below. It should fail after several hundered
> messages.
> 
> The problem appears to be that the java client sends a tx-commit and then 
> times
> out in Session.sync() waiting for completion. When HA is enabled the tx-commit
> command is completed asynchronously, it seems like there is a race condition 
> where
> the broker is not sending the completion.
> 
> The fix on Session (part of this patch) is to force the client to always send 
> an
> execution.sync if the command is a tx-commit. With that fix, the reproducer 
> runs
> to completion after 100,000 messages.
> 
> I'm not completely convinced that the fix is correct.
> 
> - is it reasonable to special-case tx-commit like this? It is reasonable to 
> say
>   that you can't start a new transaction till you've finished the old one but
>   there may be better places to enforce that. It does seem like the client
>   is *trying* to sync. The stack trace has a lot of functions called sync(), 
> but
>   it seems none of them is actually sending a sync to the broker.
> 
> - the tx-commit command *is* marked synchronous. Why isn't the client sending 
> a
>   sync (assuming it really isn't)? To me needSync = !m.isSync() suggests that
>   the reason we don't send a sync here is because the method is marked
>   synchronous and therefore we believe we've sent it somewhere else in the 
> code.
> 
> - If we can, it would be better to fix this on the broker rather than depend 
> on
>   the client getting it right.  Why isn't the broker eventually sending a
>   completion anyway, even without an explicit sync?
> 
> ================ Reproducer shell script
> #!/bin/sh
> # Reproducer for Bug 1095849 - JAVA Client Can not recieve message with qpid 
> ha cluster "Session exception occured while trying to commit"
> 
> # Modify these variables to suit yourself:
> CHECKOUT=$HOME/qpid           # qpid checkout root (contains qpid/cpp etc.)
> BUILD=$HOME/qpid/debug                # C++ build
> QPIDD_DIR=/tmp
> QPIDD_CONF=$QPIDD_DIR/qpidd.conf
> 
> START_QPIDD=1
> BUILD_JAVA=1
> while getopts "qj" opt; do
>     case $opt in
>       q) unset START_QPIDD ;;
>       j) unset BUILD_JAVA ;;
>       *) echo "Bad option: $opt. By default do everything. -q don't start 
> qpidd, -j  don't build java."; exit 1 ;;
>     esac
> done
> 
> source $BUILD/src/tests/test_env.sh
> 
> wait_status() {
>     until qpid-ha --config $QPIDD_CONF -b $1 status --expect $2 &> /dev/null; 
> do
>       echo wait_status $*
>       sleep .5
>     done
> }
> 
> start_cluster() {
>     while pkill qpidd; do
>       echo "Killing old qpidd"
>     done
> 
>     cat <<EOF  > $QPIDD_CONF
> load-module=$BUILD/src/ha.so
> auth=no
> #log-enable=trace+:Protocol
> log-enable=debug+:HA
> ha-cluster=yes
> ha-replicate=all
> ha-brokers-url=localhost:6000,localhost:6001,localhost:6002
> EOF
>     for p in 6000 6001 6002; do
>       DIR=$QPIDD_DIR/qpidd-$p
>       rm -rf $DIR
>       mkdir -p $DIR
>       qpidd -d --config=$QPIDD_CONF -p $p --log-to-file $DIR/log --data-dir 
> $DIR
>     done
>     QPID_HA="qpid-ha --config $QPIDD_CONF -b localhost:6000"
>     $QPID_HA promote
>     wait_status localhost:6000 active
>     wait_status localhost:6001 ready
>     wait_status localhost:6002 ready
>     $QPID_HA status --all
> }
> 
> test "$START_QPIDD" && start_cluster
> 
> pushd $CHECKOUT/qpid/java
> 
> test "$BUILD_JAVA" && mvn package dependency:copy-dependencies -DskipTests
> 
> EXAMPLE_TARGET=$CHECKOUT/qpid/java/client/example/target
> export CLASSPATH="$EXAMPLE_TARGET/classes/:$EXAMPLE_TARGET/dependency/*"
> java org.apache.qpid.example.Spout  -b x:y@localhost:6000 -c 100000 
> 'q;{create:always}'
> 
> popd
> 
> 
> Diffs
> -----
> 
>   
> trunk/qpid/java/client/example/src/main/java/org/apache/qpid/example/Drain.java
>  1605863 
>   
> trunk/qpid/java/client/example/src/main/java/org/apache/qpid/example/Spout.java
>  1605863 
>   trunk/qpid/java/common/src/main/java/org/apache/qpid/transport/Session.java 
> 1605863 
> 
> Diff: https://reviews.apache.org/r/23096/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Alan Conway
> 
>

Reply via email to