On Wed, Aug 31, 2016 at 12:03 AM, Andy Zhou <az...@ovn.org> wrote:

>
>
> On Tue, Aug 30, 2016 at 4:17 AM, Numan Siddique <nusid...@redhat.com>
> wrote:
>
>>
>>
>> On Tue, Aug 30, 2016 at 1:11 AM, Andy Zhou <az...@ovn.org> wrote:
>>
>>>
>>>
>>> On Mon, Aug 29, 2016 at 3:14 AM, Numan Siddique <nusid...@redhat.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Sat, Aug 27, 2016 at 4:45 AM, Andy Zhou <az...@ovn.org> wrote:
>>>>
>>>>> Added the '--no-sync' option based on feedback on the current
>>>>> implementation.
>>>>>
>>>>> Added appctl command "ovsdb-server/sync-status" based on feedback
>>>>> on the current implementation.
>>>>>
>>>>> Added a test to simulate the integration of HA manager with OVSDB
>>>>> server using replication.
>>>>>
>>>>> Other documentation and API improvements.
>>>>>
>>>>> Signed-off-by: Andy Zhou <az...@ovn.org>
>>>>> ------
>>>>>
>>>>> I hope to get some review comments on the command line and appctl
>>>>> interfaces for replication. Since 2.6 is the first release of those
>>>>> interfaces, it is easier to make changes now than in future
>>>>> releases.
>>>>>
>>>>> ----
>>>>> v1->v2: Fix crashes reported at:
>>>>> http://openvswitch.org/pipermail/dev/2016-August/078591.html
>>>>> ---
>>>>>
>>>>
>>>> I haven't tested these patches yet. This patch seems to have a
>>>> whitespace warning when applied.
>>>>
>>> Thanks for the report. I will fold the fix into the next version when
>>> posting.
>>>
>>> In case it helps, you can also access the patches from my private repo
>>> at:
>>> https://github.com/azhou-nicira/ovs-review/tree/ovsdb-replication-sm-v2
>>>
>>>
>>
>> Hi Andy,
>>
>> I am seeing the crash below when:
>>
>> - The ovsdb-server changes from master to standby, and the
>>   active-ovsdb-server it is about to connect to is killed just before
>>   that or is not reachable.
>>
>> - The pacemaker OCF script calls the sync-status cmd soon after that.
>>
>>
>> Please let me know if you need more information.
>>
>>
>> Core was generated by `ovsdb-server -vdbg
>> --log-file=/opt/stack/logs/ovsdb-server-sb.log
>> --remote=puni'.
>> Program terminated with signal SIGSEGV, Segmentation fault.
>> #0  0x000000000041241d in replication_status () at ovsdb/replication.c:875
>> 875         SHASH_FOR_EACH (node, replication_dbs) {
>> Missing separate debuginfos, use: dnf debuginfo-install
>> glibc-2.23.1-10.fc24.x86_64 openssl-libs-1.0.2h-3.fc24.x86_64
>> (gdb) bt
>> #0  0x000000000041241d in replication_status () at ovsdb/replication.c:875
>> #1  0x0000000000406eda in ovsdb_server_get_sync_status (conn=0x1421fd0,
>>     argc=<optimized out>, argv=<optimized out>, config_=<optimized out>)
>>     at ovsdb/ovsdb-server.c:1480
>> #2  0x00000000004324ee in process_command (request=0x1421f30,
>>     conn=0x1421fd0) at lib/unixctl.c:313
>> #3  run_connection (conn=0x1421fd0) at lib/unixctl.c:347
>> #4  unixctl_server_run (server=server@entry=0x141e140) at
>>     lib/unixctl.c:400
>> #5  0x0000000000405bdc in main_loop (is_backup=0x7fff08062256,
>>     exiting=0x7fff08062257, run_process=0x0, remotes=0x7fff080622a0,
>>     unixctl=0x141e140,
>>     all_dbs=0x7fff080622e0, jsonrpc=0x13f6f00) at ovsdb/ovsdb-server.c:182
>> #6  main (argc=<optimized out>, argv=<optimized out>) at
>>     ovsdb/ovsdb-server.c:430
>>
>> Numan, thanks for the report. I think I spotted the bug:
>
> Currently, when the replication state machine is reset, the state update
> takes place only after another round of the main loop. This time lag
> could lead to the backtrace above if a unixctl command is issued
> during the lag. I have a fix that adds another state to represent
> the reset condition. The fix is at:
>
> https://github.com/azhou-nicira/ovs-review/tree/ovsdb-replication-sm-v3
>
> Would you please let me know if this version works any better? Thanks!
>
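To illustrate the race described above, here is a minimal sketch in C. It is not the code from ovsdb/replication.c or from the v3 branch; every name in it (repl_state, repl_reset, repl_run, repl_status) is hypothetical. It assumes the crash comes from the status command walking a per-database table that is not yet valid between the reset and the next main-loop pass, and it shows how an explicit reset state lets the status query return early instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical states; the names are illustrative only. */
enum repl_state {
    REPL_RESET,        /* Reset requested; database table not rebuilt yet. */
    REPL_REPLICATING   /* Database table is valid and may be iterated. */
};

struct repl {
    enum repl_state state;
    const char **dbs;  /* NULL while in REPL_RESET. */
    size_t n_dbs;
};

/* Called when the server is switched to standby.  The table is dropped
 * immediately, but it is only rebuilt on the next main-loop pass. */
static void
repl_reset(struct repl *r)
{
    r->dbs = NULL;
    r->n_dbs = 0;
    r->state = REPL_RESET;
}

/* Called once per main-loop iteration.  Leaves the reset state by
 * installing the new table. */
static void
repl_run(struct repl *r, const char **dbs, size_t n_dbs)
{
    if (r->state == REPL_RESET) {
        r->dbs = dbs;
        r->n_dbs = n_dbs;
        r->state = REPL_REPLICATING;
    }
}

/* Status query, as a control command might issue it at any time.
 * Writes a short description into 'buf' and returns the number of
 * databases.  The REPL_RESET check is the guard that avoids touching
 * the table during the time lag. */
static size_t
repl_status(const struct repl *r, char *buf, size_t size)
{
    assert(size > 0);
    if (r->state == REPL_RESET) {
        snprintf(buf, size, "resetting");
        return 0;
    }

    size_t used = 0;
    buf[0] = '\0';
    for (size_t i = 0; i < r->n_dbs; i++) {
        int n = snprintf(buf + used, size - used, "%s%s",
                         i ? "," : "", r->dbs[i]);
        if (n < 0 || (size_t) n >= size - used) {
            break;      /* Output truncated; stop appending. */
        }
        used += (size_t) n;
    }
    return r->n_dbs;
}
```

In this sketch, calling repl_status() right after repl_reset() and before repl_run() takes the early return rather than iterating a NULL table, which is the window the sync-status call appears to hit in the report.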
Sure. I will test it and let you know.

Thanks
Numan
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev