When I debugged the crash, the cstate of the connection at the time of the oops
was C_WF_REPORT_PARAMS, not C_TEAR_DOWN.

So in my opinion this problem may also occur in versions up to 8.4.10.

The order of the state change and the ack_sender initialization in the conn_connect function is:

```
    /* <-- cstate changes here */
    rv = conn_request_state(connection, NS(conn, C_WF_REPORT_PARAMS), CS_VERBOSE);
    if (rv < SS_SUCCESS || connection->cstate != C_WF_REPORT_PARAMS) {
        clear_bit(STATE_SENT, &connection->flags);
        return 0;
    }

    drbd_thread_start(&connection->ack_receiver);
    /* opencoded create_singlethread_workqueue(),
     * to be able to use format string arguments */
    /* <-- ack_sender is initialized here */
    connection->ack_sender =
#if LINUX_VERSION_CODE >= KERNEL_VERSION(3,3,0)
        alloc_ordered_workqueue("drbd_as_%s", WQ_MEM_RECLAIM, connection->resource->name);
#else
        create_singlethread_workqueue("drbd_ack_sender");
#endif
    if (!connection->ack_sender) {
        drbd_err(connection, "Failed to create workqueue ack_sender\n");
        return 0;
    }

```

and the code at the oops location decides whether ack_sender is usable based on cstate alone:

```
    if (connection->cstate >= C_WF_REPORT_PARAMS) {
        kref_get(&device->kref); /* put is in drbd_send_acks_wf() */
        /* <-- oops here */
        if (!queue_work(connection->ack_sender, &peer_device->send_acks_work))
            kref_put(&device->kref, drbd_destroy_device);
    }
```
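
To make the window concrete, here is a minimal userspace sketch of the ordering problem described above. It is only a model under stated assumptions: the struct layouts, `would_oops()` and `queue_ack_guarded()` are hypothetical names invented for illustration and are not DRBD code. It shows that a check on cstate alone can pass while ack_sender is still NULL, and that a check which also tests the pointer does not. In the real kernel code a plain NULL test would probably not be sufficient on its own (ordering between CPUs and teardown would also have to be considered), so this should not be read as a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only; the names mirror the snippets above but nothing
 * here is kernel or DRBD code. */
enum conn_state { C_STANDALONE, C_WF_REPORT_PARAMS };

struct workqueue {
    int queued;                     /* number of work items "queued" */
};

struct connection {
    enum conn_state cstate;
    struct workqueue *ack_sender;   /* NULL until conn_connect() creates it */
};

/* Models the quoted check: it trusts cstate alone. Returns true when the
 * check would pass although ack_sender is still NULL, i.e. the oops case. */
static bool would_oops(const struct connection *c)
{
    return c->cstate >= C_WF_REPORT_PARAMS && c->ack_sender == NULL;
}

/* Hypothetical guarded variant: queue only if the workqueue exists. */
static bool queue_ack_guarded(struct connection *c)
{
    if (c->cstate < C_WF_REPORT_PARAMS || c->ack_sender == NULL)
        return false;
    c->ack_sender->queued++;
    return true;
}
```

Walking a `struct connection` through the same order as conn_connect (set cstate first, assign ack_sender afterwards) makes `would_oops()` return true between the two steps, which matches the window I see in the crash.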

2017-08-10 18:21 GMT+08:00 Lars Ellenberg <lars.ellenb...@linbit.com>:

> On Wed, Aug 09, 2017 at 05:20:22PM +0800, li songmin wrote:
> > Hi,
> >
> > when I upgrade drbd from 8.3.15 to 8.4.6-5, there is an oops caused by a
> > NULL pointer error.
>
> We are at 8.4.10 already.
> Just saying.
>
> >
> > upgrade step as follow:
> >
> > 1.  primary node work as normal
> > 2. stop drbd 8.3.15 on secondary node, and upgrade it to 8.4.6-5.
> > 3. start secondary node, now data begin sync from primary node.
> > 4. upgrade primary node with follow step
> >      1. stop business service on drbd
> >       2. disconnect drbd for unmount quickly  <--  oops on secondary node here?
>
> Why disconnect?
>
> >       3.  umount filesystem
> >       4. primary -> secondary
> >       5. connect drbd and waiting sync complete.
> >       6. business service may start on secondary node now.
> >       7. stop drbd 8.3.15 on primary node, and upgrade it to 8.4.6-5.
> >
> > call stack:
>
> > <4>[66071017.155051] Modules linked in: softdog drbd(FN)
>
> What did you need to force the module for?
> Probably *that* is your problem right there.
>
>
> --
> : Lars Ellenberg
> : LINBIT | Keeping the Digital World Running
> : DRBD -- Heartbeat -- Corosync -- Pacemaker
>
> DRBD® and LINBIT® are registered trademarks of LINBIT
> __
> please don't Cc me, but send to list -- I'm subscribed
> _______________________________________________
> drbd-user mailing list
> drbd-user@lists.linbit.com
> http://lists.linbit.com/mailman/listinfo/drbd-user
>
