Hello, Thanks for the patch. Is it based on real issue, or just clear thought? I'd like to see a problematic case/description/log/etc. please.
Werner Almesberger wrote: > The is_tx flag did not protect adequately against races when switching > from receiving to transmitting. This patch removes the unnecessary > spinlock and enforces proper synchronization along the interrupt > processing chain (source, handler, and worker), before entering > transmit mode. > > The mechanism for allowing transmits to be interrupted had two flaws: > 1) it tried to return with the transmission potentially never > completed, thus risking a subsequent BUG_ON(lp->is_tx), and > 2) it attempted to leave BUSY_TX with STATE_RX_ON, which according to > the data sheet is not valid. > > This patch force the transceiver into PLL_ON/TX_ON state on > interruption and then has a common path for transiting to RX_ON and > clearing is_tx. > > Note that the synchronization overhead introduced by this patch > is about 100 us on a Ben NanoNote. > > Signed-off-by: Werner Almesberger <wer...@almesberger.net> > > --- > drivers/ieee802154/at86rf230.c | 32 ++++++++++---------------------- > 1 files changed, 10 insertions(+), 22 deletions(-) > > diff --git a/drivers/ieee802154/at86rf230.c b/drivers/ieee802154/at86rf230.c > index 62d6075..00d7a68 100644 > --- a/drivers/ieee802154/at86rf230.c > +++ b/drivers/ieee802154/at86rf230.c > @@ -58,8 +58,7 @@ struct at86rf230_local { > > struct ieee802154_dev *dev; > > - spinlock_t lock; > - unsigned is_tx:1; /* P: lock */ > + volatile unsigned is_tx:1; /* P: lock */ Hmmm. I don't like volatile. Please drop it. Don't use it in kernel other than for register access or unless you are dumb sure in what you are doing. Generally if you want to use volatile, it means you try to avoid some extra synchronisation. Also the comment about "lock" is wrong in your code. If you remove lock, it can't protect anything. > }; > > > @@ -336,22 +335,24 @@ at86rf230_xmit(struct ieee802154_dev *dev, struct > sk_buff *skb) > { > struct at86rf230_local *lp = dev->priv; > int rc; > - unsigned long flags; > > pr_debug("%s\n", __func__); > > might_sleep(); > > - spin_lock_irqsave(&lp->lock, flags); > BUG_ON(lp->is_tx); > - lp->is_tx = 1; > + > INIT_COMPLETION(lp->tx_complete); > - spin_unlock_irqrestore(&lp->lock, flags); > > rc = at86rf230_state(dev, STATE_FORCE_TX_ON, STATE_TX_ON); > if (rc) > goto err; > > + synchronize_irq(lp->spi->irq); > + flush_work(&lp->irqwork); > + > + lp->is_tx = 1; > + And what happens if we have an interrupt here? Why do you need this synchro/flush at all? > rc = at86rf230_write_fbuf(lp, skb->data, skb->len); > if (rc) > goto err_rx; > @@ -368,18 +369,12 @@ at86rf230_xmit(struct ieee802154_dev *dev, struct > sk_buff *skb) > > rc = wait_for_completion_interruptible(&lp->tx_complete); > if (rc < 0) > - goto err_rx; > - > - rc = at86rf230_state(dev, STATE_RX_ON, STATE_RX_ON); > - > - return rc; > + at86rf230_state(dev, STATE_FORCE_TX_ON, STATE_TX_ON); > > err_rx: > at86rf230_state(dev, STATE_RX_ON, STATE_RX_ON); IIUC you stay in BUSY_TX, until the end of frame. Then it automatically transfers to PLL_ON, from which one can change to RX_ON. The problem can come from interrupting wait_for_completion, but that's another case. It's useful if hardware misfunctions (e.g. interrupt lost), so that user can stop this piece of code, no really matter in which state the chip stays itself. Please justify this. > err: > - spin_lock_irqsave(&lp->lock, flags); > lp->is_tx = 0; > - spin_unlock_irqrestore(&lp->lock, flags); This should be somehow protected wrt. RX_ON changing. Imagine going to RX_ON, being preemted for a while, receiving another TRX_ON interrupt here, when we've already done with transmitting, but the is_tx is still set. is_tx should be cleared both in error path here and in IRQ handler. > return rc; > } > @@ -430,7 +425,6 @@ static void at86rf230_irqwork(struct work_struct *work) > container_of(work, struct at86rf230_local, irqwork); > u8 status = 0, val; > int rc; > - unsigned long flags; > > dev_dbg(&lp->spi->dev, "IRQ Worker\n"); > > @@ -446,15 +440,10 @@ static void at86rf230_irqwork(struct work_struct *work) > > if (status & IRQ_TRX_END) { > status &= ~IRQ_TRX_END; > - spin_lock_irqsave(&lp->lock, flags); > - if (lp->is_tx) { > - lp->is_tx = 0; > - spin_unlock_irqrestore(&lp->lock, flags); > + if (lp->is_tx) > complete(&lp->tx_complete); > - } else { > - spin_unlock_irqrestore(&lp->lock, flags); > + else > at86rf230_rx(lp); > - } > } > > } while (status != 0); > @@ -642,7 +631,6 @@ static int __devinit at86rf230_probe(struct spi_device > *spi) > > mutex_init(&lp->bmux); > INIT_WORK(&lp->irqwork, at86rf230_irqwork); > - spin_lock_init(&lp->lock); > init_completion(&lp->tx_complete); > > spi_set_drvdata(spi, lp); -- With best wishes Dmitry ------------------------------------------------------------------------------ EditLive Enterprise is the world's most technically advanced content authoring tool. Experience the power of Track Changes, Inline Image Editing and ensure content is compliant with Accessibility Checking. http://p.sf.net/sfu/ephox-dev2dev _______________________________________________ Linux-zigbee-devel mailing list Linux-zigbee-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/linux-zigbee-devel