On (09/09/13 16:21), Dan Carpenter wrote:
> On Mon, Sep 09, 2013 at 03:49:42PM +0300, Sergey Senozhatsky wrote:
> > > > Calling handle_pending_slot_free() for every RW operation may
> > > > cause unneccessary slot_free_lock locking, because most likely
> > > > process will see NULL slot_free_rq. handle_pending_slot_free()
> > > > only when current detects that slot_free_rq is not NULL.
> > > > 
> > > > v2: protect handle_pending_slot_free() with zram rw_lock.
> > > > 
> > > 
> > > zram->slot_free_lock protects zram->slot_free_rq but shouldn't the zram
> > > rw_lock be wrapped around the whole operation like the original code
> > > does?  I don't know the zram code, but the original looks like it makes
> > > sense but in this one it looks like the locks are duplicative.
> > > 
> > > Is the down_read() in the original code be changed to down_write()?
> > >
> > 
> > I'm not touching locking around existing READ/WRITE commands.
> > 
> 
> Your patch does change the locking because now instead of taking the
> zram lock once it takes it and then drops it and then retakes it.  This
> looks potentially racy to me but I don't know the code so I will defer
> to any zram maintainer.
>

it takes it only when there is zram->slot_free_rq. otherwise it just
pointer comparison. original code does (schematically for READ case)

down_read()
        spin_lock()
                if (!NULL) {...}
        spin_unlock();
up_read();


patch adds the `if (!NULL)' check before N concurrent readers will
stuck on spin_lock() to just unlock spin_lock because zram->slot_free_rq
is NULL.

> 1) You haven't given us any performance numbers so it's not clear if the
>    locking is even a problem.

good point, I did not perform any testing. here they are:

 ./iozone -t -T -R -l 3 -u 3 -r 16K -s 60M -I +Z

        Record Size 16 KB
        File size set to 61440 KB
        O_DIRECT feature enabled
        Command line used: ./iozone -t -T -R -l 3 -u 3 -r 16K -s 60M -I +Z
        Output is in Kbytes/sec
        Time Resolution = 0.000001 seconds.
        Processor cache size set to 1024 Kbytes.
        Processor cache line size set to 32 bytes.
        File stride size set to 17 * record size.
        Min process = 3 
        Max process = 3 
        Throughput test with 3 processes
        Each process writes a 61440 Kbyte file in 16 Kbyte records


test             |    w/o patch   |    w/ patch
--------------------------------------------------
           Read  |   1895278.31   |   2778439.78
        Re-read  |   2638231.06   |   2630383.88
   Reverse Read  |   1378348.80   |   1538697.89
    Stride read  |   1698457.96   |   2043402.42
    Random read  |   1818998.33   |   2038670.38
 Mixed workload  |    376585.98   |    435064.57
          Pread  |   1402478.04   |    992575.22
          Fread  |   4955286.31   |   5199061.31

> 2) The v2 patch introduces an obvious deadlock in zram_slot_free()
>    because now we take the rw_lock twice.  Fix your testing to catch
>    this kind of bug next time.

yes it is. and I'm sorry about that. I sent v3 yesterday 
https://lkml.org/lkml/2013/9/8/42

> 3) Explain why it is safe to test zram->slot_free_rq when we are not
>    holding the lock.  I think it is unsafe.  I don't want to even think
>    about it without the numbers.

atomic pointer test, which is either NULL or !NULL.

for NULL case we don't take spin lock and just skip the whole
handle_pending_slot_free() thing.

for !NULL we call handle_pending_slot_free(). any operations with 
zram->slot_free_rq
(walking, removal or addition of element) are protected by spin lock within
handle_pending_slot_free(). most requests will see NULL zram->slot_free_rq.

if someone set zram->slot_free_rq to !NULL right after current process checked
it, then next request will see it !NULL and perform handle_pending_slot_free().

        -ss

> regards,
> dan carpenter
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to