There seems to be one more issue with dedupe for data that is not 100%
dedupable. I tried 50% and 80%, and the resulting files measure only
about 35% for the 50% job and about 60% for the 80% job.
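To repeat the comparison for several settings, a job file can be generated per percentage. This is only a sketch: make_job is a hypothetical helper name, and the job options are copied from the job file in this message (minus write_iolog).

```shell
# make_job PCT
# Print a fio job file that writes 10MB in 256k blocks with the
# requested dedupe_percentage. The body runs in a subshell, so the
# variable stays local.
make_job() (
    pct=$1
    cat <<EOF
[dedupe]
filename=test.tmp.comp
bs=256k
rw=write
size=10m
dedupe_percentage=$pct
EOF
)

# Possible usage (needs fio and its t/fio-dedupe tool in the paths
# used in this message):
#   for pct in 50 80 100; do
#       make_job "$pct" > ddp_file.fio
#       fio ddp_file.fio > /dev/null
#       fio/t/fio-dedupe -b 262144 test.tmp.comp
#   done
```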
# cat ddp_file.fio
[dedupe]
filename=test.tmp.comp
bs=256k
rw=write
size=10m
dedupe_percentage=80
write_iolog=test.tmp.log.comp
# fio ddp_file.fio
dedupe: (g=0): rw=write, bs=256K-256K/256K-256K/256K-256K, ioengine=sync, iodepth=1
fio-2.2.7-26-g9451b
Starting 1 process
dedupe: Laying out IO file(s) (1 file(s) / 10MB)
dedupe: (groupid=0, jobs=1): err= 0: pid=13376: Tue Apr 28 02:54:02 2015
write: io=10240KB, bw=731429KB/s, iops=2857, runt= 14msec
clat (usec): min=170, max=374, avg=235.80, stdev=41.11
lat (usec): min=173, max=378, avg=239.10, stdev=41.75
clat percentiles (usec):
| 1.00th=[ 171], 5.00th=[ 175], 10.00th=[ 197], 20.00th=[ 213],
| 30.00th=[ 217], 40.00th=[ 221], 50.00th=[ 231], 60.00th=[ 235],
| 70.00th=[ 239], 80.00th=[ 253], 90.00th=[ 262], 95.00th=[ 318],
| 99.00th=[ 374], 99.50th=[ 374], 99.90th=[ 374], 99.95th=[ 374],
| 99.99th=[ 374]
lat (usec) : 250=77.50%, 500=22.50%
cpu : usr=57.14%, sys=28.57%, ctx=1, majf=0, minf=27
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued : total=r=0/w=40/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: io=10240KB, aggrb=731428KB/s, minb=731428KB/s, maxb=731428KB/s, mint=14msec, maxt=14msec
Disk stats (read/write):
sda: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
# fio/t/fio-dedupe -b 262144 test.tmp.comp
Will check <test.tmp.comp>, size <10485760>, using 8 threads
Threads(8): 40 items processed
Extents=40, Unique extents=15
De-dupe ratio: 1:1.67
Fio setting: dedupe_percentage=63
I also confirmed this by checksumming the data file in individual
blocks of size bs.
# for each in {0..39}; do dd if=test.tmp.comp bs=262144 count=1 \
    skip=$each 2>/dev/null | hexdump -C | md5sum; done | wc -l
40 <<< have 40 blocks as expected.
# for each in {0..39}; do dd if=test.tmp.comp bs=262144 count=1 \
    skip=$each 2>/dev/null | hexdump -C | md5sum; done | sort | uniq | wc -l
16 <<< returns 16 unique blocks
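The manual check above can be wrapped in a small function so that it works for any file and block size. This is a sketch, not part of fio: count_blocks is a hypothetical name, md5sum from GNU coreutils is assumed, and the raw bytes of each block are hashed directly instead of going through hexdump.

```shell
# count_blocks FILE BS
# Print "<total> <unique>" for FILE split into BS-byte blocks.
# The body runs in a subshell, so its variables stay local.
count_blocks() (
    file=$1
    bs=$2
    size=$(wc -c < "$file")
    # Round up so a trailing partial block is counted too.
    nblocks=$(( (size + bs - 1) / bs ))
    if [ "$nblocks" -eq 0 ]; then
        echo "0 0"
        exit 0
    fi
    i=0
    unique=$(
        while [ "$i" -lt "$nblocks" ]; do
            # Hash the raw bytes of block i.
            dd if="$file" bs="$bs" count=1 skip="$i" 2>/dev/null | md5sum
            i=$((i + 1))
        done | sort -u | wc -l
    )
    # $((...)) strips any padding that wc may add.
    echo "$nblocks $((unique))"
)

# Possible usage:
#   count_blocks test.tmp.comp 262144
```

Hashing the dd output directly avoids depending on how hexdump formats repeated lines; the block boundaries are the same as in the loop above.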
With 80% dedupable data, I would expect around 8 unique blocks out of
40. Is that right? Also, the fio/t/fio-dedupe output shows only 15
unique extents, while the manual check returns 16.
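The figures in the fio-dedupe output can be cross-checked with a little arithmetic. The sketch below assumes the percentage is (1 - unique/total) * 100 and the ratio is total/unique - 1; these formulas are inferred from the numbers printed above (40 extents, 15 unique, ratio 1:1.67, percentage 63), not taken from the fio source. The function names are hypothetical.

```shell
# expected_unique TOTAL PCT
# Unique blocks one would expect if exactly PCT percent of TOTAL
# blocks were duplicates (rounded to the nearest integer).
expected_unique() {
    awk -v total="$1" -v pct="$2" \
        'BEGIN { printf "%d\n", total * (100 - pct) / 100 + 0.5 }'
}

# dedupe_stats TOTAL UNIQUE
# Percentage and ratio implied by a measured unique-block count.
dedupe_stats() {
    awk -v total="$1" -v unique="$2" 'BEGIN {
        printf "dedupe_percentage=%.1f ratio=1:%.2f\n",
               (1 - unique / total) * 100, total / unique - 1
    }'
}

# Possible usage:
#   expected_unique 40 80   # blocks expected at 80%
#   dedupe_stats 40 15      # fio-dedupe's count
#   dedupe_stats 40 16      # manual checksum count
```

Under these assumptions, 15 unique extents out of 40 gives 62.5%, which rounds to the reported 63, and 16 unique blocks gives 60%. With only 40 blocks the sample is small, so some deviation from the configured percentage may simply be random variation.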
Thanks,
Srinivasa Chamarthy
Srinivasa R Chamarthy
On Tue, Apr 28, 2015 at 1:15 PM, Srinivasa Chamarthy
<[email protected]> wrote:
> Seems working now. Thanks for the great support.
>
> for each in {0..7}; do dd if=test.tmp.comp bs=262144 count=1 \
>     skip=$each 2>/dev/null | hexdump -C | md5sum; done
> e1d3c034e3fc15481e5c8610333ad9cd -
> e1d3c034e3fc15481e5c8610333ad9cd -
> e1d3c034e3fc15481e5c8610333ad9cd -
> e1d3c034e3fc15481e5c8610333ad9cd -
> e1d3c034e3fc15481e5c8610333ad9cd -
> e1d3c034e3fc15481e5c8610333ad9cd -
> e1d3c034e3fc15481e5c8610333ad9cd -
> e1d3c034e3fc15481e5c8610333ad9cd -
> Srinivasa R Chamarthy
>
>
> On Mon, Apr 27, 2015 at 10:39 PM, Jens Axboe <[email protected]> wrote:
>> On 04/27/2015 07:18 AM, Srinivasa Chamarthy wrote:
>>>
>>> I was just verifying whether I could generate 100% dedupable data with
>>> fio. I configured a small workload with a bs of 256k, writing a 2MB
>>> file. I took the checksum of each 256k block of data in the file,
>>> and the checksums do not match. If I am not wrong, when I specify
>>> the data as 100% dedupable, the checksums should all match, shouldn't they?
>>>
>>> # cat ddp_file.fio
>>> [dedupe]
>>> filename=test.tmp
>>> bs=256k
>>> rw=write
>>> size=2m
>>> dedupe_percentage=100
>>> write_iolog=test.tmp.log
>>>
>>> # fio ddp_file.fio
>>> dedupe: (g=0): rw=write, bs=256K-256K/256K-256K/256K-256K, ioengine=sync, iodepth=1
>>> fio-2.2.7-24-g7c30
>>> Starting 1 process
>>> dedupe: Laying out IO file(s) (1 file(s) / 2MB)
>>>
>>> dedupe: (groupid=0, jobs=1): err= 0: pid=31497: Mon Apr 27 09:13:35 2015
>>> write: io=2048.0KB, bw=2000.0MB/s, iops=8000, runt= 1msec
>>> clat (usec): min=123, max=183, avg=150.50, stdev=22.35
>>> lat (usec): min=125, max=184, avg=152.38, stdev=22.08
>>> clat percentiles (usec):
>>> | 1.00th=[ 123], 5.00th=[ 123], 10.00th=[ 123], 20.00th=[ 124],
>>> | 30.00th=[ 139], 40.00th=[ 145], 50.00th=[ 145], 60.00th=[ 155],
>>> | 70.00th=[ 159], 80.00th=[ 177], 90.00th=[ 183], 95.00th=[ 183],
>>> | 99.00th=[ 183], 99.50th=[ 183], 99.90th=[ 183], 99.95th=[ 183],
>>> | 99.99th=[ 183]
>>> lat (usec) : 250=100.00%
>>> cpu : usr=0.00%, sys=0.00%, ctx=1, majf=0, minf=28
>>> IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>>> submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>> complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>>> issued : total=r=0/w=8/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
>>> latency : target=0, window=0, percentile=100.00%, depth=1
>>>
>>> Run status group 0 (all jobs):
>>> WRITE: io=2048KB, aggrb=2000.0MB/s, minb=2000.0MB/s, maxb=2000.0MB/s, mint=1msec, maxt=1msec
>>>
>>> Disk stats (read/write):
>>> sda: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
>>>
>>> # ls -lh test.tmp
>>> -rw-r--r-- 1 root root 2.0M Apr 27 09:13 test.tmp
>>>
>>> # cat test.tmp.log
>>> fio version 2 iolog
>>> test.tmp add
>>> test.tmp open
>>> test.tmp write 0 262144
>>> test.tmp write 262144 262144
>>> test.tmp write 524288 262144
>>> test.tmp write 786432 262144
>>> test.tmp write 1048576 262144
>>> test.tmp write 1310720 262144
>>> test.tmp write 1572864 262144
>>> test.tmp write 1835008 262144
>>> test.tmp close
>>>
>>> # for each in {0..7}; do dd if=test.tmp bs=262144 count=1 skip=$each \
>>>     2>/dev/null | hexdump -C | md5sum; done
>>> 71a1660503bcff7c4e20a763d569d069 -
>>> 9c9bb7ec1020b4d4249028aecc896e6b -
>>> 68b9685812d47c822532854201c9b352 -
>>> e5c8ef471a27ba92b86893ee5ded654b -
>>> 14e0e798a8af3f4e6abdaf022ddf91c3 -
>>> 85528ae970bd25dde8c39ecaaffa4cf3 -
>>> 60b8ccf0e0793094b9356544fb541f3a -
>>> ef736cc9cbf7588cb7b84467cb37c44e -
>>>
>>> # fio -v
>>> fio-2.2.7-24-g7c30
>>
>>
>> Can you try with current -git? The corner case of being 100% dedupable was
>> broken.
>>
>> --
>> Jens Axboe
>>