> > Isn't this only true if the file sizes are such that the concatenated 
> > blocks are perfectly aligned on the same zfs block boundaries they used 
> > before?  This seems unlikely to me.
> 
> Yes that would be the case.

While eagerly awaiting b128 to appear in IPS, I have been giving this issue 
(block size and alignment vs dedup) some thought recently.  I have a different, 
but sufficiently similar, scenario where the effectiveness of dedup will depend 
heavily on this factor.

For this case, though, the alignment question for short tails is relatively 
easily dealt with.  The key is that the record size of the file is "up to 128k" 
and may be shorter depending on various circumstances, such as the write 
pattern used.

To simplify, let us assume that the original files were all written quickly and 
sequentially, so that each consists of n 128k blocks plus a shorter tail.  When 
concatenating them, it should be sufficient to write out the target file in 
128k chunks from the source, then the first file's tail, then issue an fsync 
before moving on to the chunks from the second file.  
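
The write pattern described above can be sketched roughly as follows. This is 
only an illustration of the application-side pattern: the function name, the 
RECORD_SIZE constant and the assumption that the dataset uses a 128k recordsize 
are mine, and nothing here guarantees how ZFS actually lays the records out on 
disk (that is the open question below).

```python
import os

RECORD_SIZE = 128 * 1024  # assumed recordsize of the dataset (128k)


def _write_all(fd, data):
    """Write the whole buffer, retrying if os.write() returns a short count."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def concat_aligned(sources, target, record_size=RECORD_SIZE):
    """Concatenate sources into target, one record-sized write at a time.

    Each source is copied as full record_size chunks followed by its shorter
    tail, and fsync() is called before starting on the next source.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_BINARY", 0)
    out_fd = os.open(target, flags, 0o644)
    try:
        for path in sources:
            with open(path, "rb") as src:
                while True:
                    chunk = src.read(record_size)  # short only at end of file
                    if not chunk:
                        break
                    _write_all(out_fd, chunk)
            # Flush this file's tail before the next file's data is written.
            os.fsync(out_fd)
    finally:
        os.close(out_fd)
```

For example, concat_aligned(["a.log", "b.log"], "joined.log") would write all 
of a.log, fsync, and then continue with b.log (the file names are placeholders).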

If the source files were not written in this pattern (e.g. log files 
accumulating small, varying-size writes), the best thing to do is to rewrite 
those "in place" as well, using the same write pattern as for the joined 
file.  This can also improve compression efficiency, by allowing larger block 
sizes than the original.
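
One possible way to do such a rewrite is sketched below, under my own 
assumptions: it copies the file to a temporary file in the same directory 
using record-sized writes and then renames it over the original, so it is 
"in place" only in the sense that the path stays the same. The function name 
and the 128k RECORD_SIZE are hypothetical, and any existing snapshots would 
still hold the old blocks.

```python
import os
import shutil
import tempfile

RECORD_SIZE = 128 * 1024  # assumed recordsize of the dataset (128k)


def rewrite_in_record_chunks(path, record_size=RECORD_SIZE):
    """Rewrite path sequentially in record_size writes, then swap it in."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb", buffering=0) as out, open(path, "rb") as src:
            while True:
                chunk = src.read(record_size)  # short only at end of file
                if not chunk:
                    break
                view = memoryview(chunk)
                while view:  # retry on short writes
                    view = view[out.write(view):]
            os.fsync(out.fileno())
        shutil.copystat(path, tmp_path)  # keep mode and timestamps
        os.replace(tmp_path, path)       # rename over the original
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```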

Issues/questions:
 * This is an optimistic method of alignment.  Is there any mechanism to get 
stronger results, i.e. to know the size of each record of the original, or to 
produce a specific record size/alignment on output?
 * There's already the very useful seek interface for finding holes and data; 
perhaps something similar would be useful here.  Or a direct-I/O-related option 
to read that returns short reads only up to the end of the current record?
 * Perhaps a pause of some kind (to wait for the txg to close) is also 
necessary, to ensure the tail doesn't get combined with new data and reblocked?
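
For reference, the existing hole/data seek interface mentioned in the second 
point can be used roughly like this. The helper name is my own, and the sketch 
assumes a platform where Python exposes os.SEEK_DATA and os.SEEK_HOLE. Note 
that it only reports data regions and holes, not record boundaries, which is 
exactly the missing piece.

```python
import errno
import os


def data_segments(path):
    """Return a list of (start, end) byte ranges that contain data."""
    segments = []
    fd = os.open(path, os.O_RDONLY)
    try:
        end = os.fstat(fd).st_size
        offset = 0
        while offset < end:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:  # no more data after offset
                    break
                raise
            # The end of the file counts as an implicit hole.
            stop = os.lseek(fd, start, os.SEEK_HOLE)
            segments.append((start, stop))
            offset = stop
    finally:
        os.close(fd)
    return segments
```

A record-aware equivalent would need to return one range per record rather 
than one per contiguous data region.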
-- 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
