>
> For, I think, efficiency reasons,
> blobs/files which are under 512 bytes in size do not get stored in a
> pack, so they stay forever in the loose blobserver.


I'm not sure I understand the efficiency argument?  Certainly from a space
efficiency perspective, it would be much preferable to store it in a pack
--- otherwise we would be wasting at least 7.5k per small object.  From a
performance efficiency perspective, it would depend on whether the small file
was "hot" or not, I suppose.  But I can't think of any reason why a small
file would automatically be hotter than a large file.  And even if it is
hot, we could cache it after the first access.

Certainly if we are doing a backup, the file will be cold cold cold, and in
the case of a symlink tree, we could have a huge number of smallish blobs
that would really do better packed into a packfile.
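
To put rough numbers on the space argument, here is a back-of-the-envelope
sketch in Python.  It assumes each loose file costs about 8 KiB on disk, which
is the figure the 7.5k estimate implies for a 512-byte blob; the real cost
depends on the file system's block and inode sizes.  It also assumes a packed
blob wastes roughly nothing.  The function names are made up for illustration
and are not part of camlistore.

```python
def loose_overhead_bytes(blob_size, per_file_alloc=8192):
    """Bytes wasted when one blob is stored as its own loose file.

    per_file_alloc is an ASSUMED on-disk cost per allocation unit (8 KiB);
    it is not a measured camlistore or file system constant.
    """
    # Round the blob size up to a whole number of allocation units.
    units = -(-blob_size // per_file_alloc)
    return units * per_file_alloc - blob_size


def total_waste_bytes(num_blobs, blob_size, per_file_alloc=8192):
    """Total waste for num_blobs equally sized loose blobs."""
    return num_blobs * loose_overhead_bytes(blob_size, per_file_alloc)


if __name__ == "__main__":
    # One 512-byte blob under the 8 KiB assumption.
    print(loose_overhead_bytes(512))
    # A hypothetical symlink tree with a million 512-byte blobs.
    print(total_waste_bytes(1_000_000, 512))
```

Under those assumptions a million small blobs left loose would waste several
gigabytes, which is why packing them looks attractive for a symlink-heavy
backup.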

Cheers,

- Ted


On Mon, Oct 3, 2016 at 10:52 AM, Mathieu Lonjaret <
[email protected]> wrote:

> Leaving the fsync question aside for now (I'd need to think about it
> some more, and hoping Brad will reply in the meantime anyway), and
> answering your question about why you don't straight pack all incoming
> blobs:
> packed blobs are supposed to help with sequential access of files, so
> they're basically a re-assembling of all the small blobs of a file
> into one (or more, if needed) zip. For, I think, efficiency reasons,
> blobs/files which are under 512 bytes in size do not get stored in a
> pack, so they stay forever in the loose blobserver. That is at least
> one reason why you can't just stream all blobs directly to the packed
> blobserver.
>
> On 2 October 2016 at 21:21, Theodore Ts'o <[email protected]> wrote:
> >
> >
> > On Sunday, October 2, 2016 at 2:48:31 PM UTC-4, Theodore Ts'o wrote:
> >>
> >>  (Especially since if you crash before the permanode is written, the
> >> client is going to have to restart the whole backup from scratch
> >> anyway.)
> >
> >
> > One thought --- as an automated heuristic, if the blobserver receives a
> > stream of unsigned blobs, it doesn't need to fsync() them.  After all,
> > any objects which aren't referenced by a permanode are subject to GC
> > treatment.  So if you crash and then run a GC, any immutable, non-signed
> > objects that were uploaded just before the crash would be GC'ed anyway.
> > Hence, there's no point to treat them as precious objects that have to
> > be fsync'ed before the client upload is acknowledged.  So what could be
> > done is when the first signed object is received, the blob server could
> > send down a sync(2) command, and then write all of the signed objects
> > using fsync(2).
> >
> > If we did this, the next obvious optimization would be to tune the
> > writeback interval for the disk in question to be 2-3 minutes, instead
> > of the usual 30 seconds.  I noticed that objects were getting written as
> > loose files, and then repacked into pack file approximately every 2
> > minutes or so.  All modern file systems do delayed allocation, which
> > means that if we're not fsync'ing the loose files, they won't get
> > flushed to disk, and so if they are written into the packed file and
> > then get deleted within the writeback interval, the loose files will
> > never get written to disk.  This will double camlistore's effective
> > write throughput to the disk, since we won't be writing each byte being
> > backed up twice --- once to the loose file, and a second time to the
> > pack file.
> >
> > Cheers,
> >
> > - Ted
> >
> > P.S.  I assume there are good reasons why we can't just stream the
> > objects straight to the pack file, which is what git does?  I noticed
> > there were some comments about wanting to rearrange the objects so they
> > would be in an optimal order for later access.  Is that right?
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> > "Camlistore" group.
> > To unsubscribe from this group and stop receiving emails from it, send an
> > email to [email protected].
> > For more options, visit https://groups.google.com/d/optout.
>
