> For, I think, efficiency reasons, blobs/files which are under 512
> bytes in size do not get stored in a pack, so they stay forever in
> the loose blobserver.

I'm not sure I understand the efficiency argument? Certainly from a
space efficiency perspective, it would be much preferable to store it
in a pack --- otherwise we would be wasting at least 7.5k per small
object.

From a performance efficiency perspective, it would depend on whether
the small file was "hot" or not, I suppose. But I can't think of any
reason why a small file would automatically be hotter than a large
one. And even if it is hot, we could cache it after the first access.
Certainly if we are doing a backup, the file will be cold cold cold,
and in the case of a symlink tree, we could have a huge number of
smallish blobs that would really do better packed into a packfile.

Cheers,

- Ted

On Mon, Oct 3, 2016 at 10:52 AM, Mathieu Lonjaret <[email protected]> wrote:

> Leaving the fsync question aside for now (I'd need to think about it
> some more, and hoping Brad will reply in the meantime anyway), and
> answering your question about why you don't straight pack all
> incoming blobs:
> packed blobs are supposed to help with sequential access of files, so
> they're basically a re-assembling of all the small blobs of a file
> into one (or more, if needed) zip. For, I think, efficiency reasons,
> blobs/files which are under 512 bytes in size do not get stored in a
> pack, so they stay forever in the loose blobserver. That is at least
> one reason why you can't just stream all blobs directly to the packed
> blobserver.
>
> On 2 October 2016 at 21:21, Theodore Ts'o <[email protected]> wrote:
> >
> > On Sunday, October 2, 2016 at 2:48:31 PM UTC-4, Theodore Ts'o wrote:
> >>
> >> (Especially since if you crash before the permanode is written,
> >> the client is going to have to restart the whole backup from
> >> scratch anyway.)
> >
> > One thought --- as an automated heuristic, if the blobserver
> > receives a stream of unsigned blobs, it doesn't need to fsync()
> > them.
> > After all, any objects which aren't referenced by a permanode are
> > subject to GC treatment. So if you crash and then run a GC, any
> > immutable, non-signed objects that were uploaded just before the
> > crash would be GC'ed anyway. Hence, there's no point to treat them
> > as precious objects that have to be fsync'ed before the client
> > upload is acknowledged. So what could be done is when the first
> > signed object is received, the blob server could send down a
> > sync(2) command, and then write all of the signed objects using
> > fsync(2).
> >
> > If we did this, the next obvious optimization would be to tune the
> > writeback interval for the disk in question to be 2-3 minutes,
> > instead of the usual 30 seconds. I noticed that objects were
> > getting written as loose files, and then repacked into pack file
> > approximately every 2 minutes or so. All modern file systems do
> > delayed allocation, which means that if we're not fsync'ing the
> > loose files, they won't get flushed to disk, and so if they are
> > written into the packed file and then get deleted within the
> > writeback interval, the loose files will never get written to
> > disk. This will double camlistore's effective write throughput to
> > the disk, since we won't be writing each byte being backed up
> > twice --- once to the loose file, and a second time to the pack
> > file.
> >
> > Cheers,
> >
> > - Ted
> >
> > P.S. I assume there are good reasons why we can't just stream the
> > objects straight to the pack file, which is what git does? I
> > noticed there were some comments about wanting to rearrange the
> > objects so they would be in an optimal order for later access. Is
> > that right?
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "Camlistore" group.
> > To unsubscribe from this group and stop receiving emails from it,
> > send an email to [email protected].
> > For more options, visit https://groups.google.com/d/optout.
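
For reference, the fsync heuristic quoted in this thread (skip fsync for
unsigned blobs, issue one sync(2) when a signed object shows up, then
fsync(2) the signed object itself) could be sketched roughly as below.
This is only an illustration in Python, not Camlistore code: the
LooseBlobWriter class, its receive() method, and the "signed" flag are
hypothetical names, and the sketch syncs whenever a signed object
arrives while unsynced blobs are pending, which is a small
generalization of "the first signed object". It assumes a Unix system,
since os.sync is not available elsewhere.

```python
import os
import tempfile


class LooseBlobWriter:
    """Hypothetical loose-blob store applying the quoted fsync heuristic."""

    def __init__(self, root, sync_all=os.sync, sync_file=os.fsync):
        self.root = root
        self.sync_all = sync_all    # stands in for sync(2)
        self.sync_file = sync_file  # stands in for fsync(2)
        self.dirty = False          # True while unsynced blobs are pending

    def receive(self, name, data, signed):
        if signed and self.dirty:
            # Flush every unsigned blob written so far before a signed
            # object that may reference them is acknowledged.
            self.sync_all()
            self.dirty = False
        path = os.path.join(self.root, name)
        with open(path, "wb") as f:
            f.write(data)
            if signed:
                # Signed objects are precious: force them to disk.
                f.flush()
                self.sync_file(f.fileno())
        if not signed:
            # Unsigned blobs are left to the kernel's normal writeback.
            self.dirty = True
        return path


if __name__ == "__main__":
    calls = []
    with tempfile.TemporaryDirectory() as root:
        w = LooseBlobWriter(
            root,
            sync_all=lambda: calls.append("sync"),
            sync_file=lambda fd: calls.append("fsync"),
        )
        w.receive("chunk-1", b"a", signed=False)
        w.receive("chunk-2", b"b", signed=False)
        w.receive("claim-1", b"c", signed=True)
        print(calls)  # ['sync', 'fsync']
```

The sync functions are injected so the ordering can be observed without
touching the real disk; with the defaults, the same calls go to
os.sync() and os.fsync().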
