On Mon, Jan 11, 2016 at 3:59 PM, Michael <[email protected]> wrote:
> Of course, the main problem is the small amount of memory. Looking
> in the dmesg logs, I regularly spotted OOM messages, the kernel killing
> the backuppc_dump process, etc. Now, it is a bit unfair of me to blame BPC
> when actually the main culprit is the lack of memory. But the thing is
> that BPC is quite unhelpful in these situations.
Not sure anything is helpful - or can be - in the OOM-killer
situation. The process in question doesn't get much of a chance to
tell you what happened.
> Server logs are mostly
> useless (no timestamps), there is no attempt to restart, and
> instead BPC goes into a long process of counting references, etc., meaning
> most of the server time is spent on (apparently) unproductive tasks.
On a suitable platform, Linux tends to be reliable, so it's not
surprising that programmers don't spend a lot of time dealing with
cases that shouldn't happen.
> Initially BPC was a "mere" wrapper around rsync: it first duplicated a
> complete hierarchy with hard links, then rsync'ed over it. That had the
> advantage of simplicity, but it is very slow to maintain and impossible to
> duplicate.
No, v3 has rsync implemented completely in perl so that it can
keep the archive copies compressed while exchanging block checksums
with a native remote rsync that reads uncompressed files. And it
only handled the older rsync protocol, which required the entire
directory listing to be transferred first and held in RAM for the
duration of the run. Given the way perl stores variables, this isn't
pretty, but then again RAM is cheap.
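For readers who have not looked inside the rsync algorithm: the block checksums exchanged above start with a cheap weak checksum that can be slid along a file one byte at a time. The following is a minimal Python sketch of that rolling checksum as described in the rsync technical report. It is illustrative only; the function names are hypothetical, neither BackupPC's perl code nor rsync's C code is laid out this way, and real rsync confirms every weak match with a strong hash before trusting it.

```python
M = 1 << 16  # both halves of the weak checksum are kept modulo 2**16


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the two 16-bit halves (a, b) of the weak checksum of a block."""
    n = len(block)
    a = sum(block) % M
    # Earlier bytes get larger weights: the first byte counts n times, the last once.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide an n-byte window forward by one: drop out_byte, append in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def combined(a: int, b: int) -> int:
    """Pack the two 16-bit halves into one 32-bit value."""
    return a | (b << 16)


def rolling_sums(data: bytes, n: int):
    """Yield (offset, checksum) for every n-byte window of data in O(len(data))."""
    if n <= 0 or len(data) < n:
        return
    a, b = weak_checksum(data[:n])
    yield 0, combined(a, b)
    for k in range(len(data) - n):
        a, b = roll(a, b, data[k], data[k + n], n)
        yield k + 1, combined(a, b)
```

The receiving side computes weak checksums of fixed-size blocks of its copy; the sending side slides the window over its version and only needs the expensive strong hash at offsets whose weak checksum matches one of those blocks.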
> Now the trend in 4.0alpha was to move to a custom C
> implementation of rsync, where the hierarchy only stores attrib files. I
> think that we can improve the maintenance phase further (ref counting,
> backup deletion...) by flattening this structure into a single linear
> file, and by listing, once and for all, the references in a given backup,
> possibly with caching of references per directory. Directory entries
> would be more like git objects, attaching a name to a reference along
> with some metadata.
Git does some interesting things - but I'm not all that convinced
checking in a whole machine would be a win compared to what rsync
does.
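To make the flattened layout in the quoted proposal concrete, here is a minimal Python sketch of one backup stored as a single linear file of git-like entries (a name bound to a pool digest plus a little metadata), from which the backup's sorted reference list and per-digest reference counts fall out. The field set, the tab-separated line format and all names are assumptions for illustration, not BackupPC's actual attrib format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical flattened directory entry: a path bound to a pool reference."""
    path: str
    digest: str  # hex content hash of the pooled file (format assumed)
    mode: int
    mtime: int


def to_line(e: Entry) -> str:
    # One tab-separated line per entry; assumes paths contain no tabs or newlines.
    return f"{e.digest}\t{e.mode:o}\t{e.mtime}\t{e.path}"


def from_line(line: str) -> Entry:
    # maxsplit=3 keeps everything after the third tab as the path.
    digest, mode, mtime, path = line.rstrip("\n").split("\t", 3)
    return Entry(path=path, digest=digest, mode=int(mode, 8), mtime=int(mtime))


def backup_refs(entries) -> list[str]:
    """Sorted, de-duplicated digests referenced by one backup, listed once."""
    return sorted({e.digest for e in entries})


def pool_refcounts(per_backup_refs) -> Counter:
    """How many backups reference each digest (not how many files do)."""
    counts = Counter()
    for refs in per_backup_refs:
        counts.update(refs)
    return counts
```

Because each backup's reference list is written once, a reference count is a tally over those lists rather than a walk over every directory of every backup.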
> This means integrating further with the inner
> workings of rsync. It would be fully compliant with rsync from the client
> side. But refcounting and backup deletion should then be equivalent
> to sorting and finding duplicate/unique entries, which can be very
> efficient. Even on my LaCie, sorting a 600k-line file of random 32-byte
> hash entries takes only a couple of seconds.
That kind of boils down to a question of how much work you want to do
to save a few dollars' worth of RAM, or even another box to do the
work for you over NFS or iSCSI to your storage server.
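The "sorting and finding duplicate/unique entries" idea quoted above can be sketched as a merge: when a backup is deleted, the pool files that become garbage are exactly the digests in that backup's sorted reference list that appear in no surviving backup's list, which is what `comm -23` computes for two sorted text files. A minimal Python sketch, assuming each backup's references are already available as a sorted, de-duplicated list of hex digests (the function names are hypothetical):

```python
import heapq


def union_sorted(lists) -> list[str]:
    """Merge several sorted lists into one sorted, de-duplicated list."""
    out: list[str] = []
    for x in heapq.merge(*lists):
        if not out or out[-1] != x:
            out.append(x)
    return out


def only_in_first(a: list[str], b: list[str]) -> list[str]:
    """Items of sorted list a that are absent from sorted list b (like comm -23)."""
    out, i, j = [], 0, 0
    while i < len(a):
        if j == len(b) or a[i] < b[j]:
            out.append(a[i])  # a[i] cannot appear later in b
            i += 1
        elif a[i] == b[j]:
            i += 1  # still referenced elsewhere: keep it in the pool
            j += 1
        else:
            j += 1
    return out


def deletable_digests(doomed: list[str], survivors) -> list[str]:
    """Digests referenced only by the backup being deleted."""
    return only_in_first(doomed, union_sorted(survivors))
```

Both passes are linear in the lengths of the lists, and when the lists do not fit in RAM the same work can be done on disk with `sort -u` and `comm -23`.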
> - Client-side sync
>
> Sure, this must be an optional feature, and I agree this is not the
> priority. Many clients will still simply run rsyncd or rsync/ssh. But
> the client-side sync would allow hard links to be detected more efficiently.
> It would also decrease memory usage on the server (see the rsync FAQ). And
> it opens up a whole new set of optimizations, such as delta-diff across
> multiple files...
I've always considered it one of the main attractions of BPC that it
does not require any client side setup beyond ssh keys which you
normally need anyway.
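As an illustration of the client-side hard-link detection mentioned in the quote above: an agent running on the client can find hard links cheaply because each local file's device and inode numbers come from a plain `lstat()`. A minimal Python sketch, assuming a POSIX client; `find_hard_links` is a hypothetical name, and this is not code from BackupPC or rsync.

```python
import os
import stat
from collections import defaultdict


def find_hard_links(root: str) -> dict[tuple[int, int], list[str]]:
    """Group regular files under root that share a (device, inode) pair."""
    groups: dict[tuple[int, int], list[str]] = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: symlinks are examined, not followed
            # A file with a link count of 1 cannot have another name.
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                groups[(st.st_dev, st.st_ino)].append(path)
    # Keep only inodes reached through more than one path inside this tree.
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}
```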
> *** Regarding writing in C
>
> OK, I'm not a perl fan. But I agree, it is useful for stuff where
> performance does not matter, for the web interface, etc. But I would
> rewrite the ref-counting part and similar code in C.
It's not 'performance' that is bad for many/most things where you are
dealing with network and disk activity. It just needs (much) more
RAM. And on most platforms that is easy to accommodate. That's not
to say it can't be improved, but you are going to trade expensive
human time to save a bit of cheap hardware.
--
Les Mikesell
[email protected]
_______________________________________________
BackupPC-devel mailing list
[email protected]
List: https://lists.sourceforge.net/lists/listinfo/backuppc-devel
Wiki: http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/