On Mon, Jan 11, 2016 at 3:59 PM, Michael <[email protected]> wrote:

> Of course, the main problem is the small amount of memory. Looking
> in the dmesg logs, I regularly spot OOM messages, the kernel killing
> the backuppc_dump process, etc. Now it is a bit unfair of me to blame BPC
> when actually the main culprit is the lack of memory. But the thing is
> that BPC is quite unhelpful in these situations.

Not sure anything is helpful - or can be - in the OOM-killer
situation.  The process in question doesn't get much of a chance to
tell you what happened.

> Server logs are mostly
> useless, no timestamps, and there is no attempt to restart again, but
> instead BPC goes into a long process of counting references, etc, meaning
> most of the server time is spent in (apparently) unproductive tasks.

On a suitable platform, Linux tends to be reliable so it's not
surprising that programmers don't spend a lot of time dealing with
cases that shouldn't happen.

>   Initially BPC was a "mere" wrapper around rsync. First duplicating a
>   complete hierarchy with hard-links, then rsync'ing over it. It had the
>   advantage of simplicity but is very slow to maintain, and impossible to
>   duplicate.

No, v3 has rsync completely implemented in perl so that it can
maintain the archive copies compressed while exchanging block checksums
with a native remote rsync reading uncompressed files.  And it
only handled the older rsync protocol, which required the entire
directory listing to be transferred first and held in RAM for the
duration of the run.  Given the way perl stores variables, this isn't
pretty, but then again RAM is cheap.
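
For anyone not familiar with the block checksums mentioned above, here
is a rough Python sketch of the weak rolling checksum from the published
rsync algorithm description.  It is only an illustration under that
assumption: the function names are made up for this example, and it is
not taken from BackupPC's perl code or from rsync's C source.

```python
M = 1 << 16  # both running sums are kept modulo 2^16


def weak_checksum(block):
    """Compute the two 16-bit sums (a, b) over one block of bytes."""
    n = len(block)
    a = sum(block) % M
    # Byte at offset i is weighted by (n - i), so the first byte weighs n.
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, n):
    """Slide an n-byte window forward by one byte without rescanning it."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def digest(a, b):
    """Pack the two sums into the single 32-bit weak checksum."""
    return a + (b << 16)
```

The point of the rolling form is that the side scanning a file can test
every byte offset against the other side's block checksums at constant
cost per offset, falling back to a strong hash only when the weak one
matches.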

>   Now the trend in 4.0alpha was to move to a custom C
>   implementation of rsync, where hierarchy only stores attrib files. I
>   think that we can improve the maintenance phase further (ref counting,
>   backup deletion...) by flattening this structure into a single linear
>   file, and by listing once for all the references in a given backup,
>   possibly with caching of references per directory. Directory entries
>   would be more like git objects, attaching a name to a reference along
>   with some metadata.

Git does some interesting things - but I'm not all that convinced
checking in a whole machine would be a win compared to what rsync
does.

>   This means integrating further with the inner
>   working of rsync. It would be fully compliant with rsync from the client
>   side. But refcounting and backup deletion should then be equivalent
>   to sorting and finding duplicate/unique entries, which can be very
>   efficient. Even on my Lacie sorting a 600k-line file with 32B random
>   hash entries takes only a couple seconds.

That kind of boils down to a question of how much work you want to do
to save a few dollars' worth of RAM.  Or even another box to do the
work for you over NFS or iSCSI to your storage server.
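
To make the quoted proposal concrete, here is a minimal Python sketch of
refcounting and backup deletion done by sorting, assuming each backup
can be flattened to a plain list of pool digests.  The data layout and
the function names are hypothetical, made up for this example; neither
v3 nor 4.0alpha stores references this way as far as this thread says.

```python
from itertools import groupby


def refcounts(digests):
    """Count references by sorting, then measuring each run of equal digests."""
    return {d: sum(1 for _ in run) for d, run in groupby(sorted(digests))}


def pool_refcounts(backups):
    """Reference counts across every backup.

    `backups` maps a backup name to its flat list of digests
    (a hypothetical layout, one entry per file reference).
    """
    all_refs = [d for refs in backups.values() for d in refs]
    return refcounts(all_refs)


def freed_by_deleting(backups, doomed):
    """Digests whose every reference lives in the backup being deleted."""
    total = pool_refcounts(backups)
    gone = refcounts(backups[doomed])
    return sorted(d for d, n in gone.items() if total[d] == n)
```

With two backups `{"b0": ["aa", "bb", "aa"], "b1": ["bb", "cc"]}`,
deleting `b0` frees only `aa`, since `bb` is still referenced by `b1`.
This version sorts in memory; on a box short of RAM the same idea would
have to be done as an on-disk sort and merge, which is where the extra
work comes in.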

> - Client-side sync
>
>   Sure, this must be an optional feature, and I agree this is not the
>   priority. Many clients will still simply run rsyncd or rsync/ssh. But
>   the client-side sync would allow hard links to be detected more
>   efficiently. It will also decrease memory usage on the server (see
>   rsync faq). Then it opens up a whole new set of optimizations,
>   delta-diff on multiple files...

I've always considered it one of the main attractions of BPC that it
does not require any client-side setup beyond ssh keys, which you
normally need anyway.

> *** Regarding writing in C
>
> Ok, I'm not a perl fan. But I agree, it is useful for stuff where
> performance does not matter, for website interface, etc. But I would
> rewrite in C the ref counting part and similar.

It's not 'performance' that is bad in perl for most things where you
are dealing with network and disk activity.  It just needs (much) more
RAM.  And on most platforms that is easy to accommodate.  That's not
to say it can't be improved, but you are going to trade expensive
human time to save a bit of cheap hardware.

-- 
   Les Mikesell
     [email protected]

_______________________________________________
BackupPC-devel mailing list
[email protected]
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-devel
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
