Hello,
For some reason I never received the original email (very odd). I think
that Gary has done a very good job at responding. I'll give you my take
on this, but please excuse me if I duplicate what has already been said.
First, on the SQL database, which, as has been pointed out, is not
stateless: I have never seen database communication drops reported as a
problem. For me it has never been a problem because I run the Director
and the database backend (Postgres for me) on the same machine, so, as
far as I understand, it is not using the communications lines.
Now, on the fact that line drops cancel jobs: first, Bacula was designed
with the concept that it would have a stable communications line as is
supposed to be provided by TCP/IP, which Bacula uses. This was a
correct design based on networks at the time, but in retrospect, I
should have included comm line restarts in the original design. In my
opinion, the real problem is that modern switches for all sorts of good
reasons do not really support the original design goals of TCP/IP.
That said, I have been aware of the problem that Alan brings up, and
Bacula does have the ability to restart jobs at the point where the job
failed under certain conditions such as a comm line drop. This feature
seems to be rarely used, but is quite effective in the case where one
has lots of comm line failures.
For some time, I have had in mind a project to make Bacula restart a
comm line connection after a drop. However, as Gary points out, this is
far from being a trivial project. Bacula Systems currently has a
project well along the way to implement this feature, and from what I
have heard, it is now in the testing phase and will probably be in the
next Bacula Enterprise release. When it will appear in the community
version is not clear.
Concerning priorities of projects: to the best of my knowledge no one
has submitted a bug report or a request for this feature other than Alan
who submitted a request for this feature some time ago in the Enterprise
version. For Bacula Systems, a lot of time and consideration is devoted
twice a year to examine new feature requests and decide which to
implement. Every six months key managers and an outside consultant are
requested to submit the most important feature requests. They are then
sorted by a number of conditions such as: difficulty of the project,
number of users impacted, overall need for Bacula, ... All that then
works down to a Roadmap for the next release (in roughly 6 months) and
the following release (in roughly 1 year). The six month roadmap will
then be approved by the company managers and reviewed at the bi-annual
company meeting. Generally the six month roadmaps do not change much
(sometimes a feature is dropped or added). The 1 year roadmap can
change as you might imagine.
Bottom line: this is a very complex "feature" request, but it is now
well along in development, and so will be available at some time in the
not so distant future.
Gary: thanks for your insights :-)
Alan: I am not responding to all your comments, but will say that I
believe that you have misunderstood certain things about Bacula Systems,
how they decide what is important, etc. One of the nice things about
open source is that if you are unhappy with what it does, you have all
the source code, and you can either implement what you want, or hire
someone to do it. Having an Enterprise agreement does not necessarily
mean that any feature request will be immediately implemented -- have
you ever tried to get Microsoft to fix a bug or implement a new feature?
Best regards,
Kern
On 5/27/20 4:13 PM, Gary R. Schmidt wrote:
On 27/05/2020 23:17, Alan Brown wrote:
I've been running Bacula for ~15 years (community/enterprise) and have
identified a few areas which are in desperate need of improvement:
For an "enterprise" grade backup system, it's amazingly fragile in a few
areas (particularly in actual Enterprise networks!)
Bacula DOES NOT LIKE and does not handle network interruptions _at all_
if backups are in progress. This _will_ cause backups to abort - and
these aborted backups are _not_ resumable
Similarly, if there's any kind of disruption between the director and
database, the only fix is to restart the director
What that means is that Bacula _cannot_ be used with a High Availability
database because network interruptions (when switching servers) are part
of the HA paradigm.
It also means that operators have to be _extremely_ careful about
allowing automated or other system upgrades
In days of multi-TB backup sets, this is turning into a showstopping
problem.
As we are an Enterprise customer this has been raised with Bacula Systems
but has been given _very_ low priority. I'd like to hear opinions from the
wider community on this.
Opinion: I know bugs aren't sexy to work on but these need fixing, not
being brushed off. This is the difference between LAN-quality and actual
Enterprise grade software.
I do not consider these to be bugs - they aren't simple errors where
someone made a mistake or used the wrong sized variable - they require
a large amount of re-design and reimplementation of Bacula's
communication modules, and the scheduler, and no doubt other bits to
go away.
Bacula started life twenty years ago, and the environment has changed
since then, and, while Bacula has kept up with some things, disk as
a target rather than tape, frex, something like re-startable jobs is,
as I have said, not just an extension or addition to what is there,
but a big change to a large part of Bacula.
And that's a massive risk, it's the sort of task I would be looking at
having a whole team work on, a couple of designers, six to ten
programmers, and a QA team with a nasty manager who was not restricted
from saying, "No!" when things don't work quite right.
And the mob above all have a *really* good understanding of how the
various bits of Bacula work, and interact, and are capable of and
allowed to replace ancient groaning bits of code with newer versions
that just aren't as wrong. (First task - rename all files so the
extensions represent the C++ code inside them, and for the really
cruddy^Wannoying stuff, G++.)
And, from the commercial stand-point, that the changes could be made
without interrupting the existing income stream.
Then there's the projected time-line before it could be released?
I don't want to think about that, Bacula is fragile as it is, ripping
it apart and stitching it back together would be a massive task!
And Bacula does not have that capability, not in the OSS space nor in
the Enterprise space.
All the above said, I think that re-startable jobs would be a great
enhancement for Bacula, but how often and for how long does it try by
default before giving up? :->
Cheers,
Gary B-)
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users