Hello,

For some reason I never received the original email (very odd). I think that Gary has done a very good job at responding.  I'll give you my take on this, but please excuse me if I duplicate what has already been said.

First, on the SQL database, which, as has been pointed out, is not stateless: I have never seen database communications drops reported as a problem.  For me it has never been a problem because I run the Director and the database backend (Postgres for me) on the same machine, so, as far as I understand, it is not using the communications lines.

Now on the fact that line drops cancel jobs: First, Bacula was designed with the concept that it would have a stable communications line, as is supposed to be provided by TCP/IP, which Bacula uses.  This was a correct design based on networks at the time, but in retrospect, I should have included comm line restarts in the original design.  In my opinion, the real problem is that modern switches, for all sorts of good reasons, do not really support the original design goals of TCP/IP.

That said, I have been aware of the problem that Alan brings up, and Bacula does have the ability to restart jobs at the point where the job failed under certain conditions such as a comm line drop.  This feature seems to be rarely used, but is quite effective in the case where one has lots of comm line failures.

For some time, I have had in mind a project to make Bacula restart a comm line connection after a drop; however, as Gary points out, this is far from being a trivial project.  Bacula Systems currently has a project well along the way to implement this feature, and from what I have heard, it is now in the testing phase and will probably be in the next Bacula Enterprise release.  When it will appear in the community version is not clear.

Concerning priorities of projects: to the best of my knowledge, no one has submitted a bug report or a request for this feature other than Alan, who submitted a request some time ago for the Enterprise version.  For Bacula Systems, a lot of time and consideration is devoted twice a year to examining new feature requests and deciding which to implement.  Every six months, key managers and an outside consultant are requested to submit the most important feature requests.  They are then sorted by a number of conditions such as: difficulty of the project, number of users impacted, overall need for Bacula, ...  All that then works down to a Roadmap for the next release (in roughly 6 months) and the following release (in roughly 1 year).  The six month roadmap will then be approved by the company managers and reviewed at the bi-annual company meeting.  Generally the six month roadmaps do not change much (sometimes a feature is dropped or added).  The 1 year roadmap can change, as you might imagine.

Bottom line: this is a very complex "feature" request, but it is now well along in development, and so will be available at some time in the not so distant future.

Gary: thanks for your insights :-)

Alan: I am not responding to all your comments, but will say that I believe that you have misunderstood certain things about Bacula Systems, how they decide what is important, etc.  One of the nice things about open source is that if you are unhappy with what it does, you have all the source code, and you can either implement what you want, or hire someone to do it.  Having an Enterprise agreement does not necessarily mean that any feature request will be immediately implemented -- have you ever tried to get Microsoft to fix a bug or implement a new feature?

Best regards,

Kern

On 5/27/20 4:13 PM, Gary R. Schmidt wrote:
On 27/05/2020 23:17, Alan Brown wrote:

I've been running Bacula for ~15 years (community/enterprise) and have
identified a few areas which are in desperate need of improvement:

For an "enterprise" grade backup system, it's amazingly fragile in a few
areas (particularly in actual Enterprise networks!)


Bacula DOES NOT LIKE and does not handle network interruptions _at all_
if backups are in progress. This _will_ cause backups to abort - and
these aborted backups are _not_ resumable

Similarly, if there's any kind of disruption between the director and
database, the only fix is to restart the director


What that means is that Bacula _cannot_ be used with a High Availability
database because network interruptions (when switching servers) are part
of the HA paradigm.

It also means that operators have to be _extremely_ careful about
allowing automated or other system upgrades


In days of multi-TB backup sets, this is turning into a showstopping
problem.

As we are an Enterprise customer this has been raised with Bacula Systems
but been given _very_ low priority.   I'd like to hear opinions from the
wider community on this


Opinion: I know bugs aren't sexy to work on but these need fixing, not
being brushed off. This is the difference between LAN-quality and actual
Enterprise grade software.

I do not consider these to be bugs - they aren't simple errors where someone made a mistake or used the wrong sized variable - they require a large amount of re-design and reimplementation of Bacula's communication modules, and the scheduler, and no doubt other bits to go away.

Bacula started life twenty years ago, and the environment has changed since then, and, while Bacula has kept up with some things, disk as a target rather than tape, for example, something like re-startable jobs is, as I have said, not just an extension or addition to what is there, but a big change to a large part of Bacula.

And that's a massive risk; it's the sort of task I would be looking at having a whole team work on: a couple of designers, six to ten programmers, and a QA team with a nasty manager who was not restricted from saying, "No!" when things don't work quite right.

And the mob above all have a *really* good understanding of how the various bits of Bacula work, and interact, and are capable of and allowed to replace ancient groaning bits of code with newer versions that just aren't as wrong.  (First task - rename all files so the extensions represent the C++ code inside them, and for the really cruddy^Wannoying stuff, G++.)

And, from the commercial stand-point, that the changes could be made without interrupting the existing income stream.

Then there's the projected time-line before it could be released?
I don't want to think about that.  Bacula is fragile as it is; ripping it apart and stitching it back together would be a massive task!

And Bacula does not have that capability, not in the OSS space nor in the Enterprise space.

All the above said, I think that re-startable jobs would be a great enhancement for Bacula, but how often and for how long does it try by default before giving up?  :->

    Cheers,
        Gary    B-)


_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

