Re: [Bacula-users] areas for improvement?

Kern Sibbald Wed, 10 Jun 2020 05:09:00 -0700

Hello,

For some reason I never received the original email (very odd). I think that Gary has done a very good job of responding. I'll give you my take on this, but please excuse me if I duplicate what has already been said.

First, on the SQL database, which, as has been pointed out, is not stateless: I have never seen dropped database connections reported as a problem. It has never been a problem for me because I run the Director and the database backend (Postgres, in my case) on the same machine, so, as far as I understand, it is not using the communications lines at all.

Now, on the fact that line drops cancel jobs: first, Bacula was designed on the assumption that it would have a stable communications line, as is supposed to be provided by TCP/IP, which Bacula uses. This was a correct design based on the networks of the time, but in retrospect I should have included comm line restarts in the original design. In my opinion, the real problem is that modern switches, for all sorts of good reasons, do not really support the original design goals of TCP/IP.

That said, I have been aware of the problem that Alan brings up, and Bacula does have the ability to restart jobs at the point where the job failed under certain conditions, such as a comm line drop. This feature seems to be rarely used, but it is quite effective in cases where one has lots of comm line failures.
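Kern does not say how Bacula's restart of a failed job works internally, so the following is only a minimal Python sketch of the general checkpoint-and-resume idea, under the assumption that each file is recorded as committed only after the storage side has acknowledged it. `JobState`, `run_job`, and `send` are hypothetical names for illustration, not Bacula code. After a simulated comm line drop, the second run skips the files already stored and picks up where the first one stopped.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class JobState:
    """Hypothetical per-job checkpoint: which files are already safely stored."""
    job_id: int
    files: list[str]
    committed: set[str] = field(default_factory=set)


def run_job(state: JobState, send: Callable[[str], None],
            fail_after: int | None = None) -> bool:
    """Send every file not yet committed; return True if the job completed.

    `send` stands in for a transfer that returns only once the storage side
    has acknowledged the file. `fail_after` simulates a comm line drop after
    that many successful sends in this attempt.
    """
    sent_this_attempt = 0
    for name in state.files:
        if name in state.committed:
            continue  # stored by an earlier attempt: skip instead of resending
        if fail_after is not None and sent_this_attempt == fail_after:
            return False  # simulated line drop; the checkpoint survives
        send(name)
        state.committed.add(name)  # checkpoint only after the acknowledgement
        sent_this_attempt += 1
    return True


transferred: list[str] = []
state = JobState(job_id=1, files=["a", "b", "c", "d"])
first = run_job(state, transferred.append, fail_after=2)  # drops after "a", "b"
second = run_job(state, transferred.append)               # resumes at "c"
print(first, second, transferred)  # False True ['a', 'b', 'c', 'd']
```

The design choice that matters is advancing the checkpoint only after an acknowledgement: a drop can at worst cause the in-flight file to be sent again, never cause a file to be recorded as stored when it was not.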

For some time, I have had in mind a project to make Bacula restart a comm line connection after a drop; however, as Gary points out, this is far from a trivial project. Bacula Systems currently has a project well under way to implement this feature, and from what I have heard, it is now in the testing phase and will probably be in the next Bacula Enterprise release. When it will appear in the community version is not clear.

Concerning project priorities: to the best of my knowledge, no one other than Alan has submitted a bug report or a request for this feature; Alan submitted a request for it some time ago for the Enterprise version. At Bacula Systems, a lot of time and consideration is devoted twice a year to examining new feature requests and deciding which to implement. Every six months, key managers and an outside consultant are asked to submit the most important feature requests. These are then sorted by a number of criteria, such as the difficulty of the project, the number of users impacted, the overall need for Bacula, and so on. All of that then works down to a roadmap for the next release (in roughly six months) and the following release (in roughly one year). The six-month roadmap is then approved by the company managers and reviewed at the biannual company meeting. Generally, the six-month roadmaps do not change much (sometimes a feature is dropped or added). The one-year roadmap can change, as you might imagine.

Bottom line: this is a very complex "feature" request, but it is now well along in development, and so will be available at some point in the not-so-distant future.


Gary: thanks for your insights :-)

Alan: I am not responding to all your comments, but I will say that I believe you have misunderstood certain things about Bacula Systems, how they decide what is important, etc. One of the nice things about open source is that if you are unhappy with what it does, you have all the source code, and you can either implement what you want or hire someone to do it. Having an Enterprise agreement does not necessarily mean that any feature request will be implemented immediately. Have you ever tried to get Microsoft to fix a bug or implement a new feature?


Best regards,

Kern

On 5/27/20 4:13 PM, Gary R. Schmidt wrote:

On 27/05/2020 23:17, Alan Brown wrote:
I've been running Bacula for ~15 years (community/enterprise) and have
identified a few areas which are in desperate need of improvement:

For an "enterprise" grade backup system, it's amazingly fragile in a few
areas (particularly in actual Enterprise networks!)


Bacula DOES NOT LIKE and does not handle network interruptions _at all_
if backups are in progress. This _will_ cause backups to abort - and
these aborted backups are _not_ resumable.

Similarly, if there's any kind of disruption between the director and
database, the only fix is to restart the director.


What that means is that Bacula _cannot_ be used with a High Availability
database because network interruptions (when switching servers) are part
of the HA paradigm.

It also means that operators have to be _extremely_ careful about
allowing automated or other system upgrades.


In days of multi-TB backup sets, this is turning into a showstopping
problem.

As we are an Enterprise customer, this has been raised with Bacula Systems
but has been given _very_ low priority. I'd like to hear opinions from the
wider community on this.


Opinion: I know bugs aren't sexy to work on, but these need fixing, not
being brushed off. This is the difference between LAN-quality and actual
Enterprise grade software.

I do not consider these to be bugs. They aren't simple errors where someone made a mistake or used the wrong-sized variable; they require a large amount of redesign and reimplementation of Bacula's communication modules, and the scheduler, and no doubt other bits, to go away.

Bacula started life twenty years ago, and the environment has changed since then. While Bacula has kept up with some things (disk rather than tape as a target, for example), something like restartable jobs is, as I have said, not just an extension or addition to what is there, but a big change to a large part of Bacula.

And that's a massive risk. It's the sort of task I would be looking at having a whole team work on: a couple of designers, six to ten programmers, and a QA team with a nasty manager who was not restricted from saying "No!" when things don't work quite right.

And the mob above all have a *really* good understanding of how the various bits of Bacula work and interact, and are able, and allowed, to replace ancient groaning bits of code with newer versions that just aren't as wrong. (First task: rename all files so the extensions represent the C++ code inside them, and for the really cruddy^Wannoying stuff, G++.)

And, from the commercial standpoint, the changes would have to be made without interrupting the existing income stream.

Then there's the projected timeline before it could be released.

I don't want to think about that. Bacula is fragile as it is; ripping it apart and stitching it back together would be a massive task!

And Bacula does not have that capability, neither in the OSS space nor in the Enterprise space.

All that said, I think that restartable jobs would be a great enhancement for Bacula, but how often, and for how long, would it try by default before giving up? :->
    Cheers,
        Gary    B-)
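Gary's closing question (how often, and for how long, to retry before giving up) is the policy decision at the heart of any reconnect feature. As a minimal Python sketch only, assuming a transport where a dropped line surfaces as an `OSError`, a bounded exponential backoff answers both halves explicitly. `reconnect_with_backoff`, `max_attempts`, `base_delay`, and `max_delay` are hypothetical names, not Bacula directives, and the defaults are illustrative.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


def reconnect_with_backoff(
    connect: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds or `max_attempts` is exhausted.

    Hypothetical policy: the delay doubles after each failure (1 s, 2 s,
    4 s, ...) and is capped at `max_delay`, so the worst-case wait is known
    in advance. The last failure is re-raised so the caller can fail the job.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        try:
            return connect()
        except OSError:
            if attempt >= max_attempts:
                raise  # give up: propagate the last connection error
            sleep(min(base_delay * 2 ** (attempt - 1), max_delay))
            attempt += 1


# Usage: a simulated connection that drops twice, then succeeds.
waits: list[float] = []
failures = iter([True, True, False])


def flaky_connect() -> str:
    if next(failures):
        raise ConnectionResetError("simulated line drop")
    return "connected"


result = reconnect_with_backoff(flaky_connect, sleep=waits.append)
print(result, waits)  # connected [1.0, 2.0]
```

With these defaults the function waits 1 + 2 + 4 + 8 = 15 seconds in total before the fifth failure is re-raised, so the worst case can be reasoned about in advance; riding out something like an HA database failover would need larger values.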


_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
