Hi Christopher, 

Thank you so much again :) 

Well, you know... I'm really looking at delta encoding. But anyway, regarding
the deduplication you are talking about: that assumes we are able to run ZFS
on the server we are backing up, doesn't it? 

Cheers!! :)

On 2022-03-03 15:33, webmaster wrote:

> Sorry, it might not have been obvious, but what I was suggesting was a possible 
> way of getting the file daemon to do deduplication using virtual ZFS 
> filesystems. 
> 
> Christopher tyerman 
> 
> Sent from my Galaxy 
> 
> -------- Original message -------- 
> From: ego...@ramattack.net 
> Date: 03/03/2022 14:23 (GMT+00:00) 
> To: webmaster <webmas...@firebladeautomationsystems.co.uk> 
> Cc: Radosław Korzeniewski <rados...@korzeniewski.net>, 
> bacula-devel@lists.sourceforge.net 
> Subject: Re: [Bacula-devel] Open source Bacula plugin for deduplication 
> 
> Hi Christopher! 
> 
> Thanks a lot for your time! I'm answering inline below, in blue, to make it easier to tell apart who said what.
> 
> On 2022-03-03 15:03, webmaster wrote: 
> 
> Hello 
> 
> I was reading this and had a thought about deduplication,  
> 
> SORRY, I MEANT TO REFER TO DELTA ENCODING... NOT BYTE-LEVEL DEDUP IN THE STORAGE... 
> 
> The ZFS filesystem has built-in deduplication (and compression) support, 
> 
> so when creating a new backup volume you could 
> create a virtual ZFS pool/filesystem, 
> write all backed-up files to the ZFS pool, 
> which automatically does deduplication. 
> 
> WE RUN ZFS AS THE FILESYSTEM OF OUR FILE STORAGES... 
> 
> You then write the virtual ZFS filesystem to your Bacula volume. 
> 
> I'm not sure how well this would work in practice, but it seems like a 
> "simple" way to implement basic deduplication. 
> 
> YES, ZFS IS NICE... BUT WE ARE LOOKING TO TRANSFER AND STORE AS LITTLE 
> DATA AS POSSIBLE FROM WHAT AN FD SENDS US... 
> 
> Christopher tyerman 
> 
> CHEERS!!! 
> 
> Sent from my Galaxy 
> 
> -------- Original message -------- 
> From: egoitz--- via Bacula-devel <bacula-devel@lists.sourceforge.net> 
> Date: 03/03/2022 12:36 (GMT+00:00) 
> To: Radosław Korzeniewski <rados...@korzeniewski.net> 
> Cc: bacula-devel@lists.sourceforge.net 
> Subject: Re: [Bacula-devel] Open source Bacula plugin for deduplication 
> 
> Hello Radoslaw, 
> 
> I will answer below in green, just to make it easier to tell apart what each 
> of us has said... :)
> 
> On 2022-03-03 12:46, Radosław Korzeniewski wrote: 
> 
> Hello, 
> 
> On Thu, 3 Mar 2022 at 12:09, egoitz--- via Bacula-devel 
> <bacula-devel@lists.sourceforge.net> wrote: 
> 
> Good morning, 
> 
> I know Bacula Enterprise provides deduplication plugins, but sadly we can't 
> afford them. No problem, we will try to create an open source deduplication 
> plugin for the Bacula file daemon. I would use rdiff (part of librsync) for 
> delta patching and signature generation. 
> What signatures does rdiff use?  
> 
> IT'S BASICALLY DOCUMENTED EXACTLY HERE: 
> https://librsync.github.io/page_formats.html 
> 
> IT'S FOR BEING ABLE TO GENERATE DELTA PATCHES WITHOUT NEEDING TO KEEP BOTH THE 
> OLD AND THE NEW VERSION OF A FILE... AND SO, TO AVOID DOUBLING THE SPACE USED 
> OR REQUIRED FOR BACKING UP... 
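> 
> For reference, a minimal sketch of that idea using librsync's whole-file 
> helpers (rs_sig_file() and friends, as found in librsync 2.x -- exact 
> prototypes may differ between versions, so please check librsync.h). It just 
> writes the signature of ORIGINAL_FILE to a separate file, which is all that 
> has to be kept around to build deltas later: 
> 
> #include <stdio.h>
> #include <librsync.h>
> 
> /* Write the rdiff-style signature of 'original' to 'sig_path'.  The signature
>  * alone is enough to later compute a delta against a newer version of the
>  * file, so the old copy itself does not have to be preserved. */
> int write_signature(const char *original, const char *sig_path)
> {
>     FILE *in  = fopen(original, "rb");
>     FILE *sig = fopen(sig_path, "wb");
>     if (!in || !sig) {
>         if (in)  fclose(in);
>         if (sig) fclose(sig);
>         return 1;
>     }
> 
>     rs_stats_t stats;
>     /* RS_DEFAULT_BLOCK_LEN and a strong-sum length of 0 let librsync pick
>      * sensible values; RS_BLAKE2_SIG_MAGIC selects the BLAKE2 signature
>      * format documented on the page linked above. */
>     rs_result r = rs_sig_file(in, sig, RS_DEFAULT_BLOCK_LEN, 0,
>                               RS_BLAKE2_SIG_MAGIC, &stats);
>     fclose(in);
>     fclose(sig);
>     return (r == RS_DONE) ? 0 : 1;
> }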
> 
> I would love to create a Bacula plugin for deduplicating content at the FD 
> level. This way, even if the backup is sent encrypted from the FD to the SD, 
> the deduplication could still obtain the best results, because it takes place 
> before the files are encrypted. 
> Yes, for proper encryption you would always get different bits for the same 
> data block making deduplication totally useless. :)  
> 
> I THINK THAT TOO.. YES... 
> 
> The deduplication would only be applied to files larger than, let's say, 
> 10GB. 
> ??? 
> 
> I designed Bacula deduplication to handle blocks (files) larger than 1K 
> because the indexing overhead for such small blocks was too high. The larger 
> the block you use, the lower the chance of getting a good deduplication ratio. 
> So it is a trade-off - small blocks == good deduplication ratio but higher 
> indexing overhead; larger blocks == weak deduplication ratio but lower 
> indexing overhead. So it handled block sizes from 1K up to 64K (the default 
> Bacula block size, though it could be extended to any size). 
> 
> I UNDERSTAND WHAT YOU ARE SAYING, BUT THE PROBLEM WE ARE FACING IS THE 
> FOLLOWING. IMAGINE A MACHINE WITH A SQL SERVER AND 150GB OF DATABASES. OUR 
> PROBLEM IS HAVING TO COPY THAT INCREMENTALLY EACH DAY. WE DON'T REALLY MIND 
> COPYING 5GB OF "WASTED" SPACE PER DAY, EVEN WHEN IT'S NOT NECESSARY (JUST TO 
> GIVE AN IDEA)... BUT OBVIOUSLY 100GB OR 200GB PER DAY IS A DIFFERENT 
> MATTER... 
> 
> I WAS REALLY THINKING OF APPLYING THIS DEDUPLICATION ONLY TO THE IMPORTANT 
> FILES... I HOPE YOU CAN UNDERSTAND ME BETTER NOW.. :) 
> 
> If you don't mind, I would like to share my ideas with you, in order to at 
> least know whether all of this is a feasible approach. 
> 
> My idea is basically : 
> 
> - WHEN DOING A BACKUP : 
> 
> ++ Check the backup level we are running. I suppose by asking getBaculaValue() 
> for bVarLevel (see the sketch a bit further below). 
> Deduplication should be totally transparent to the backup level. You want to 
> deduplicate data especially for the largest, full-level backups, right? 
> 
> WELL... REALLY, THE PROBLEM FOR US IS WHAT I DESCRIBED JUST BEFORE. WE 
> DON'T REALLY MIND COPYING A BIG FILE ONCE A MONTH, BUT WE WANT TO AVOID 
> COPYING IT (AT LEAST THE WHOLE FILE) IN INCREMENTAL BACKUPS. BESIDES, 
> WHEN RESTORING (AND NOT IN VIRTUAL BACKUPS), YOU RESTORE A FULL PLUS 
> INCREMENTALS. THIS WAY, WE WOULD RESTORE THE FULL ORIGINAL_FILE PLUS THE 
> PATCHES AND WE WOULD APPLY THEM TO ORIGINAL_FILE AT THE END OF THE RESTORE 
> JOB. 
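> 
> As a minimal sketch of that level check (names like getBaculaValue() and 
> bVarLevel are from Bacula's fd_plugins.h as I remember it -- please verify 
> against your Bacula source tree), something like this inside the plugin 
> should do: 
> 
> #include "fd_plugins.h"   /* Bacula file-daemon plugin API (assumed include) */
> 
> extern bFuncs *bfuncs;    /* core entry points saved by loadPlugin() */
> 
> /* Return non-zero when the running job is an Incremental or Differential.
>  * As far as I recall, bVarLevel is delivered as the job level character. */
> static int job_is_incremental(bpContext *ctx)
> {
>    int level = 0;
>    if (bfuncs->getBaculaValue(ctx, bVarLevel, (void *)&level) != bRC_OK) {
>       return 0;            /* be conservative if the level cannot be read */
>    }
>    return level == 'I' || level == 'D';
> }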
> 
> ++ In startBackupFile(), I suppose it gives me the file size info (or at least 
> gives me the name and I can do a stat() in some manner) to get the file size. 
> No. The standard "Bacula Command Plugin API" expects that a plugin will 
> return the file's stat info to back up.  
> 
> OK, NO PROBLEM... IF I CAN GET THE FILENAME AND PATH IN SOME MANNER, I CAN 
> ALWAYS DO A STAT() 
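> 
> Just to make that concrete, a rough sketch of a Command Plugin API 
> startBackupFile() that fills the stat info itself (the field names sp->fname, 
> sp->statp, sp->type and the FT_REG constant follow fd_plugins.h as I recall 
> them; get_next_path_to_backup() is a made-up helper standing in for the 
> plugin's own state): 
> 
> #include <sys/stat.h>
> #include "fd_plugins.h"          /* Bacula file-daemon plugin API (assumed) */
> 
> extern const char *get_next_path_to_backup(void);  /* hypothetical helper */
> 
> static bRC startBackupFile(bpContext *ctx, struct save_pkt *sp)
> {
>    /* The plugin decides which path to back up next. */
>    const char *path = get_next_path_to_backup();
> 
>    struct stat st;
>    if (stat(path, &st) != 0) {
>       return bRC_Error;
>    }
>    sp->fname = (char *)path;     /* file name handed back to the FD */
>    sp->statp = st;               /* size is then available as sp->statp.st_size */
>    sp->type  = FT_REG;           /* regular file */
>    return bRC_OK;
> }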
> 
> +++ If it's a full level and the file is bigger than 10GB, obtain the file 
> signature and store that new (previously non-existing) signature (written to a 
> file with a known naming convention based on ORIGINAL_FILE's name), plus the 
> whole ORIGINAL_FILE (the one we generated the signature from), on Bacula tapes. 
> Would I need to tell Bacula to re-read the directory in order to back up the 
> generated signature files? They didn't exist until now, when we generated the 
> file that contains ORIGINAL_FILE's signature. 
> Why do you call it a "deduplication plugin"? What you describe above is the 
> functionality of the Delta plugin, which supports so-called "block level 
> incremental", and that is _NOT_ deduplication. This "block level incremental" 
> tries to back up the blocks inside a single file that changed between backups. 
> It does not deduplicate the backup stream in any sense. For two identical 
> files which change in the same way, the Delta plugin will back the data up 
> twice, leaving the duplication in place. 
> 
> YES MATE, YOU ARE RIGHT. WHAT I NEED IS TO AVOID UPLOADING BIG FILES WITH VERY 
> SMALL CHANGES TO THE BACKUP EACH DAY, NOT TO AVOID WRITING TWO IDENTICAL FILES 
> IN THE BACKUP. 
> 
> In the case of the Delta plugin, which uses the exact procedure and library 
> you describe above, you should use the "Options Plugin API". 
> 
> I SEE. I'LL READ ABOUT IT... 
> 
> +++ If it's an incremental level and a previous signature of ORIGINAL_FILE 
> exists (I would know, because signatures will have a known naming convention 
> based on ORIGINAL_FILE's name), create a patch from the previous signature 
> plus the new state of the file. Then obtain the file's signature in its new 
> state again. Finally, store that new signature plus the patch on Bacula tapes, 
> and return bRC_Skip for ORIGINAL_FILE (because we are going to copy a delta 
> patch and a signature instead). If I return bRC_Skip here... would the FD skip 
> this file but still see the signatures and delta patches generated before 
> returning bRC_Skip? Or should I ask the FD, in some manner, to re-read the 
> directory? 
> It sounds like an exact step-by-step description of the Delta plugin. 
> 
> So, now I understand why you want to handle files > 10G only. :)  
> 
> THAT'S IT :) :) 
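> 
> To illustrate that incremental step with librsync's whole-file helpers 
> (librsync 2.x again; the file names here are purely illustrative), the 
> plugin-side logic could look roughly like this: load the signature kept from 
> the last full, emit the patch, then write a fresh signature for the file's 
> new state: 
> 
> #include <stdio.h>
> #include <librsync.h>
> 
> /* Given the signature saved by the previous backup, produce (1) a delta
>  * ("patch") describing how the file looks today and (2) a new signature for
>  * today's state, so the next incremental can diff against it. */
> int make_delta_and_new_signature(const char *old_sig_path,
>                                  const char *current_file,
>                                  const char *patch_path,
>                                  const char *new_sig_path)
> {
>     rs_stats_t stats;
>     rs_signature_t *sumset = NULL;
> 
>     /* 1. Load the signature generated during the previous (full) backup. */
>     FILE *old_sig = fopen(old_sig_path, "rb");
>     if (!old_sig) return 1;
>     rs_result r = rs_loadsig_file(old_sig, &sumset, &stats);
>     fclose(old_sig);
>     if (r != RS_DONE) return 1;
>     rs_build_hash_table(sumset);
> 
>     /* 2. Delta between that signature and the file as it is right now. */
>     FILE *newf  = fopen(current_file, "rb");
>     FILE *patch = fopen(patch_path, "wb");
>     if (!newf || !patch) {
>         if (newf)  fclose(newf);
>         if (patch) fclose(patch);
>         rs_free_sumset(sumset);
>         return 1;
>     }
>     r = rs_delta_file(sumset, newf, patch, &stats);
>     fclose(newf);
>     fclose(patch);
>     rs_free_sumset(sumset);
>     if (r != RS_DONE) return 1;
> 
>     /* 3. Fresh signature of the current state, stored next to the patch. */
>     FILE *in  = fopen(current_file, "rb");
>     FILE *sig = fopen(new_sig_path, "wb");
>     if (!in || !sig) {
>         if (in)  fclose(in);
>         if (sig) fclose(sig);
>         return 1;
>     }
>     r = rs_sig_file(in, sig, RS_DEFAULT_BLOCK_LEN, 0, RS_BLAKE2_SIG_MAGIC, &stats);
>     fclose(in);
>     fclose(sig);
>     return (r == RS_DONE) ? 0 : 1;
> }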
> 
> As you can guess, in the incremental backups I'm not storing the file under 
> the name it has in the filesystem. It should work more or less the following way: 
> 
> In a full level backup : 
> 
> ++ BEFORE THE BACKUP : 
> 
>   BACKED SERVER'S FS:            ORIGINAL_FILE 
>   BACULA "VIRTUAL TAPE" CONTENT: (nothing yet) 
> 
> ++ AFTER THE BACKUP : 
> 
>   BACKED SERVER'S FS:            ORIGINAL_FILE + SIGNATURE FILE 
>   BACULA "VIRTUAL TAPE" CONTENT: ORIGINAL_FILE + SIGNATURE FILE 
> 
> In the next incremental level backup : 
> 
> ++ BEFORE THE BACKUP : 
> 
>   BACKED SERVER'S FS:            NEW_STATE_ORIGINAL_FILE + SIGNATURE FILE 
>                                  GENERATED ON THE LAST FULL DAY 
>   BACULA "VIRTUAL TAPE" CONTENT: FROM THE FULL BACKUP: ORIGINAL_FILE + 
>                                  SIGNATURE FILE 
> 
> ++ AFTER THE BACKUP : 
> 
>   BACKED SERVER'S FS:            NEW_STATE_ORIGINAL_FILE + SIGNATURE FILE OF 
>                                  NEW_STATE_ORIGINAL_FILE 
>   BACULA "VIRTUAL TAPE" CONTENT: FROM THE FULL BACKUP: ORIGINAL_FILE + 
>                                  SIGNATURE FILE, PLUS PATCH FILE + SIGNATURE 
>                                  FILE OF NEW_STATE_ORIGINAL_FILE 
> 
> - WHEN RESTORING A BACKUP : 
> 
> If a restored file's name follows the convention (for example) 
> ORIGINAL_FILE-SIGNATURE- or ORIGINAL_FILE-PATCH, that would mean we have 
> backed up deltas of ORIGINAL_FILE in the incremental backups (I assume I could 
> see this in the filename to be restored in startRestoreFile(), since the 
> filename is accessible there). 
> 
> So, let's write that path to a plain text file, so that later, in a 
> post-restore job (or even in the bEventEndBackupJob event of the API?), we can 
> apply the patches at that path to the ORIGINAL_FILE, whose name is obtained 
> from the names of the patch files themselves. Finally, once the patching is 
> done, remove the signature files and patch files, obviously leaving 
> ORIGINAL_FILE in its state as of the restored date. 
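> 
> For the patch-applying step itself, librsync's whole-file API keeps it short 
> (librsync 2.x; the paths are illustrative -- the real ones would come from the 
> plain text file mentioned above): 
> 
> #include <stdio.h>
> #include <librsync.h>
> 
> /* Apply one delta to the restored full-backup copy of ORIGINAL_FILE,
>  * producing the file as it looked when the incremental was taken. */
> int apply_patch(const char *restored_full, const char *patch_path,
>                 const char *output_path)
> {
>     FILE *basis = fopen(restored_full, "rb");
>     FILE *delta = fopen(patch_path, "rb");
>     FILE *out   = fopen(output_path, "wb");
>     if (!basis || !delta || !out) {
>         if (basis) fclose(basis);
>         if (delta) fclose(delta);
>         if (out)   fclose(out);
>         return 1;
>     }
> 
>     rs_stats_t stats;
>     rs_result r = rs_patch_file(basis, delta, out, &stats);
> 
>     fclose(basis);
>     fclose(delta);
>     fclose(out);
>     return (r == RS_DONE) ? 0 : 1;
> }
> 
> For a chain of incrementals the patches would be applied in job order, each 
> output becoming the basis for the next patch. 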
> 
> So, at this point, I would be very, very thankful :) :) :) if some experienced 
> developer could give me some ideas, or point out if something is wrong or 
> should be achieved in some other manner or with other plugin API functions... 
> IMVHO, a Delta plugin is best handled with the "Options Plugin API" (as the 
> current Delta plugin is) and not the "Command Plugin API", since most of the 
> backup functionality will be provided by Bacula itself. 
> 
> I WILL READ ABOUT THIS TOO.... 
> 
> best regards 
> 
> BTW. I think a Delta plugin available in BEE is fairly cheap compared to full 
> deduplication options.  
> 
> I HAVE ASKED ROB MORRISON FOR THE PRICE :) :) 
> 
> CHEERS!!! -- 
> Radosław Korzeniewski
> rados...@korzeniewski.net
_______________________________________________
Bacula-devel mailing list
Bacula-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-devel
