Dear Jean-Pierre,

I am still testing the dedup plugin. I am having an issue with rdiff-backup where the checksums of the source and destination do not match, but this may be a problem in rdiff-backup itself.
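As a stop-gap until we have automated verification, this is roughly the check I am wiring into the back-up scripts (a minimal sketch: the function name verify_copy, the md5sum choice, and the paths are my own placeholders, not anything required by rdiff-backup or ntfs-3g):

```shell
#!/bin/sh
# Sketch: verify that a file read from the dedup-mounted NTFS partition
# matches the copy on the backup side. Sizes are compared first (cheap),
# then md5 checksums. All names here are placeholders for my setup.
verify_copy() {
    src=$1
    dst=$2
    src_size=$(stat -c %s "$src") || return 1
    dst_size=$(stat -c %s "$dst") || return 1
    if [ "$src_size" != "$dst_size" ]; then
        echo "SIZE MISMATCH: $src ($src_size) vs $dst ($dst_size)" >&2
        return 1
    fi
    src_sum=$(md5sum "$src" | cut -d' ' -f1)
    dst_sum=$(md5sum "$dst" | cut -d' ' -f1)
    if [ "$src_sum" != "$dst_sum" ]; then
        echo "CHECKSUM MISMATCH: $src vs $dst" >&2
        return 1
    fi
    return 0
}

# After a backup run I would also grep the syslog for ntfs-3g messages,
# e.g.:  grep 'ntfs-3g' /var/log/syslog
# (the log file location varies by distribution)
```

Usage would be something like `verify_copy /mnt/sr7-sdb2/some/file /backup/some/file` in a loop over the restored files.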
However, I did get these messages, and I do not know whether they are related to the dedup plugin:

Mar 21 06:57:37 backup ntfs-3g[12620]: Record 1582022 has wrong SeqNo (159 <> 156)
Mar 21 06:57:37 backup ntfs-3g[12620]: Could not decode the type of inode 1582022
Mar 21 06:57:37 backup ntfs-3g[12620]: Record 1545290 has wrong SeqNo (296 <> 295)
Mar 21 06:57:37 backup ntfs-3g[12620]: Could not decode the type of inode 1545290

Kind regards,

Jelle de Jong

On 25/02/17 12:30, Jean-Pierre André wrote:
> [ Repeating, forgot to cc to the list ]
>
> Jean-Pierre André wrote:
>> Hi,
>>
>> There was a bug in the index location, which in bad conditions
>> could lead to an endless loop in the indexed search. So I have
>> fixed the bug and protected against a corrupted index leading
>> to a similar loop.
>>
>> With the posted data, I can access the first byte of 865,675
>> dummy files similar to yours... Of course they are not your
>> actual files, and there is still room for problems.
>>
>> Could you try:
>>
>> http://jp-andre.pagesperso-orange.fr/dedup122-beta.zip
>>
>> Regards
>>
>> Jean-Pierre
>>
>> Jelle de Jong wrote:
>>> Hi Jean-Pierre,
>>>
>>> # output: md5sum *.gz
>>> https://powermail.nu/nextcloud/index.php/s/jxler2rZOqBdpr2
>>>
>>> 879499f9187b0f590ae92460f4949dfd  stream.data.full.dir.tar.gz
>>> a8fc902613486e332898f92aba26c61f  reparse-tags.gz
>>>
>>> Kind regards,
>>>
>>> Jelle de Jong
>>>
>>> On 23/02/17 14:24, Jean-Pierre André wrote:
>>>> Hi,
>>>>
>>>> Can you also post the md5 (or sha1, or ...) of the big
>>>> file? The connection is frequently interrupted, and I
>>>> cannot rely on the downloaded file without a check.
>>>>
>>>> Jean-Pierre
>>>>
>>>> Jelle de Jong wrote:
>>>>> Hi Jean-Pierre,
>>>>>
>>>>> Thank you!
>>>>>
>>>>> The reparse-tags.gz file:
>>>>> https://powermail.nu/nextcloud/index.php/s/fS6Y6bpzoMgPiZ0
>>>>>
>>>>> Generated by running:
>>>>>
>>>>> getfattr -e hex -n system.ntfs_reparse_data -R /mnt/sr7-sdb2/ 2> /dev/null | grep ntfs_reparse_data | gzip > /root/reparse-tags.gz
>>>>>
>>>>> Kind regards,
>>>>>
>>>>> Jelle de Jong
>>>>>
>>>>> On 23/02/17 12:07, Jean-Pierre André wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Jelle de Jong wrote:
>>>>>>> Dear Jean-Pierre,
>>>>>>>
>>>>>>> I thought version 1.2.1 of the plug-in was working, so I took it
>>>>>>> further into production, but during backups with rdiff-backup and
>>>>>>> guestmount it created a 100% CPU load in the qemu process that
>>>>>>> stayed there for days until I killed it; I tested this twice. So I
>>>>>>> went back to an xpart/mount -t ntfs command, found more "Bad
>>>>>>> stream for offset" messages, and found that the
>>>>>>> /sbin/mount.ntfs-3g command was running at 100% CPU load and
>>>>>>> hanging there.
>>>>>>
>>>>>> Too bad.
>>>>>>
>>>>>>> I have added the whole Stream directory here: (1.1GB)
>>>>>>> https://powermail.nu/nextcloud/index.php/s/vbq85qZ2wcVYxrG
>>>>>>>
>>>>>>> Separate stream file: stream.data.full.000c0000.00020001.gz
>>>>>>> https://powermail.nu/nextcloud/index.php/s/QinV51XE4jrAH7a
>>>>>>>
>>>>>>> All the commands I used:
>>>>>>> http://paste.debian.net/plainh/c0ea5950
>>>>>>>
>>>>>>> I do not know how to get the reparse tags of all the files; maybe
>>>>>>> you can help me get all the information you need.
>>>>>>
>>>>>> Just use option -R on the base directory:
>>>>>>
>>>>>> getfattr -e hex -n system.ntfs_reparse_data -R base-dir
>>>>>>
>>>>>> Notes:
>>>>>> 1) files with no reparse tags (those which are not deduplicated)
>>>>>> will throw an error
>>>>>> 2) this will output the file names, which you might not want
>>>>>> to disclose. Fortunately I do not need them for now.
>>>>>>
>>>>>> So you may append to the above command:
>>>>>>
>>>>>> 2> /dev/null | grep ntfs_reparse_data | gzip > reparse-tags.gz
>>>>>>
>>>>>> With that, I will be able to build a configuration similar
>>>>>> to yours... apart from the files themselves.
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Jean-Pierre
>>>>>>
>>>>>>>
>>>>>>> Thank you for your help!
>>>>>>>
>>>>>>> Kind regards,
>>>>>>>
>>>>>>> Jelle de Jong
>>>>>>>
>>>>>>> On 14/02/17 15:55, Jean-Pierre André wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Jelle de Jong wrote:
>>>>>>>>> Hi Jean-Pierre,
>>>>>>>>>
>>>>>>>>> If we have to switch to Windows 2012 to have an environment
>>>>>>>>> similar to yours, then we can switch to another Windows version.
>>>>>>>>
>>>>>>>> I do not have any Windows Server, and my analysis
>>>>>>>> and tests are based on an unofficial deduplication
>>>>>>>> package which was adapted to Windows 10 Pro.
>>>>>>>>
>>>>>>>> A few months ago, following a bug report, I had to
>>>>>>>> make changes for Windows Server 2012, which uses an
>>>>>>>> older data format, and my only experience with this
>>>>>>>> format is related to that report. So switching to
>>>>>>>> Windows 2012 is not guaranteed to make debugging easier.
>>>>>>>>
>>>>>>>>> We are running out of disk space here, so if switching Windows
>>>>>>>>> versions makes the process of getting data deduplication
>>>>>>>>> working easier, then let me know.
>>>>>>>>
>>>>>>>> I have not yet analyzed your latest report, but it
>>>>>>>> would probably be useful if I build a full copy of the
>>>>>>>> non-user data from your partition:
>>>>>>>> - the reparse tags of all your files,
>>>>>>>> - all the "*.ccc" files in the Stream directory
>>>>>>>>
>>>>>>>> Do not do it now; I must first dig into the data you
>>>>>>>> posted.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>
>>>>>>>> Jean-Pierre
>>>>>>>>
>>>>>>>>> Kind regards,
>>>>>>>>>
>>>>>>>>> Jelle de Jong
>>>>>>>>>
>>>>>>>>> On 09/02/17 13:46, Jelle de Jong wrote:
>>>>>>>>>> Hi Jean-Pierre,
>>>>>>>>>>
>>>>>>>>>> In case you are wondering:
>>>>>>>>>>
>>>>>>>>>> I am using data deduplication in Windows 2016 for my test
>>>>>>>>>> environment, iso:
>>>>>>>>>> SW_DVD9_Win_Svr_STD_Core_and_DataCtr_Core_2016_64Bit_English_-2_MLF_X21-22843.ISO
>>>>>>>>>>
>>>>>>>>>> Kind regards,
>>>>>>>>>>
>>>>>>>>>> Jelle de Jong
>>>>>>>>>>
>>>>>>>>>> On 09/02/17 11:41, Jean-Pierre André wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Jelle de Jong wrote:
>>>>>>>>>>>> Hi Jean-Pierre,
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>
>>>>>>>>>>>> The new plug-in seems to work for now; I am moving it into the
>>>>>>>>>>>> testing phase within our production back-up scripts.
>>>>>>>>>>>
>>>>>>>>>>> Please wait a few hours; I have found a bug which
>>>>>>>>>>> I have fixed. I am currently inserting your data
>>>>>>>>>>> into my test base in order to rerun all my tests.
>>>>>>>>>>>
>>>>>>>>>>>> Will you release the source code eventually? I would like to
>>>>>>>>>>>> write a blog post about how to add the support.
>>>>>>>>>>>
>>>>>>>>>>> What exactly do you mean? If it is about how to
>>>>>>>>>>> collect the data in an unsupported condition, it is
>>>>>>>>>>> difficult, because unsupported generally means
>>>>>>>>>>> unknown territory...
>>>>>>>>>>>
>>>>>>>>>>>> What do you think the changes are of the plug-in stop working
>>>>>>>>>>>> again?
>>>>>>>>>>>
>>>>>>>>>>> (assuming a typo changes -> chances)
>>>>>>>>>>> Your files were in a condition not met before: data
>>>>>>>>>>> has been relocated according to a logic I do not fully
>>>>>>>>>>> understand.
>>>>>>>>>>> Maybe this is an intermediate step in the
>>>>>>>>>>> process of updating the files; anyway, this can happen.
>>>>>>>>>>>
>>>>>>>>>>> The situation I am facing is that I have a single
>>>>>>>>>>> example from which it is difficult to derive the rules.
>>>>>>>>>>> So yes, the plugin may stop working again.
>>>>>>>>>>>
>>>>>>>>>>> Note: there are strict consistency checks in the plugin,
>>>>>>>>>>> so it is unlikely you will read invalid data. Moreover, if
>>>>>>>>>>> you only mount read-only, you cannot damage the
>>>>>>>>>>> deduplicated partition.
>>>>>>>>>>>
>>>>>>>>>>>> We do not have an automatic test running to verify the
>>>>>>>>>>>> back-ups at this moment _yet_, so if the plug-in stops
>>>>>>>>>>>> working, incremental file-based back-ups with empty files
>>>>>>>>>>>> will slowly get into the back-ups this way :|
>>>>>>>>>>>
>>>>>>>>>>> Usually a deduplicated partition is only used for backups,
>>>>>>>>>>> and reading from backups is only for recovering former
>>>>>>>>>>> versions of files (on demand).
>>>>>>>>>>>
>>>>>>>>>>> If you access deduplicated files with no human control,
>>>>>>>>>>> you have to insert your own checks into the process. I
>>>>>>>>>>> would at least check whether the size of the recovered
>>>>>>>>>>> file is the same as the deduplicated one (and also grep
>>>>>>>>>>> for messages in the syslog).
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>>
>>>>>>>>>>> Jean-Pierre
>>>>>>>>>>>
>>>>>>>>>>>> Again, thank you for all your help so far!
>>>>>>>>>>>>
>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Jelle de Jong
>>>>>>>>>>>>
>>>>>>>>>>>> On 08/02/17 15:59, Jean-Pierre André wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you please make a try with:
>>>>>>>>>>>>> http://jp-andre.pagesperso-orange.fr/dedup120-beta.zip
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is experimental and based on assumptions which have
>>>>>>>>>>>>> to be clarified, but it should work in your environment.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jean-Pierre

_______________________________________________
ntfs-3g-devel mailing list
ntfs-3g-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ntfs-3g-devel