Dear Jean-Pierre,

I am still testing the dedup plugin. I am having an issue with rdiff-backup where the checksums of the source and destination do not match, but this may be a problem in rdiff-backup itself.
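As a stop-gap until we have automated verification, this is roughly the check I am wiring into the back-up scripts (a minimal sketch: the function name verify_copy, the md5sum choice, and the paths are my own placeholders, not anything required by rdiff-backup or ntfs-3g):

```shell
#!/bin/sh
# Sketch: verify that a file read from the dedup-mounted NTFS partition
# matches the copy on the backup side. Sizes are compared first (cheap),
# then md5 checksums. All names here are placeholders for my setup.
verify_copy() {
    src=$1
    dst=$2
    src_size=$(stat -c %s "$src") || return 1
    dst_size=$(stat -c %s "$dst") || return 1
    if [ "$src_size" != "$dst_size" ]; then
        echo "SIZE MISMATCH: $src ($src_size) vs $dst ($dst_size)" >&2
        return 1
    fi
    src_sum=$(md5sum "$src" | cut -d' ' -f1)
    dst_sum=$(md5sum "$dst" | cut -d' ' -f1)
    if [ "$src_sum" != "$dst_sum" ]; then
        echo "CHECKSUM MISMATCH: $src vs $dst" >&2
        return 1
    fi
    return 0
}

# After a backup run I would also grep the syslog for ntfs-3g messages,
# e.g.:  grep 'ntfs-3g' /var/log/syslog
# (the log file location varies by distribution)
```

Usage would be something like `verify_copy /mnt/sr7-sdb2/some/file /backup/some/file` in a loop over the restored files.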
However, I did get these messages, and I do not know whether they are related to the dedup plugin:

Mar 21 06:57:37 backup ntfs-3g[12620]: Record 1582022 has wrong SeqNo (159 <> 156)
Mar 21 06:57:37 backup ntfs-3g[12620]: Could not decode the type of inode 1582022
Mar 21 06:57:37 backup ntfs-3g[12620]: Record 1545290 has wrong SeqNo (296 <> 295)
Mar 21 06:57:37 backup ntfs-3g[12620]: Could not decode the type of inode 1545290

Kind regards,

Jelle de Jong

On 25/02/17 12:30, Jean-Pierre André wrote:
> [ Repeating, forgot to cc to the list ]
>
> Jean-Pierre André wrote:
>> Hi,
>>
>> There was a bug in the index location, which in bad conditions
>> could lead to an endless loop in the indexed search. So I have
>> fixed the bug and protected against a corrupted index leading
>> to a similar loop.
>>
>> With the posted data, I can access the first byte of 865,675
>> dummy files similar to yours... Of course they are not your
>> actual files, and there is still room for problems.
>>
>> Could you try:
>>
>> http://jp-andre.pagesperso-orange.fr/dedup122-beta.zip
>>
>> Regards
>>
>> Jean-Pierre
>>
>> Jelle de Jong wrote:
>>> Hi Jean-Pierre,
>>>
>>> # output: md5sum *.gz
>>> https://powermail.nu/nextcloud/index.php/s/jxler2rZOqBdpr2
>>>
>>> 879499f9187b0f590ae92460f4949dfd  stream.data.full.dir.tar.gz
>>> a8fc902613486e332898f92aba26c61f  reparse-tags.gz
>>>
>>> Kind regards,
>>>
>>> Jelle de Jong
>>>
>>> On 23/02/17 14:24, Jean-Pierre André wrote:
>>>> Hi,
>>>>
>>>> Can you also post the md5 (or sha1, or ...) of the big
>>>> file? The connection is frequently interrupted, and I
>>>> cannot rely on the downloaded file without a check.
>>>>
>>>> Jean-Pierre
>>>>
>>>> Jelle de Jong wrote:
>>>>> Hi Jean-Pierre,
>>>>>
>>>>> Thank you!
>>>>>
>>>>> The reparse-tags.gz file:
>>>>> https://powermail.nu/nextcloud/index.php/s/fS6Y6bpzoMgPiZ0
>>>>>
>>>>> Generated by running:
>>>>>
>>>>> getfattr -e hex -n system.ntfs_reparse_data -R /mnt/sr7-sdb2/ 2> /dev/null | grep ntfs_reparse_data | gzip > /root/reparse-tags.gz
>>>>>
>>>>> Kind regards,
>>>>>
>>>>> Jelle de Jong
>>>>>
>>>>> On 23/02/17 12:07, Jean-Pierre André wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Jelle de Jong wrote:
>>>>>>> Dear Jean-Pierre,
>>>>>>>
>>>>>>> I thought version 1.2.1 of the plug-in was working, so I took it
>>>>>>> further into production, but during backups with rdiff-backup and
>>>>>>> guestmount it created a 100% CPU load in the qemu process that
>>>>>>> stayed there for days until I killed it; I tested this twice. So I
>>>>>>> went back to an xpart/mount -t ntfs command, found more "Bad
>>>>>>> stream for offset" messages, and found that the
>>>>>>> /sbin/mount.ntfs-3g command was running at 100% CPU load and
>>>>>>> hanging there.
>>>>>>
>>>>>> Too bad.
>>>>>>
>>>>>>> I have added the whole Stream directory here: (1.1GB)
>>>>>>> https://powermail.nu/nextcloud/index.php/s/vbq85qZ2wcVYxrG
>>>>>>>
>>>>>>> Separate stream file: stream.data.full.000c0000.00020001.gz
>>>>>>> https://powermail.nu/nextcloud/index.php/s/QinV51XE4jrAH7a
>>>>>>>
>>>>>>> All the commands I used:
>>>>>>> http://paste.debian.net/plainh/c0ea5950
>>>>>>>
>>>>>>> I do not know how to get the reparse tags of all the files; maybe
>>>>>>> you can help me get all the information you need.
>>>>>>
>>>>>> Just use option -R on the base directory:
>>>>>>
>>>>>> getfattr -e hex -n system.ntfs_reparse_data -R base-dir
>>>>>>
>>>>>> Notes:
>>>>>> 1) files with no reparse tags (those which are not deduplicated)
>>>>>> will throw an error
>>>>>> 2) this will output the file names, which you might not want
>>>>>> to disclose. Fortunately I do not need them for now.
>>>>>>
>>>>>> So you may append to the above command:
>>>>>>
>>>>>> 2> /dev/null | grep ntfs_reparse_data | gzip > reparse-tags.gz
>>>>>>
>>>>>> With that, I will be able to build a configuration similar
>>>>>> to yours... apart from the files themselves.
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Jean-Pierre
>>>>>>
>>>>>>>
>>>>>>> Thank you for your help!
>>>>>>>
>>>>>>> Kind regards,
>>>>>>>
>>>>>>> Jelle de Jong
>>>>>>>
>>>>>>> On 14/02/17 15:55, Jean-Pierre André wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Jelle de Jong wrote:
>>>>>>>>> Hi Jean-Pierre,
>>>>>>>>>
>>>>>>>>> If we have to switch to Windows 2012 to have an environment
>>>>>>>>> similar to yours, then we can switch to another Windows version.
>>>>>>>>
>>>>>>>> I do not have any Windows Server, and my analysis
>>>>>>>> and tests are based on an unofficial deduplication
>>>>>>>> package which was adapted to Windows 10 Pro.
>>>>>>>>
>>>>>>>> A few months ago, following a bug report, I had to
>>>>>>>> make changes for Windows Server 2012, which uses an
>>>>>>>> older data format, and my only experience with this
>>>>>>>> format is related to that report. So switching to
>>>>>>>> Windows 2012 is not guaranteed to make debugging easier.
>>>>>>>>
>>>>>>>>> We are running out of disk space here, so if switching Windows
>>>>>>>>> versions makes the process of getting data deduplication
>>>>>>>>> working easier, then let me know.
>>>>>>>>
>>>>>>>> I have not yet analyzed your latest report, but it
>>>>>>>> would probably be useful if I build a full copy of the
>>>>>>>> non-user data from your partition:
>>>>>>>> - the reparse tags of all your files,
>>>>>>>> - all the "*.ccc" files in the Stream directory
>>>>>>>>
>>>>>>>> Do not do it now; I must first dig into the data you
>>>>>>>> posted.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>
>>>>>>>> Jean-Pierre
>>>>>>>>
>>>>>>>>> Kind regards,
>>>>>>>>>
>>>>>>>>> Jelle de Jong
>>>>>>>>>
>>>>>>>>> On 09/02/17 13:46, Jelle de Jong wrote:
>>>>>>>>>> Hi Jean-Pierre,
>>>>>>>>>>
>>>>>>>>>> In case you are wondering:
>>>>>>>>>>
>>>>>>>>>> I am using data deduplication in Windows 2016 for my test
>>>>>>>>>> environment, iso:
>>>>>>>>>> SW_DVD9_Win_Svr_STD_Core_and_DataCtr_Core_2016_64Bit_English_-2_MLF_X21-22843.ISO
>>>>>>>>>>
>>>>>>>>>> Kind regards,
>>>>>>>>>>
>>>>>>>>>> Jelle de Jong
>>>>>>>>>>
>>>>>>>>>> On 09/02/17 11:41, Jean-Pierre André wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Jelle de Jong wrote:
>>>>>>>>>>>> Hi Jean-Pierre,
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you!
>>>>>>>>>>>>
>>>>>>>>>>>> The new plug-in seems to work for now; I am moving it into the
>>>>>>>>>>>> testing phase within our production back-up scripts.
>>>>>>>>>>>
>>>>>>>>>>> Please wait a few hours; I have found a bug which
>>>>>>>>>>> I have fixed. I am currently inserting your data
>>>>>>>>>>> into my test base in order to rerun all my tests.
>>>>>>>>>>>
>>>>>>>>>>>> Will you release the source code eventually? I would like to
>>>>>>>>>>>> write a blog post about how to add the support.
>>>>>>>>>>>
>>>>>>>>>>> What exactly do you mean? If it is about how to
>>>>>>>>>>> collect the data in an unsupported condition, it is
>>>>>>>>>>> difficult, because unsupported generally means
>>>>>>>>>>> unknown territory...
>>>>>>>>>>>
>>>>>>>>>>>> What do you think the changes are of the plug-in stop working
>>>>>>>>>>>> again?
>>>>>>>>>>>
>>>>>>>>>>> (assuming a typo changes -> chances)
>>>>>>>>>>> Your files were in a condition not met before: data
>>>>>>>>>>> has been relocated according to a logic I do not fully
>>>>>>>>>>> understand.
>>>>>>>>>>> Maybe this is an intermediate step in the
>>>>>>>>>>> process of updating the files; anyway, this can happen.
>>>>>>>>>>>
>>>>>>>>>>> The situation I am facing is that I have a single
>>>>>>>>>>> example from which it is difficult to derive the rules.
>>>>>>>>>>> So yes, the plugin may stop working again.
>>>>>>>>>>>
>>>>>>>>>>> Note: there are strict consistency checks in the plugin,
>>>>>>>>>>> so it is unlikely you will read invalid data. Moreover, if
>>>>>>>>>>> you only mount read-only, you cannot damage the
>>>>>>>>>>> deduplicated partition.
>>>>>>>>>>>
>>>>>>>>>>>> We do not have an automatic test running to verify the
>>>>>>>>>>>> back-ups at this moment _yet_, so if the plug-in stops
>>>>>>>>>>>> working, incremental file-based back-ups with empty files
>>>>>>>>>>>> will slowly get into the back-ups this way :|
>>>>>>>>>>>
>>>>>>>>>>> Usually a deduplicated partition is only used for backups,
>>>>>>>>>>> and reading from backups is only for recovering former
>>>>>>>>>>> versions of files (on demand).
>>>>>>>>>>>
>>>>>>>>>>> If you access deduplicated files with no human control,
>>>>>>>>>>> you have to insert your own checks into the process. I
>>>>>>>>>>> would at least check whether the size of the recovered
>>>>>>>>>>> file is the same as the deduplicated one (and also grep
>>>>>>>>>>> for messages in the syslog).
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>>
>>>>>>>>>>> Jean-Pierre
>>>>>>>>>>>
>>>>>>>>>>>> Again, thank you for all your help so far!
>>>>>>>>>>>>
>>>>>>>>>>>> Kind regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Jelle de Jong
>>>>>>>>>>>>
>>>>>>>>>>>> On 08/02/17 15:59, Jean-Pierre André wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Can you please make a try with:
>>>>>>>>>>>>> http://jp-andre.pagesperso-orange.fr/dedup120-beta.zip
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is experimental and based on assumptions which have
>>>>>>>>>>>>> to be clarified, but it should work in your environment.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>
>>>>>>>>>>>>> Jean-Pierre

_______________________________________________
ntfs-3g-devel mailing list
ntfs-3g-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ntfs-3g-devel