Hi Jean-Pierre,

# output: md5sum *.gz
https://powermail.nu/nextcloud/index.php/s/jxler2rZOqBdpr2

879499f9187b0f590ae92460f4949dfd  stream.data.full.dir.tar.gz
a8fc902613486e332898f92aba26c61f  reparse-tags.gz
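
To verify the downloads against these sums, a minimal sketch (assuming
both files are saved in the current directory under the same names):

md5sum -c <<'EOF'
879499f9187b0f590ae92460f4949dfd  stream.data.full.dir.tar.gz
a8fc902613486e332898f92aba26c61f  reparse-tags.gz
EOF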

Kind regards,

Jelle de Jong

On 23/02/17 14:24, Jean-Pierre André wrote:
> Hi,
>
> Can you also post the md5 (or sha1, or ...) of the big
> file. The connection is frequently interrupted, and I
> cannot rely on the downloaded file without a check.
>
> Jean-Pierre
>
> Jelle de Jong wrote:
>> Hi Jean-Pierre,
>>
>> Thank you!
>>
>> The reparse-tags.gz file:
>> https://powermail.nu/nextcloud/index.php/s/fS6Y6bpzoMgPiZ0
>>
>> Generated by running: getfattr -e hex -n system.ntfs_reparse_data -R
>> /mnt/sr7-sdb2/ 2> /dev/null | grep ntfs_reparse_data | gzip >
>> /root/reparse-tags.gz
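>>
>> As a quick sanity check of the generated file, something like this
>> could be used (just a sketch, with zcat/wc/head assumed available):
>>
>> zcat /root/reparse-tags.gz | wc -l     # number of deduplicated files found
>> zcat /root/reparse-tags.gz | head -n 3 # first few hex-encoded reparse tags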
>>
>> Kind regards,
>>
>> Jelle de Jong
>>
>> On 23/02/17 12:07, Jean-Pierre André wrote:
>>> Hi,
>>>
>>> Jelle de Jong wrote:
>>>> Dear Jean-Pierre,
>>>>
>>>> I thought version 1.2.1 of the plug-in was working, so I took it further
>>>> into production, but during backups with rdiff-backup and guestmount it
>>>> created a 100% CPU load in the qemu processes that stayed there for days
>>>> until I killed them; I tested this twice. So I went back to an
>>>> xpart/mount -t ntfs command, found more "Bad stream for offset" errors,
>>>> and found that the /sbin/mount.ntfs-3g command was running at 100% CPU
>>>> load and hung there.
>>>
>>> Too bad.
>>>
>>>> I have added the whole Stream directory here: (1.1GB)
>>>> https://powermail.nu/nextcloud/index.php/s/vbq85qZ2wcVYxrG
>>>>
>>>> Separate stream file: stream.data.full.000c0000.00020001.gz
>>>> https://powermail.nu/nextcloud/index.php/s/QinV51XE4jrAH7a
>>>>
>>>> All the commands I used:
>>>> http://paste.debian.net/plainh/c0ea5950
>>>>
>>>> I do not know how to get the reparse tags of all the files; maybe you
>>>> can help me get all the information you need.
>>>
>>> Just use option -R on the base directory :
>>>
>>> getfattr -e hex -n system.ntfs_reparse_data -R base-dir
>>>
>>> Notes :
>>> 1) files with no reparse tags (those which are not deduplicated)
>>> will throw an error
>>> 2) this will output the file names, which you might not want
>>> to disclose. Fortunately I do not need them for now.
>>>
>>> So you may append to the above command :
>>>
>>> 2> /dev/null | grep ntfs_reparse_data | gzip > reparse-tags.gz
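>>>
>>> Assembled, the whole pipeline would then look like this (a sketch,
>>> with base-dir standing for the mount point of the deduplicated
>>> partition):
>>>
>>> getfattr -e hex -n system.ntfs_reparse_data -R base-dir 2> /dev/null | grep ntfs_reparse_data | gzip > reparse-tags.gz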
>>>
>>> With that, I will be able to build a configuration similar
>>> to yours... apart from the files themselves.
>>>
>>> Regards
>>>
>>> Jean-Pierre
>>>
>>>>
>>>> Thank you for your help!
>>>>
>>>> Kind regards,
>>>>
>>>> Jelle de Jong
>>>>
>>>> On 14/02/17 15:55, Jean-Pierre André wrote:
>>>>> Hi,
>>>>>
>>>>> Jelle de Jong wrote:
>>>>>> Hi Jean-Pierre,
>>>>>>
>>>>>> If we have to switch to Windows 2012 and thereby have an environment
>>>>>> similar to yours, then we can switch to another Windows version.
>>>>>
>>>>> I do not have any Windows Server, and my analysis
>>>>> and tests are based on an unofficial deduplication
>>>>> package which was adapted to Windows 10 Pro.
>>>>>
>>>>> A few months ago, following a bug report, I had to
>>>>> make changes for Windows Server 2012 which uses an
>>>>> older data format, and my only experience about this
>>>>> format is related to this report. So switching to
>>>>> Windows 2012 is not guaranteed to make debugging easier.
>>>>>
>>>>>> We are running out of disk space here, so if switching Windows
>>>>>> versions makes the process of getting data deduplication working
>>>>>> easier, then let me know.
>>>>>
>>>>> I have not yet analyzed your latest report, but it
>>>>> would probably be useful if I build a full copy of
>>>>> the non-user data from your partition :
>>>>> - the reparse tags of all your files,
>>>>> - all the "*.ccc" files in the Stream directory
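>>>>>
>>>>> For the "*.ccc" files, a possible way to collect them when we get
>>>>> there, as a sketch (the mount point and archive name are only
>>>>> assumptions):
>>>>>
>>>>> find /mnt/sr7-sdb2 -type f -name '*.ccc' -print0 | tar czf ccc-files.tar.gz --null -T -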
>>>>>
>>>>> Do not do it now, I must first dig into the data you
>>>>> posted.
>>>>>
>>>>> Regards
>>>>>
>>>>> Jean-Pierre
>>>>>
>>>>>
>>>>>> Kind regards,
>>>>>>
>>>>>> Jelle de Jong
>>>>>>
>>>>>> On 09/02/17 13:46, Jelle de Jong wrote:
>>>>>>> Hi Jean-Pierre,
>>>>>>>
>>>>>>> In case you are wondering:
>>>>>>>
>>>>>>> I am using data deduplication in Windows Server 2016 for my test
>>>>>>> environment, installed from this ISO:
>>>>>>> SW_DVD9_Win_Svr_STD_Core_and_DataCtr_Core_2016_64Bit_English_-2_MLF_X21-22843.ISO
>>>>>>>
>>>>>>> Kind regards,
>>>>>>>
>>>>>>> Jelle de Jong
>>>>>>>
>>>>>>> On 09/02/17 11:41, Jean-Pierre André wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Jelle de Jong wrote:
>>>>>>>>> Hi Jean-Pierre,
>>>>>>>>>
>>>>>>>>> Thank you!
>>>>>>>>>
>>>>>>>>> The new plug-in seems to work for now; I am moving it into the
>>>>>>>>> testing phase within our production back-up scripts.
>>>>>>>>
>>>>>>>> Please wait a few hours, I have found a bug which
>>>>>>>> I have fixed. I am currently inserting your data
>>>>>>>> into my test base in order to rerun all my tests.
>>>>>>>>
>>>>>>>>> Will you release the source code eventually? I would like to write
>>>>>>>>> a blog post about how to add the support.
>>>>>>>>
>>>>>>>> What exactly do you mean ? If it is about how to
>>>>>>>> collect the data in an unsupported condition, it is
>>>>>>>> difficult, because unsupported generally means
>>>>>>>> unknown territory...
>>>>>>>>
>>>>>>>>> What do you think the changes are of the plug-in stopping working
>>>>>>>>> again?
>>>>>>>>
>>>>>>>> (assuming a typo changes -> chances)
>>>>>>>> Your files were in a condition not met before : data
>>>>>>>> has been relocated according to a logic I do not fully
>>>>>>>> understand. Maybe this is an intermediate step in the
>>>>>>>> process of updating the files; in any case, this can happen.
>>>>>>>>
>>>>>>>> The situation I am facing is that I have a single
>>>>>>>> example from which it is difficult to derive the rules.
>>>>>>>> So yes, the plugin may stop working again.
>>>>>>>>
>>>>>>>> Note : there are strict consistency checks in the plugin,
>>>>>>>> so it is unlikely you read invalid data. Moreover if
>>>>>>>> you only mount read-only you cannot damage the deduplicated
>>>>>>>> partition.
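>>>>>>>>
>>>>>>>> For example, a read-only mount could look like this (the device
>>>>>>>> name is only a guess based on your mount point, adjust it to your
>>>>>>>> setup):
>>>>>>>>
>>>>>>>> mount -t ntfs-3g -o ro /dev/sdb2 /mnt/sr7-sdb2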
>>>>>>>>
>>>>>>>>> We do not _yet_ have an automatic test running to verify the
>>>>>>>>> back-ups, so if the plug-in stops working, empty files will slowly
>>>>>>>>> creep into the incremental file-based back-ups this way :|
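>>>>>>>>>
>>>>>>>>> As a stopgap, something like this could at least flag zero-byte
>>>>>>>>> files that slipped into a back-up tree (the back-up path is just
>>>>>>>>> an example):
>>>>>>>>>
>>>>>>>>> find /srv/backups -type f -size 0 -print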
>>>>>>>>
>>>>>>>> Usually a deduplicated partition is only used for backups,
>>>>>>>> and reading from backups is only for recovering former
>>>>>>>> versions of files (on demand).
>>>>>>>>
>>>>>>>> If you access deduplicated files with no human control,
>>>>>>>> you have to insert your own checks in the process. I
>>>>>>>> would at least check whether the size of the recovered
>>>>>>>> file is the same as the deduplicated one (also grep for
>>>>>>>> messages in the syslog).
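>>>>>>>>
>>>>>>>> A minimal sketch of such a check (the paths are placeholders and
>>>>>>>> the syslog location assumes a Debian-like system):
>>>>>>>>
>>>>>>>> src=/mnt/sr7-sdb2/some/file
>>>>>>>> dst=/srv/backups/some/file
>>>>>>>> [ "$(stat -c %s "$src")" -eq "$(stat -c %s "$dst")" ] || echo "size mismatch: $src" >&2
>>>>>>>> grep -i ntfs /var/log/syslog    # look for ntfs-3g error messages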
>>>>>>>>
>>>>>>>> Regards
>>>>>>>>
>>>>>>>> Jean-Pierre
>>>>>>>>
>>>>>>>>> Again thank you for all your help so far!
>>>>>>>>>
>>>>>>>>> Kind regards,
>>>>>>>>>
>>>>>>>>> Jelle de Jong
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 08/02/17 15:59, Jean-Pierre André wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Can you please make a try with :
>>>>>>>>>> http://jp-andre.pagesperso-orange.fr/dedup120-beta.zip
>>>>>>>>>>
>>>>>>>>>> This is experimental and based on assumptions which have
>>>>>>>>>> to be clarified, but it should work in your environment.
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>>
>>>>>>>>>> Jean-Pierre
>>>
>>>
>>
>
>
