Hello Jeremy,
Hello Christian,
Hi all,

> >Resident wild-guy Christian here. In summary, I'm seeing an odd problem
> >from both Windows Explorer and Mac Finder when copy-pasting a large batch
> >of files to a Lustre-backed SMB share wherein the file manager appears to
> >enumerate over all the files to determine if they can be made in the
> >destination, makes 0-byte stubs of each file, and then proudly states that
> >the files already exist in the destination. This behavior is
> >less-than-desirable and I was hoping someone else here had run into it, or
> >someone better than I at diagnosing SMB weirdness could give me a hand
> >debugging. I'm inclined to think it's a Lustre idiosyncracy, since this
> >does not occur on a ZFS share.

> If it doesn't happen on a ZFS share, but does on a Lustre one then
> it's certainly a Lustre oddity. Try looking at debug logs to figure
> out what Lustre is doing that smbd doesn't like.

A client has now also run into this behaviour and I think I've made some 
headway determining what's going on. To recap, we're seeing what Christian 
described in his original report with some additional details:

- copying files from a local NTFS filesystem on Windows 10 to a Lustre share 
works fine
- duplicating files *on* the Lustre share shows the problematic behaviour: for 
each file Windows complains that the file already exists and indeed, a 
zero-sized file of that name has been created where before there was none
- copying a file off of the Lustre share onto the local NTFS works fine (obvs.) 
but copying it back onto the Lustre share again shows the problem - so it is 
travelling with the files and suggests that it's not related to 
sever-side-copying
- switching option ea support to no works around the problem - so it seems to 
be related to extended attributes

The relevant part of a level 10 debug log reads:

[2022/09/21 15:50:57.127478, 10, pid=1967443, effective(200737, 130), 
real(200737, 0), class=locking] 
../../source3/locking/share_mode_lock.c:171(share_mode_memcache_store)
  share_mode_memcache_store: stored entry for file blorf/blorf/blorf/blim.txt 
epoch d7ce6bcd04632a key 3188158982:144115340681419950:0
[2022/09/21 15:50:57.127601, 10, pid=1967443, effective(200737, 130), 
real(200737, 0)] ../../source3/smbd/trans2.c:335(get_ea_names_from_fsp)
  get_ea_names_from_fsp: ea_namelist size = 11
[2022/09/21 15:50:57.127612, 10, pid=1967443, effective(200737, 130), 
real(200737, 0)] ../../source3/smbd/trans2.c:262(get_ea_value_fsp)
  get_ea_value: EA lustre.lov is of length 56
[2022/09/21 15:50:57.127616, 10, pid=1967443, effective(200737, 130), 
real(200737, 0)] ../../lib/util/util.c:570(dump_data)
  [0000] D0 0B D1 0B 01 00 00 00   AE 1C 00 00 00 00 00 00   ........ ........
  [0010] 88 23 00 00 02 00 00 00   00 00 10 00 01 00 00 00   .#...... ........
  [0020] 26 CB 2C 00 00 00 00 00   00 00 00 00 00 00 00 00   &.,..... ........
  [0030] 00 00 00 00 08 00 00 00                             ........
[2022/09/21 15:50:57.127641, 10, pid=1967443, effective(200737, 130), 
real(200737, 0)] ../../source3/smbd/trans2.c:508(get_ea_list_from_fsp)
  get_ea_list_from_file: total_len = 71, lustre.lov, val len = 56
[2022/09/21 15:50:57.127646, 10, pid=1967443, effective(200737, 130), 
real(200737, 0)] ../../source3/smbd/trans2.c:520(get_ea_list_from_fsp)
  get_ea_list_from_file: total_len = 75
[2022/09/21 15:50:57.127651, 10, pid=1967443, effective(200737, 130), 
real(200737, 0)] ../../source3/smbd/trans2.c:726(canonicalize_ea_name)
  canonicalize_ea_name: LUSTRE.LOV -> lustre.lov
[2022/09/21 15:50:57.127655, 10, pid=1967443, effective(200737, 130), 
real(200737, 0)] ../../source3/smbd/trans2.c:786(set_ea)
  set_ea: ea_name user.lustre.lov ealen = 56
[2022/09/21 15:50:57.127660, 10, pid=1967443, effective(200737, 130), 
real(200737, 0)] ../../source3/smbd/trans2.c:810(set_ea)
  set_ea: setting ea name user.lustre.lov on file blorf/blorf/blorf/blim.txt by 
file descriptor.
[2022/09/21 15:50:57.127668, 10, pid=1967443, effective(200737, 130), 
real(200737, 0)] ../../source3/smbd/open.c:6066(create_file_unixpath)
  create_file_unixpath: NT_STATUS_EAS_NOT_SUPPORTED

So create_file_unixpath errors out with NT_STATUS_EAS_NOT_SUPPORTED because it 
cannot set extended attribute user.lustre.lov in Lustre. It's this codepath:

https://github.com/samba-team/samba/blob/d9dda4b7af284ecbee4d04a89bd16fc0098e2931/source3/smbd/open.c#L6395

My guess (with somewhat fuzzy knowledge on Windows API behaviour) is that 
Windows Explorer does expect a CreateFile call with extended attributes to fail 
atomically and gets confused by the fact that samba leaves the file it just 
created but couldn't set the attributes on lying around after erroring out of 
the open. And indeed, patching the code path like so avoids the problem:

--- samba-4.17.0.orig/source3/smbd/open.c       2022-09-06 16:22:03.302113000 
+0200
+++ samba-4.17.0/source3/smbd/open.c    2022-09-21 12:38:22.724000000 +0200
@@ -6098,6 +6098,13 @@
            ((info == FILE_WAS_CREATED) || (info == FILE_WAS_OVERWRITTEN))) {
                status = set_ea(conn, fsp, ea_list);
                if (!NT_STATUS_IS_OK(status)) {
+                       DEBUG(10, ("set_ea failed: %s\n", nt_errstr(status)));
+                       if (info == FILE_WAS_CREATED) {
+                               DEBUG(10, ("file was created - unlinking\n"));
+                               SMB_VFS_UNLINKAT(conn, dirfsp,
+                                               smb_fname_atname, 0);
+                       }
+
                        goto fail;
                }
        }

That leaves the question, where that extended attribute user.lustre.lov is 
coming from. It appears that Lustre itself exposes a number of extended 
attributes for every file which reflect internal housekeeping data:

$ getfattr -m - blarf
# file: blarf
lustre.lov
trusted.link
trusted.lma
trusted.lov

$ getfattr -n lustre.lov blarf
# file: blarf
lustre.lov=0s0AvRCwEAAABgJAEAAAAAAIQjAAACAAAAAAAQAAEAAAAAqAAAAAAAAAAAAAAAAAAAAAAAAAYAAAA=

Unprivileged users only get to see the lustre.lov attribute:

> getfattr -m - blarf
# file: blarf
lustre.lov

> getfattr -n lustre.lov blarf
# file: blarf
lustre.lov=0s0AvRCwEAAABgJAEAAAAAAIQjAAACAAAAAAAQAAEAAAAAqAAAAAAAAAAAAAAAAAAAAAAAAAYAAAA=

That attribute is exposed to Windows by samba and even travels with the file to 
a local NTFS file system which explains my third observation from above. 
Verified with a small program I wrote leveraging BackupRead (attached):

C:\Users\user>get_ea "blorf\blarf"
LUSTRE.LOV=<binary garbage because I couldn't be bothered to base64 encode it>

While I can also see a QueryEAFile call in ProcMon on Windows, its UI 
unfortunately is not able to show me any information about what extended 
attributes are returned from it or passed to the failing CreateFile.

While exposing the lustre.lov attribute to unprivileged users, Lustre (at least 
by default) does not support user xattrs and throws ENOTSUP when trying 
anything of that sort, particularly setting them:

> getfattr -n user.foo blarf
blarf: user.foo: Operation not supported
> setfattr -n user.foo -v dummy blarf
setfattr: blarf: Operation not supported

This would explain why samba gets to see the attribute but runs into ENOTSUP 
when trying to set it, even after trying very hard to make it work by mapping 
it from that (non-existant) lustre.* namespace into the user namespace as 
user.lustre.lov.

Finally, I was also able to avoid the problem by writing a small VFS module 
which hooks getxattr and listxattr and filters out the lustre.lov attribute so 
it's never exposed to the client (attached).

What's your take on all this?
What could a proper solution look like?

Additional info: For debugging purposes I was able to replicate the exact same 
behaviour with fuse and sshfs: sshfs does not normally implement extended 
attributes. Adding getxattr and listxattr functions which synthesize a dummy 
lustre.lov attribute one can observe exactly the same behaviour without 
spinning up a whole Lustre. (attached as well)
-- 
Thanks in advance,
Michael
________________________________________
From: samba <[email protected]> on behalf of Jeremy Allison via 
samba <[email protected]>
Sent: 05 April 2022 04:31:52
To: Christian Kuntz
Cc: sambalist
Subject: Re: [Samba] Odd "File exists" behavior when copy-pasting many files to 
an SMB exported Lustre FS

Caution! External email. Do not open attachments or click links, unless this 
email comes from a known sender and you know the content is safe.

On Mon, Apr 04, 2022 at 03:26:03PM -0700, Christian Kuntz via samba wrote:
>Hi all,
>
>Resident wild-guy Christian here. In summary, I'm seeing an odd problem
>from both Windows Explorer and Mac Finder when copy-pasting a large batch
>of files to a Lustre-backed SMB share wherein the file manager appears to
>enumerate over all the files to determine if they can be made in the
>destination, makes 0-byte stubs of each file, and then proudly states that
>the files already exist in the destination. This behavior is
>less-than-desirable and I was hoping someone else here had run into it, or
>someone better than I at diagnosing SMB weirdness could give me a hand
>debugging. I'm inclined to think it's a Lustre idiosyncracy, since this
>does not occur on a ZFS share.

If it doesn't happen on a ZFS share, but does on a Lustre one then
it's certainly a Lustre oddity. Try looking at debug logs to figure
out what Lustre is doing that smbd doesn't like.

--
To unsubscribe from this list go to the following URL and read the
instructions:  https://lists.samba.org/mailman/options/samba

diff -ur samba-4.15.5.ori/source3/modules/wscript_build samba-4.15.5/source3/modules/wscript_build
--- samba-4.15.5.ori/source3/modules/wscript_build	2021-08-09 15:38:36.435381200 +0200
+++ samba-4.15.5/source3/modules/wscript_build	2022-09-15 10:05:38.690716949 +0200
@@ -641,3 +641,10 @@
                  init_function='',
                  internal_module=bld.SAMBA3_IS_STATIC_MODULE('vfs_widelinks'),
                  enabled=bld.SAMBA3_IS_ENABLED_MODULE('vfs_widelinks'))
+
+bld.SAMBA3_MODULE('vfs_filter_lustre_private_xattr',
+                 subsystem='vfs',
+                 source='vfs_filter_lustre_private_xattr.c',
+                 init_function='',
+                 internal_module=bld.SAMBA3_IS_STATIC_MODULE('vfs_filter_lustre_private_xattr'),
+                 enabled=bld.SAMBA3_IS_ENABLED_MODULE('vfs_filter_lustre_private_xattr'))
diff -ur samba-4.15.5.ori/source3/wscript samba-4.15.5/source3/wscript
--- samba-4.15.5.ori/source3/wscript	2021-10-27 14:53:55.900512200 +0200
+++ samba-4.15.5/source3/wscript	2022-09-15 10:12:05.717420975 +0200
@@ -1886,7 +1886,8 @@
                                       'vfs_preopen', 'vfs_catia',
                                       'vfs_media_harmony', 'vfs_unityed_media', 'vfs_fruit', 'vfs_shell_snap',
                                       'vfs_commit', 'vfs_worm', 'vfs_crossrename', 'vfs_linux_xfs_sgid',
-                                      'vfs_time_audit', 'vfs_offline', 'vfs_virusfilter', 'vfs_widelinks'])
+                                      'vfs_time_audit', 'vfs_offline', 'vfs_virusfilter', 'vfs_widelinks',
+                                      'vfs_filter_lustre_private_xattr'])
     if host_os.rfind('linux') > -1:
         default_shared_modules.extend(['vfs_snapper'])
 
--- /dev/null	2022-08-08 11:10:15.373334812 +0200
+++ samba-4.15.5/source3/modules/vfs_filter_lustre_private_xattr.c	2022-09-15 12:52:27.908346500 +0200
@@ -0,0 +1,120 @@
+/*
+ * VFS module filtering lustre-internal xattrs such as lustre.lov.
+ *
+ * Copyright (C) Tim Potter, 1999-2000
+ * Copyright (C) Alexander Bokovoy, 2002
+ * Copyright (C) Stefan (metze) Metzmacher, 2003
+ * Copyright (C) Jeremy Allison 2009
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 3 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "includes.h"
+
+#undef DBGC_CLASS
+#define DBGC_CLASS DBGC_VFS
+
+#define MODULE_NAME "filter_lustre_private_xattr_fgetxattr"
+
+static ssize_t vfs_filter_lustre_private_xattr_fgetxattr(
+		vfs_handle_struct *handle, struct files_struct *fsp,
+		const char *name, void *value, size_t size)
+{
+	if (strcmp(name, "lustre.lov") == 0) {
+		DEBUG(5, ("vfs_filter_lustre_private_xattr_fgetxattr: "
+			"ignoring ea lustre.lov\n"));
+		errno = ENOTSUP;
+		return -1;
+	}
+
+	return SMB_VFS_NEXT_FGETXATTR(handle, fsp, name, value, size);
+}
+
+static ssize_t vfs_filter_lustre_private_xattr_flistxattr(
+		vfs_handle_struct *handle, struct files_struct *fsp,
+		char *list, size_t size)
+{
+	ssize_t rsize = SMB_VFS_NEXT_FLISTXATTR(handle, fsp, list, size);
+	char *p = list, *op = list;
+
+	if (rsize < 0) {
+		return rsize;
+	}
+
+	while (p < list + rsize) {
+		/* walk to next nul terminator */
+		while (*p != '\0' && p < list + rsize) {
+			p++;
+		}
+
+		/* not nul-terminated? buffer overflow? - we didn't do it */
+		if (p >= list + rsize) {
+			break;
+		}
+
+		/* move past nul */
+		p++;
+
+		DEBUG(10, ("vfs_filter_lustre_private_xattr_flistxattr: "
+			"looking at ea %s\n", op));
+
+		/* can there be multiple occurences? */
+		while (strcmp(op, "lustre.lov") == 0) {
+			DEBUG(5, ("vfs_filter_lustre_private_xattr_flistxattr: "
+				"found ea %s to ignore\n", op));
+
+			/* 012 345 678
+			 * aa\0bb\0cc\0
+			 * ^op ^p */
+			ssize_t ignorelen = p - op;
+			if (p < list + rsize) {
+				/* move the rest forward and nul out the
+				 * overhang */
+				ssize_t rest = rsize - (p - list);
+				memmove(op, p, rest);
+				memset(op + rest, '\0', ignorelen);
+
+				DEBUG(10, ("vfs_filter_lustre_private_xattr_flistxattr: "
+					"moved buffer forward %p:%ld:%p:%p:%ld:%ld\n",
+					list, rsize, op, p, ignorelen, rest));
+			} else {
+				/* just null the end */
+				memset(op, '\0', ignorelen);
+
+				DEBUG(10, ("vfs_filter_lustre_private_xattr_flistxattr: "
+					"nulled buffer end %p:%ld:%p:%p:%ld\n",
+					list, rsize, op, p, ignorelen));
+			}
+
+			rsize -= ignorelen;
+		}
+
+		op = p;
+	}
+
+	return rsize;
+}
+
+static struct vfs_fn_pointers vfs_filter_lustre_private_xattr_fns = {
+	.fgetxattr_fn = vfs_filter_lustre_private_xattr_fgetxattr,
+	.flistxattr_fn = vfs_filter_lustre_private_xattr_flistxattr,
+};
+
+static_decl_vfs;
+NTSTATUS vfs_filter_lustre_private_xattr_init(TALLOC_CTX *ctx)
+{
+	return smb_register_vfs(SMB_VFS_INTERFACE_VERSION,
+			"filter_lustre_private_xattr",
+			&vfs_filter_lustre_private_xattr_fns);
+}
diff --git a/sshfs.c b/sshfs.c
index 5513266..c4ae263 100644
--- a/sshfs.c
+++ b/sshfs.c
@@ -3409,6 +3409,40 @@ static int sshfs_getattr(const char *path, struct stat *stbuf,
 	return err;
 }
 
+static int sshfs_listxattr(const char *path, char *value, size_t size) {
+	(void)path;
+
+	if (size == 0) {
+		return 11;
+	}
+
+	if (size < 11) {
+		return -ERANGE;
+	}
+
+	strcpy(value, "lustre.lov");
+	return 11;
+}
+
+static int sshfs_getxattr(const char *path, const char *name, char *value, size_t size) {
+	(void)path;
+
+	if (strcmp(name, "lustre.lov") != 0) {
+		return -ENODATA;
+	}
+
+	if (size == 0) {
+		return 6;
+	}
+
+	if (size < 6) {
+		return -ERANGE;
+	}
+
+	strcpy(value, "dummy");
+	return 6;
+}
+
 static int sshfs_truncate_zero(const char *path)
 {
 	int err;
@@ -3557,6 +3591,8 @@ static int processing_init(void)
 static struct fuse_operations sshfs_oper = {
 		.init       = sshfs_init,
 		.getattr    = sshfs_getattr,
+		.listxattr  = sshfs_listxattr,
+		.getxattr   = sshfs_getxattr,
 		.access     = sshfs_access,
 		.opendir    = sshfs_opendir,
 		.readdir    = sshfs_readdir,
// https://github.com/microsoft/Windows-classic-samples/blob/main/Samples/DataDedupBackupRestore/cpp/DedupBackupRestore.cpp
// https://learn.microsoft.com/en-us/windows/win32/api/winbase/ns-winbase-win32_stream_id
// https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/ns-wdm-_file_full_ea_information?redirectedfrom=MSDN

#include <windows.h>
#include <winternl.h>
#include <stdlib.h>
#include <stdio.h>

int main(int argc, char* argv[])
{
	if (argc < 2) {
		fprintf(stderr, "get_ea <file name>");
		return 1;
	}

	HANDLE h = CreateFile(argv[1],
			GENERIC_EXECUTE|GENERIC_READ|GENERIC_WRITE,
			FILE_SHARE_READ|FILE_SHARE_WRITE, NULL, OPEN_EXISTING,
			FILE_ATTRIBUTE_ARCHIVE, NULL);
	if (!h) {
		fprintf(stderr, "Error openingfile '%s'\n", argv[1]);
		return 1;
	}

	BYTE buffer[32768];
	DWORD read = 0;
	void *ctx = NULL;
	if (!BackupRead(h, buffer, sizeof(buffer), &read, FALSE, TRUE, &ctx)) {
		fprintf(stderr, "Error reading backup data\n");
		return 1;
	}

	BYTE *p = buffer;
	while (p < buffer + read) {
		WIN32_STREAM_ID *wsi = (WIN32_STREAM_ID*)p;

		// move past the part of the structure that is not counted by
		// the Size member, cStreamName and the data is
		p += FIELD_OFFSET(WIN32_STREAM_ID, cStreamName);

		if (wsi->dwStreamId != BACKUP_EA_DATA) {
			p += wsi->Size.QuadPart;
			continue;
		}

		// "streams" which are not alternate data streams do not have a
		// name and dwStreamNameSize will be zero but let's be strict
		// about it
		p += wsi->dwStreamNameSize;

		FILE_FULL_EA_INFORMATION *ffeai = (FILE_FULL_EA_INFORMATION *)p;
		while ((BYTE *)ffeai < p + wsi->Size.QuadPart) {
			fwrite(ffeai->EaName, ffeai->EaNameLength, 1, stdout);
			fwrite("=", 1, 1, stdout);

			BYTE *p2 = (BYTE *)ffeai +
				FIELD_OFFSET(FILE_FULL_EA_INFORMATION, EaName)
				+ ffeai->EaNameLength + 1;

			fwrite(p2, ffeai->EaValueLength, 1, stdout);
			fwrite("\n", 1, 1, stdout);

			ffeai = (FILE_FULL_EA_INFORMATION *)(p2 + ffeai->EaValueLength);
		}

		break;
	}

	return 0;
}
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

Reply via email to