On Sat, Jan 03, 2026 at 10:05:16PM +0100, "Peter B." <[email protected]> wrote:
> Hi!
>
> I'm super happy to see that I'm not the only one anymore interested in
> increased xattr, and therefore possible key/value usage and query
> functionality!
>
> 😀️
>
> **I'm writing this with personal interest**
> Yes, we ALL have data to deal with, and I like to tag and find my kids'
> photos and files - regardless which "app" or system-
>
> **as well as my professional interest**
> Working with small-to-medium-to-very-very-large digital heritage
> collections.
>
> (And therefore loads of meta+data wrangling clever-hacks-and-stunts for a
> living 😉️)
>
>
> So, my hopefully useful "few-cents" to this thread are here:
>
> On 1/2/26 13:11, raf wrote:
> > On Wed, Dec 31, 2025 at 04:21:14PM +0100, Bernhard Voelker
> > <[email protected]> wrote:
> >
> > > On 12/30/25 17:45, Peter B. wrote:
> > > > I'll also checkout Morgan's find-patch:
> > > > I'd really love to see xattr-support in the basic "most likely to be
> > > > present-on-any-box" tools.
> > > The technical coding under the hood for reading xattr from the file
> > > system is there in Morgan's patch for quite a while, indeed.
> > >
> > > But before adding, I feel we need to discuss once again the interface
> > > to the find(1) user:
> > >
> > > - Do we only want to provide an option to search for files having
> > >   xattrs?
> > >     find -xattr
> > >
> > > - Do we want a test option to search for a file having an xattr key
> > >   matching a certain string (or eventually pattern)?
> > >     find -xattr 'mykey'      # xattr key equals string
> > >     find -xattr '*mykey*'    # xattr key matches pattern
> > >   Or better explicitly mention in the option that we match for the
> > >   xattr keys?
> > >     find -xattr-key 'mykey'
> > >     find -xattr-key '*mykey*'
> > >
> > > - Do we want a test option to search for a file having an xattr value
> > >   matching a certain string (or eventually pattern)?
> > >     find -xattr-value 'myval'    # files with xattr value equals string
> > >     find -xattr-value '*myval*'  # ... or pattern
> > >
> > > - Or search for files with xattr having a certain mixture of 'key=val'?
> > >     find -xattr-match 'mykey=myval'      # search by key+val as strings
> > >     find -xattr-match 'mykey=*someval*'  # search for key matching a val pattern
> > There are versions of find that (I think) use -xattr to
> > just identify the existence of EAs. I don't think
> > that's enough, but it might make sense to have -xattr
> > do that
>
> +1
> absolutely agree.
> To both: useful and not-enough.
>
>
> > (for compatibility with those other versions of
> > find), but to also have -xattr-key and -xattr-value (or
> > -xattr-match for both).
>
> +1 again.
> It indeed makes perfect sense (and has use-cases!) where one wants to query
> (key OR value) AND (key AND value).
>
> A lot useful if that provides some RegEx (for matching wildcards/patterns).
>
>
> > The thing to watch out for if both the key and value
> > are combined, is that if you format it like "key=value"
> > you need to consider the case where the key includes
> > "=". There might be malicious EAs trying to be tricky.
>
> True.
> There will always be potential for malicious attempts, yet I would suggest
> to apply basic "query-escaping", until more strict antitrust seems
> necessary?
>
>
> > My rawhide program matches EAs formatted like "key:
> > value" but non-ascii bytes are encoded (like \x1b or \n
> > or \t etc.) and any ": " in the key itself is encoded
> > as "\x3a " to disappoint the creators of malicious EAs.
> > I've never seen an EA whose key contained ": " so I
> > don't think it'll bother anyone.
>
> I think we're on the same page here :)
> Yet, one may say "xattrs are not-yet-popular enough, but you'll see, once
> they do..." - like with all formats and data-exchange protocols, I guess?

EAs/xattrs are used for various things by various systems (e.g.
selinux on Linux, quarantine on macOS), but user/app-specific usage
probably isn't popular yet, although it could be. Some systems allow
64KiB of EA data.

> I believe if good software make people happy = less people interested in
> breaking things ;)
>
>
> > I don't think -xattr-key and -xattr-value are any good
> > because the value might not match the key. I think the
> > key and the value need to be combined/encoded together
> > in some way so that they are matched together.
>
> I disagree (If I understand you correctly). No hard feelings :D
>
> What if:
> - I search for "some term", but don't care if it's a key or value string?

Having -xattr-key and -xattr-value wouldn't help you in that case. If
you didn't care whether it's a key or a value, then you'd need to use
both predicates in your search (-xattr-key ... -o -xattr-value ...).

But the problem I was thinking of was that, if these two predicates
existed, then a user might reasonably expect them to be connected to
each other, such that using both of them would refer to the same
extended attribute, i.e. that -xattr-key ... -a -xattr-value ... would
match a file with an EA whose key matched the -xattr-key and whose
value matched the -xattr-value. But it would almost certainly match a
file with an EA with the matching key and an EA with the matching
value, and they wouldn't have to be the same EA. They could be
different EAs. So I think it would be difficult for a user to guess
the correct behaviour. Of course, it might be possible to make the two
predicates relate to each other, but then that precludes the behaviour
where the user doesn't want them to be related.

> I have that use-case a lot: plain fulltext search in collections.

That use case is best served (I think) by a single predicate that can
match both the key and value encoded in some way as I described
earlier (either by "key=val\n" or "key: val\n"). I chose ": " because
that's the format output by xattr on macOS (and probably elsewhere).
That way, if you don't care whether your search criteria apply to the
key or the value, you can just do:

    -xattr-match 'something'

If you want to match a key only, you can do:

    -regexxattr-match '^somekey: '

If you want to match a value only, you can do:

    -regexxattr-match '^[^\n]+: [^\n]*someval'

And to match both the key and the value, you can do something like:

    -regexxattr-match '^somekey: [^\n]*someval'

The above assumes that all EAs are encoded as a single piece of text,
with one line per EA, and that the pattern is matched against that
text. The choice of encoding, and whether it's encoded like this at
all, would affect the examples. This sort of matching probably isn't
possible with globbing unless the ksh extensions are available.

> > I'd vote for just -xattr and -regexxattr (and maybe
> > -ixattr and -iregexxattr) and have it match text that
> > looks like "key1=val1\nkey2=val2\nkey3=val3\n" or
> > "key1: val1\nkey2: val2\nkey3: val3\n".
>
> I like args names that explain what they are, but just for "style"
> suggestion:
>
> [quick bike-shedding]:
> What if, you'd call it "--xattr" and "--xxattr" or "--xattRX/xattrx"? (like
> the "RX" at the end is "RegEx" :P)
>
> Just an idea...

I suggested names that I thought were like the existing predicate
names. I don't know what would be best.

> And I assume "ixattr" is "case Insensitive"?

That's right.

> > > - Do we need support for -printf formats to print xattr keys and/or
> > >   values?
> > >   How? The % 1-character directives are almost all used, maybe begin
> > >   with the reserved ones? (%{ %[ %(
> > >   How to output several xattr keys or values?
> > >   How to select which xattr value to print?
> > >     find -xattr -printf '%p %{xattr-keys=hello*} %{xattr-valuekey=hello}
> > There are plenty of conversion letters available!
> >
> > See https://savannah.gnu.org/bugs/?64100
>
> Having quickly read over that thread, I totally agree "IF those features be
> added, IF possible, it'd be great if rawhide and find could stay
> syntax-compatible".
>
> Also, I think (If I understood correctly) that some common-library-printf
> formatting syntax may save A LOT of re-occurring bash-or-python-style
> "formatting-caller-works-on-my-machine" scripts, I guess?
> But I'm far out on a limb here; I've not really read the thread, and I've
> never used find-output formatting (as I didn't know it had any).

Only GNU find has it.

> > > There's a lot of possibilities, and I don't want to introduce something
> > > which contradicts the typical use cases, or is not extensible or not
> > > maintainable.
> > > The complexity - especially when it comes to -printf - is that files
> > > can have several xattr while all other attributes (name, timestamps,
> > > permissions, size, etc.) only exists once.
> > My rawhide program also has %j for JSON output. The
> > current version outputs EAs like %x but the next
> > version outputs them as a JSON object with key/value
> > pairs matching the EA names and values. It's still
> > encoded because EA values can be text or binary and
> > JSON can't represent binary without the user choosing
> > some encoding. So that's something to think about.
>
> If any output data is available in a machine readable format (like JSON),
> it'd definitely make the re-usability and interoperability between tools
> (and scripts) written around find/rawhide, IMO (and experience).
>
> > > I think I asked this kind of questions already some years ago, but
> > > there was no input yet.
> > > I personally don't have a use case to search for xattrs, so the usual
> > > pattern with '-exec getfattr ...' works for me.
> > > Anyone?
>
> ME! Meee! :)
> As I said, I'm doing devops for digital-and-physical GLAM-collections - and
> a lot of metadata layout, storage and retrieval/usage.
>
> For example:
> After de-embedding (=copying as-is) all exiftool-readable metadata into
> xattrs, being able to do some queries on those key/value pairs is AMAZING!
>
> Combining that with RDF URIs for keys, combined with Wikibase-and-Wikidata
> engine and data, I have many great use cases for xattrs (and find).

That sounds cool. A much more interesting use for EAs than just
selinux or quarantining. :-)

> > > P.S. Finally, I wouldn't want to introduce too much and complex code
> > > for 0.25% use cases. Eventually, the simple test to search for files
> > > having xattrs is enough, and the application logic then extracts them
> > > with another tool.
>
> Is this about the code to handle xattrs (in `gnu-find`)?
> There are libs for plain access to xattr key/value, but is there any
> xattr-libs for higher-level functions (like queries, regex, etc)?

I think the actual searching would be done by what find is already
doing (glob, regex).

> Like what HaikuOS (probably?) does in their BeFS filesystem libraries?
> A friend of mine (author of "https://sen-labs.org/": a semantic desktop
> engine for Haiku) told me that BeFS has some database-like query functions
> built-into their filesystem "xattrs".
>
> I know any additional library-dependence is another "whole thing" to
> maintain.
> However, IMO being able to query key/value data "on the spot" is something
> that will eventually replace the legacy "files-in-folders open/save dialogs
> - and wildcards" as-we-know-it...?
>
> Might be an incentive to invest in a common, and commonly supported
> "attribute-handling" lib.
> Just 2¢ of mine.
>
> I just like interoperable and open tech-systems :)
>
>
> > I don't think it's necessarily too complex. The code in
> > rawhide to obtain EAs is 492 lines (335 excluding
> > blank lines, 278 excluding comment lines as well) for
> > Linux, macOS, Cygwin, Solaris, and FreeBSD (OpenBSD and
> > NetBSD don't have EAs). But of course, that's only the
> > start.
>
> Including so many OSs already in your consideration(s), wouldn't that
> already suffice for a common lib?
> Ignore and correct me, if I'm totally on the wrong track here?
> Thanks :)

rawhide is GPLv3+ so the code is available for a library but if
potential clients don't like my choices for encoding the EA data it
might not be suitable for their needs. Encoding is always tricky to
get right, and maybe I didn't. And it's probably unwise to use a
third-party library until it has become popular enough to be packaged
on lots of systems. And GNU find supports many more systems than
rawhide does, and I don't know which other systems support EAs. I only
support systems that I have real or virtual machines for.

> > And Happy New Solar Orbit!
>
> Does the sun actually do "new years eve"? O.O
> I didn't know that.
>
> Happy solar orbit bump #2026!
>
> P.

cheers,
raf
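
P.S. To make the "one line per EA" idea above a bit more concrete, here
is a minimal sketch in Python. It is not rawhide's actual code and not a
proposal for find's implementation. The names encode_eas and matches are
made up for illustration, and the exact escaping rules (doubling
backslashes, \xNN for anything outside printable ASCII) are my
assumptions here; only the "key: value" layout and the "\x3a " trick for
keys come from the discussion. Python's re.MULTILINE stands in for
however a real predicate would anchor ^ at each EA line.

```python
import re


def _escape(data: bytes) -> str:
    """Render bytes as printable ASCII; escape everything else (assumed rules)."""
    named = {0x5C: "\\\\", 0x0A: "\\n", 0x09: "\\t"}
    out = []
    for b in data:
        if b in named:
            out.append(named[b])
        elif 0x20 <= b < 0x7F:
            out.append(chr(b))
        else:
            out.append(f"\\x{b:02x}")
    return "".join(out)


def encode_eas(eas: dict[bytes, bytes]) -> str:
    """Encode all EAs of one file as text, one 'key: value' line per EA."""
    lines = []
    for key, value in eas.items():
        # Hide any ": " inside the key so it cannot pose as the separator.
        safe_key = _escape(key).replace(": ", "\\x3a ")
        lines.append(f"{safe_key}: {_escape(value)}\n")
    return "".join(lines)


def matches(pattern: str, text: str) -> bool:
    """Regex search where ^ anchors at the start of each EA line."""
    return re.search(pattern, text, re.MULTILINE) is not None


# Hypothetical EAs for a single file.
text = encode_eas({
    b"user.tag": b"kids,holiday",
    b"user.note": b"line1\nline2",
    b"user.a: b": b"x",
})

print(matches(r"holiday", text))                     # key or value, don't care
print(matches(r"^user\.tag: ", text))                # key only
print(matches(r"^[^\n]+: [^\n]*holiday", text))      # value only
print(matches(r"^user\.note: [^\n]*holiday", text))  # key and value, same EA
```

Because a newline inside a value is escaped, [^\n]* can never run from
one EA into the next, so the last pattern only succeeds when the key and
the value belong to the same EA.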
