I have prepared a mini-patch to better explain what I mean by the third point (in the end I used EXTRACTOR_METATYPE_NONE, which I think is clearer).
Please find it attached

--madmurphy

On Tue, Feb 8, 2022 at 1:38 PM madmurphy <[email protected]> wrote:

> Got it! I agree about your solution for the duplicate mime types.
>
> but until that is done, a key-value pair type would at least be better
> than 'unknown'.
>
> “Unknown” can continue to exist as an identifier for other cases, just not
> the key-value ones :)
>
> Also I forgot to mention a third point:
>
>    3. Add an EXTRACTOR_METATYPE_NO_METATYPE = -1 to enum EXTRACTOR_MetaType
>    (more or less like NULL if that was a pointer). Without a
>    EXTRACTOR_METATYPE_NO_METATYPE a programmer is forced to save the
>    have_metatype information in another variable. The fact that it is a
>    negative number is not a problem, because as the name suggests, *it is
>    not a metatype*.
>
> P.S. Sorry for picking the wrong mailing list!
>
> On Tue, Feb 8, 2022 at 9:57 AM Christian Grothoff <[email protected]>
> wrote:
>
>> Hi madmurphy,
>>
>> The 'correct' place for GNU libextractor discussions would be
>>
>> https://lists.gnu.org/mailman/listinfo/libextractor
>>
>> Alas, with my libextractor maintainer hat on, I would say this:
>>
>> On 2/7/22 10:01 PM, madmurphy wrote:
>> > Hi again, GNUnet people.
>> >
>> > Is this the place where to discuss about libextractor? I have two
>> > points.
>> >
>> > #1 I often see something interesting. Key-value pairs are categorized as
>> > |EXTRACTOR_METATYPE_UNKNOWN|:
>> >
>> >     unknown: chroma-format=4:2:0
>> >     unknown: bit-depth-chroma=8
>> >     unknown: colorimetry=bt709
>> >     unknown: stream-format=avc
>> >     unknown: stream-format=raw
>> >     unknown: bit-depth-luma=8
>> >     unknown: base-profile=lc
>> >     unknown: mpegversion=4
>> >     unknown: profile=high
>> >     unknown: alignment=au
>> >     unknown: parsed=true
>> >     unknown: framed=true
>> >     unknown: variant=iso
>> >     unknown: profile=lc
>> >     unknown: level=4.1
>> >
>> > But one point is that they are often numerous, and another point is that
>> > that of a key-value type is a really interesting metatype to have (and
>> > is not really “unknown”, since the key is self-explanatory). Would it
>> > not make sense to add an |EXTRACTOR_METATYPE_KEY_VALUE_PAIR| to the list
>> > of MetaTypes?
>>
>> We could do that. Sometimes I think it would be better to add new
>> specific LE types for some of the above, but until that is done, a
>> key-value pair type would at least be better than 'unknown'.
>>
>> >     ...
>> >
>> >     /* generic attributes */
>> >     EXTRACTOR_METATYPE_UNKNOWN = 45,
>> >     EXTRACTOR_METATYPE_DESCRIPTION = 46,
>> >     EXTRACTOR_METATYPE_COPYRIGHT = 47,
>> >     EXTRACTOR_METATYPE_RIGHTS = 48,
>> >     EXTRACTOR_METATYPE_KEYWORDS = 49,
>> >     EXTRACTOR_METATYPE_ABSTRACT = 50,
>> >     EXTRACTOR_METATYPE_SUMMARY = 51,
>> >     EXTRACTOR_METATYPE_SUBJECT = 52,
>> >     EXTRACTOR_METATYPE_CREATOR = 53,
>> >     EXTRACTOR_METATYPE_FORMAT = 54,
>> >     EXTRACTOR_METATYPE_FORMAT_VERSION = 55,
>> >     *EXTRACTOR_METATYPE_KEY_VALUE_PAIR* = XXX,
>> >
>> >     ...
>> >
>> > #2 I often see that files get tagged with multiple mime types according
>> > to libextractor:
>> >
>> >     mimetype: video/quicktime
>> >     mimetype: video/x-h264
>> >     mimetype: audio/mpeg
>> >     mimetype: video/mp4
>>
>> That is because different plugins (using different methods/libraries)
>> disagree on the 'correct' mime-type. Ideally, we'd identify which plugin
>> gets it wrong (and why), and unify the mime-types.
>>
>> > But that never reflects the reality, since files should have only one
>> > mime type (or at most, multiple mime types that mean the same thing).
>> > But then I see what happens with file names: there is only one
>> > |EXTRACTOR_METATYPE_GNUNET_ORIGINAL_FILENAME|, but there can be many
>> > |EXTRACTOR_METATYPE_FILENAME|s (in the case of archives, for example):
>> >
>> >     EXTRACTOR_METATYPE_FILENAME = 2,
>> >     ...
>> >     EXTRACTOR_METATYPE_GNUNET_ORIGINAL_FILENAME = 180,
>> >
>> > Would it not make sense to do something similar for mime types? Only one
>> > “original mime type”, and an infinity of secondary mime types…?
>> >
>> >     EXTRACTOR_METATYPE_MIMETYPE = 1,
>> >     ...
>> >     *EXTRACTOR_METATYPE_GNUNET_ORIGINAL_MIMETYPE* = XXX,
>>
>> I guess it depends. If this is for archives where files _inside_ the
>> archive are given mime-types, then a different metatype makes sense
>> (ditto for FILENAME: here we probably could have two types, one for the
>> 'archive' and one for the 'contents'). But if the different mime-types
>> are all about the 'original' file, then we should rather figure out
>> which plugin gets it wrong. As for the "_GNUNET_" in the
>> "_GNUNET_ORIGINAL_FILENAME" there, IIRC this again different because
>> that is NOT a metatype used by GNU libextractor, but one that GNUnet
>> itself generates and puts with the 'rest ' of the metadata.
>>
>> > So, two simple proposals:
>> >
>> >  1. Create |EXTRACTOR_METATYPE_KEY_VALUE_PAIR|
>> >  2. Create |EXTRACTOR_METATYPE_GNUNET_ORIGINAL_MIMETYPE|
>> >
>> > What do you think? Does it make sense?
>>
>> It should definitively not be "GNUNET_ORIGINAL_MIMETYPE", and the real
>> question is what is the origin of the different mime-types. If this is
>> from an archive, maybe we should introduce
>>
>> EXTRACTOR_MIMETYPE_ARCHIVE_CONTENT_FILENAME
>> EXTRACTOR_MIMETYPE_ARCHIVE_CONTENT_MIMETYPE
>>
>> and reserve
>>
>> EXTRACTOR_MIMETYPE_FILENAME
>> EXTRACTOR_MIMETYPE_MIMETYPE
>>
>> for the top-level file. But AFAIK that won't solve your mime-type issue,
>> which should really be resolved by going over the plugins and finding
>> out why and where they disagree and picking the 'right' answer.
>>
>> My 2 cents
>>
>> Christian
>>
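
For readers without the attachment, here is a rough sketch of the idea behind the third point quoted above. It is only an illustration, not the patch itself. To stay self-contained it does not include the libextractor header: `demo_MetaType`, the `DEMO_*` constants and `demo_count_matching()` are hypothetical names invented for this example, and only the numeric values quoted in this thread (1, 2, 45 and the proposed -1) are reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for enum EXTRACTOR_MetaType (assumption: only the
   values mentioned in this thread are mirrored here). */
enum demo_MetaType
{
  DEMO_METATYPE_NONE = -1,     /* proposed sentinel: "not a metatype" */
  DEMO_METATYPE_MIMETYPE = 1,
  DEMO_METATYPE_FILENAME = 2,
  DEMO_METATYPE_UNKNOWN = 45
};

/* Count the items that match `filter`.
   Passing DEMO_METATYPE_NONE means "no filter requested", so every item
   is counted. Without the sentinel, the caller would need a separate
   `have_metatype` flag next to the filter value. */
static size_t
demo_count_matching (const enum demo_MetaType *items,
                     size_t n,
                     enum demo_MetaType filter)
{
  size_t count = 0;

  assert (NULL != items || 0 == n);
  for (size_t i = 0; i < n; i++)
  {
    if (DEMO_METATYPE_NONE == filter || items[i] == filter)
      count++;
  }
  return count;
}
```

A caller can then store "no metatype selected" in the same variable as the metatype itself, much like a NULL pointer, instead of carrying a second boolean around.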
<<attachment: add-extractor_metatype_none.patch.zip>>
