Hello Fedora developers,

I'd like to show you a proposal for a new XML format for the modular metadata
which resides in YUM repositories.

In short, I propose replacing the YAML syntax with an XML syntax, removing
features which were never implemented or used, and providing a detailed
specification that leaves little room for implementers' invention. The
proposed specification is the "reduced" variant under
<https://github.com/fedora-modularity/libmodulemd/tree/main/xml_specs>, for
instance
<https://github.com/fedora-modularity/libmodulemd/blob/main/xml_specs/reduced/overview.xml>.

Bear in mind that this change is only about how the modules are stored in YUM
repositories which are fetched by DNF. It does not change how modules are
defined by module maintainers (YAML modulemd-packager-v3 or modulemd-v2
format) or how they are built by MBS and handled by Bodhi.

Those who should be concerned most are DNF5 developers and relengs producing
composes.


Long story:

The original modulemd format had a noble property: the input format for MBS
was the same as the output format. This is not true anymore because of the
modulemd-packager-v3 format. It also makes validation difficult, as fields
optional in an input format are mandatory in the output format, or vice versa.

The original modulemd format drags YAML into the YUM repository, which is
otherwise XML-only. That requires a YAML parser.

The original modulemd format is not handled by DNF directly. Instead, DNF uses
the libmodulemd library. That library is heavily based on glib. In fact, it
embeds glib types into its API. Why do I mention it? Because the new DNF5 aims
to eradicate glib, mostly to shrink container installations. librepo and
libmodulemd are the last pieces with glib. Because it's impossible to remove
glib from libmodulemd, there has to be a new library for parsing modular
metadata. If there has to be a new library, there could also be a transition
from YAML to XML, which would shrink the minimal installation further by
removing libyaml.

The original modulemd format possesses some features which nobody uses, or
nobody implements, or which are implemented only partially. Do you remember
the deprecation of intents from modularity
<https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/RXDP2WMPR3HHBRTQAKPSTRU6KABTJSMA/#RXDP2WMPR3HHBRTQAKPSTRU6KABTJSMA>?
There are more things that can be removed to make the format and its parser
simpler.

The original format is not well specified. DNF and Satellite people complained
a lot when they were implementing it. The specification looks more like an
example. E.g. a module stream name is probably a string. An arbitrary string.
With spaces, with new lines. I think you do not want to see a stream named
" :\n". Well, DNF does not even allow you to identify a module like that.
There is definitely room for tightening the format. But each change like that
is technically an incompatible change. To materialize the change we need at
least a new modulemd format version. But if we need a new format version, we
can actually come up with a completely new format.
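
To illustrate what such tightening could look like, here is a minimal Python
sketch of a stream name check. The rule itself is an assumption made for this
example and is not taken from the proposed schema: it rejects empty names and
names containing white space, a colon, or a slash, since DNF module specs use
":" and "/" as separators. The function name is hypothetical.

```python
import re

# Hypothetical rule, NOT taken from the proposed schema: forbid white space
# and the separators used in DNF module specs (":" and "/").
_FORBIDDEN = re.compile(r"[\s:/]")


def is_plausible_stream_name(name):
    """Return True if the name is non-empty and has no forbidden character."""
    return bool(name) and _FORBIDDEN.search(name) is None
```

With this rule a name like "5.30" passes, while " :\n" is rejected.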

As you can see, there are good reasons to come up with a new in-repository
format. Hence here it is
<https://github.com/fedora-modularity/libmodulemd/tree/main/xml_specs>.

I originally developed the XML format to be able to encode all features we
have in the old YAML format. That's kept for your reference in the "complete"
subdirectory
<https://github.com/fedora-modularity/libmodulemd/tree/main/xml_specs/complete>.

Then I removed all unnecessary features and put the result into the "reduced"
subdirectory
<https://github.com/fedora-modularity/libmodulemd/tree/main/xml_specs/reduced>.

If you are interested in it, I recommend starting with the overview.xml file.
It shows a skeleton of the format. It's so small I can quote it here:

<index xmlns="http://fedoraproject.org/metadata/moduleindex" version="" revision="">
    <module name="">
        <stream name=""> <!-- DNF wants versions and contexts to differ in @summary etc. -->
            <build version="" context="" static="" arch="" summary="" description="">
                                    <!-- @static defaults to false. -->
                <dependency name="">
                    <requires></requires>   <!-- Only one for modulemd-packager-v3 -->
                    <conflicts></conflicts> <!-- Not supported by modulemd-packager-v3 -->
                </dependency>
                <dependency name=""/>   <!-- An unspecified stream.
                                             Not supported by modulemd-packager-v3. -->
                <license>
                    <module></module>
                    <content></content>
                </license>
                <references comunity="" documentation="" tracker=""/>
                <profile name="" description="">
                    <package></package>
                </profile>
                <api></api>
                <demodularized></demodularized>
                <nevra name="" epoch="" version="" release="" arch=""/>
            </build>

            <default-profile modified=""> <!-- @modified could be renamed to version -->
                <profile></profile> <!-- With a value replaces, missing unsets. -->
            </default-profile>

            <obsolete modified="" context=""> <!-- @modified in seconds since the epoch.
                        Missing or empty @context means all contexts. -->
                <eol when="" message=""> <!-- Missing element means unsetting. -->
                        <!-- @when in seconds since the epoch, missing means now. -->
                    <replacement module="" stream=""/>
                </eol>
            </obsolete>

            <translation modified=""> <!-- @modified could be renamed to version -->
                <locale name=""> <!-- Each of the child is optional, but there
                                      must be at least one. -->
                    <build summary="" description=""/>  <!-- missing @summary, @description unsets -->
                    <profile name="" description=""/>   <!-- missing @description unsets -->
                    <obsolete context="" message=""/>   <!-- missing or empty @context means
                            all contexts,
                            missing @message unsets, unsupported in YAML. -->
                </locale>
            </translation>
        </stream>

        <default-stream modified="" stream=""/> <!-- @modified could be renamed to version -->
                                        <!-- Existing @stream sets a default,
                                             missing or empty unsets. -->
    </module>

</index>

As you can see, there are no separate documents for modules and default
streams. Everything is kept inside one document. That enables
properties (e.g. obsoletes or default profiles) pertaining to the same entity
(e.g. a stream) to be placed together. That avoids repeating the
identifiers (e.g. stream names) and makes the format more succinct and easier
to query. That's especially important for DNF, which needs to quickly find
the list of modules, the streams of each module, the latest build etc.
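
As an illustration of that kind of querying, here is a minimal Python sketch
which uses only the standard xml.etree.ElementTree module. The namespace and
the element and attribute names come from the skeleton above. The sample data,
the function names, and the assumption that a build's @version is an integer
are made up for this example, so treat it as a sketch rather than as how DNF5
does it.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the overview.xml skeleton.
NS = {"m": "http://fedoraproject.org/metadata/moduleindex"}

# Hypothetical sample data; all names and values are made up.
SAMPLE = """\
<index xmlns="http://fedoraproject.org/metadata/moduleindex" version="1" revision="0">
    <module name="perl">
        <stream name="5.30">
            <build version="1" context="aaaa" arch="x86_64" summary="Perl" description="Perl 5.30"/>
            <build version="3" context="aaaa" arch="x86_64" summary="Perl" description="Perl 5.30"/>
        </stream>
        <stream name="5.32">
            <build version="2" context="bbbb" arch="x86_64" summary="Perl" description="Perl 5.32"/>
        </stream>
        <default-stream modified="1" stream="5.32"/>
    </module>
</index>
"""


def latest_builds(xml_text):
    """Map (module name, stream name) to the highest build version found."""
    root = ET.fromstring(xml_text)
    result = {}
    for module in root.findall("m:module", NS):
        for stream in module.findall("m:stream", NS):
            # Assumption: @version is an integer and can be compared as such.
            versions = [int(b.get("version")) for b in stream.findall("m:build", NS)]
            if versions:
                result[(module.get("name"), stream.get("name"))] = max(versions)
    return result


def default_stream(xml_text, module_name):
    """Return the default stream of a module, or None if there is none."""
    root = ET.fromstring(xml_text)
    for module in root.findall("m:module", NS):
        if module.get("name") == module_name:
            node = module.find("m:default-stream", NS)
            # A missing element or a missing or empty @stream means no default.
            return (node.get("stream") or None) if node is not None else None
    return None
```

Because the streams, builds, and the default stream all hang under a single
module element, both lookups are plain tree walks without any joining of
separate documents on module or stream names.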

The example.xml file shows how real data would look
<https://github.com/fedora-modularity/libmodulemd/blob/main/xml_specs/reduced/example.xml>.
You can see, e.g., that time stamps are encoded as a number of seconds since
the Unix epoch. That will save DNF from parsing e-mail date notations, handling
time zones etc.
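
To show how little work such a time stamp needs, here is a minimal Python
sketch. The function name is hypothetical. The handling of a missing value
follows the skeleton's comment that a missing @when means now. Interpreting
the number as UTC is an assumption of this example.

```python
from datetime import datetime, timezone


def parse_timestamp(value):
    """Convert a seconds-since-the-epoch attribute value to a UTC datetime.

    A missing attribute (None) is treated as "now", as the skeleton's
    comment on @when describes.
    """
    if value is None:
        return datetime.now(timezone.utc)
    return datetime.fromtimestamp(int(value), tz=timezone.utc)
```

Comparing two @modified values does not even need the conversion; comparing
the integers is enough.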

There is also a formal specification in the form of an XML Schema
<https://github.com/fedora-modularity/libmodulemd/blob/main/xml_specs/reduced/schema.xsd>.
And there is a tests subdirectory with a preliminary set of good and bad
examples which pass and fail validation, respectively.

I'd be glad to hear any comments on the format.


A grand plan for how to implement and deploy this format is outlined in the
top-level README.md
<https://github.com/fedora-modularity/libmodulemd/blob/main/xml_specs/README.md>.
Basically, it will be injected into the createrepo_c tool to produce the XML
data in YUM repositories. Then the format will be consumed by DNF5. (Just to
clarify, the currently missing support for modules in DNF5 is not caused by
this new XML format. DNF5 will support modules in the old YAML format soon
through the libmodulemd library.) According to my consultation with the DNF
team, DNF5 plans to prefer the XML format if both XML and YAML exist in a
repository.

-- Petr
