Package: python3-rdflib,python3-pyrdfa
Severity: normal

Hello,
I don't know which package is to blame for this issue, so I've assigned it to 
two packages intentionally.
I'm on Debian Trixie and wanted to try 'rdfpipe' to read RDFa from a web page. 
We have a downstream Debian-specific manual page, rdfpipe(1), that says:
OPTIONS
        -i INPUT_FORMAT, --input-format=INPUT_FORMAT
        Format of the input document(s). Available input formats are: ...,
        rdfa, application/xhtml+xml, rdfa1.0, rdfa1.1, text/html, html

The manual page is dated 2013, and a lot has changed since then. However, it
looks like support for RDFa in HTML is still supposed to work, although upstream
has restructured substantially and RDFa parsing may now be offloaded to a
plugin. To reproduce the problem, one can try the following:
$ rdfpipe https://johnscott.me/index.xhtml
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/rdflib/plugin.py", line 134, in get
    p: Plugin[PluginT] = _plugins[(name, kind)]
                         ~~~~~~~~^^^^^^^^^^^^^^
KeyError: ('rdfa', <class 'rdflib.parser.Parser'>)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/rdflib/graph.py", line 1497, in parse
    parser = plugin.get(format, Parser)()
             ~~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/rdflib/plugin.py", line 136, in get
    raise PluginException("No plugin registered for (%s, %s)" % (name, kind))
rdflib.plugin.PluginException: No plugin registered for (rdfa, <class 'rdflib.parser.Parser'>)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/rdflib/plugin.py", line 134, in get
    p: Plugin[PluginT] = _plugins[(name, kind)]
                         ~~~~~~~~^^^^^^^^^^^^^^
KeyError: ('rdfa', <class 'rdflib.parser.Parser'>)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/rdfpipe", line 8, in <module>
    sys.exit(main())
             ~~~~^^
  File "/usr/lib/python3/dist-packages/rdflib/tools/rdfpipe.py", line 199, in main
    parse_and_serialize(
    ~~~~~~~~~~~~~~~~~~~^
        args, opts.input_format, opts.guess, outfile, opts.output_format, ns_bindings
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/lib/python3/dist-packages/rdflib/tools/rdfpipe.py", line 53, in parse_and_serialize
    graph.parse(fpath, format=use_format, **kws)
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/rdflib/graph.py", line 2295, in parse
    context.parse(source, publicID=publicID, format=format, **args)
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/rdflib/graph.py", line 1507, in parse
    parser = plugin.get(format, Parser)()
             ~~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/rdflib/plugin.py", line 136, in get
    raise PluginException("No plugin registered for (%s, %s)" % (name, kind))
rdflib.plugin.PluginException: No plugin registered for (rdfa, <class 'rdflib.parser.Parser'>)


This is interesting, though, because rdfpipe was nevertheless smart enough to
know that the 'xhtml' file extension meant it should parse as RDFa. It only
failed afterwards, when no parser plugin was registered under the name 'rdfa'.
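
The two steps involved here (guessing a format from the file name, then looking
up a parser plugin for that format) can be checked separately from Python. This
is only a minimal sketch, not what rdfpipe literally does: it assumes that
rdflib.util.guess_format and rdflib.plugin.plugins behave as they do in current
RDFLib releases, and the function name check_rdfa_plugin and the default file
name are made up for illustration.

```python
def check_rdfa_plugin(path="index.xhtml"):
    """Return (guessed_format, is_registered, parser_names).

    Returns None if rdflib itself cannot be imported.
    check_rdfa_plugin is a hypothetical helper, not part of rdflib.
    """
    try:
        from rdflib import plugin
        from rdflib.parser import Parser
        from rdflib.util import guess_format
    except ImportError:
        return None

    # Step 1: map the file name to a format name (may be None if unknown).
    guessed = guess_format(path)

    # Step 2: collect the names of all parser plugins rdflib knows about.
    names = sorted({p.name for p in plugin.plugins(kind=Parser)})

    return guessed, guessed in names, names


if __name__ == "__main__":
    print(check_rdfa_plugin())
```

If the guessed format is not among the registered parser names, the mismatch
sits between rdflib's extension table and its plugin registry, which would be
consistent with the PluginException above.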

The situation is confusing; I see that there was a time when RDFa support was
split out into python3-pyrdfa, and then that package began to provide a plugin
for python3-rdflib to invoke.
https://github.com/RDFLib/pyrdfa3/issues/33#issuecomment-689465980
> [September 2020] Apologies for not testing compatibility with PyRdfa3 before 
> releasing RDFlib 5.0.0!
https://github.com/RDFLib/rdflib/discussions/1582#discussioncomment-1879756
> [December 2021] Actually, splitting [RDFa] out as a separate plugin is fairly 
> trivial, given that RDFLib has a plugin interface.
https://github.com/RDFLib/rdflib/commit/638a867168f05e2d3903f4a6e4ba9fa63807db6a
> [October 2024] Replace html5lib with html5rdf, make it an optional dependency
>       Revert previous commit that made html support non-optional.
>       html support is now optional again, and it uses html5rdf rather than 
> html5lib/html5lib-modern.


Maybe there's a reason why the plugin can't be discovered even when it's
installed? Apparently there's build system magic that's supposed to help:
https://packaging.python.org/en/latest/guides/creating-and-discovering-plugins/
I also notice that python3-pyrdfa depends on python3-html5lib, but
python3-rdflib build-depends on and recommends the python3-html5rdf fork.
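
One mechanism described in that packaging guide is entry points in package
metadata, and the installed ones can be listed with the standard library alone.
The sketch below makes an assumption worth double-checking against
rdflib/plugin.py: that RDFLib looks for third-party parsers in an entry point
group named "rdf.plugins.parser". The helper name list_parser_entry_points is
made up for illustration.

```python
from importlib.metadata import entry_points


def list_parser_entry_points(group="rdf.plugins.parser"):
    """Return sorted (name, target) pairs advertised in the given group.

    The default group name is an assumption about what RDFLib scans.
    """
    try:
        # Python 3.10 and later: select the group directly.
        eps = entry_points(group=group)
    except TypeError:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group.
        eps = entry_points().get(group, [])
    return sorted((ep.name, ep.value) for ep in eps)


if __name__ == "__main__":
    for name, target in list_parser_entry_points():
        print(name, "->", target)
```

If this prints nothing, or nothing named 'rdfa', with python3-pyrdfa installed,
that would suggest the package does not advertise a parser this way. It would
not prove it, though, since parsers can also be registered directly in code.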

Also see https://github.com/RDFLib/rdflib/issues/2099#issue-1352359511
> [August 2022, fixed February 2024] The html5lib library is required under 
> certain execution paths but is not included as a dependency in pyproject.toml.

https://github.com/pangaea-data-publisher/fuji/issues/243 is a user experience
report; it was said there that python3-rdflib version 7 should have these
quirks sorted out.

This is as far as my search has taken me, but I don't know what most of that
means. Maybe these leads will help someone get started on addressing this.

-- System Information:
Debian Release: 13.3
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 
'proposed-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.12.63+deb13-amd64 (SMP w/2 CPU threads; PREEMPT)
Kernel taint flags: TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
