Package: python3-rdflib,python3-pyrdfa
Severity: normal
Hello,
I don't know which package is to blame for this issue, so I've assigned it to
two packages intentionally.
I'm on Debian Trixie and wanted to try 'rdfpipe' to read RDFa from a web page.
We have a downstream, Debian-specific manual page, rdfpipe(1), that says:
OPTIONS
-i INPUT_FORMAT, --input-format=INPUT_FORMAT
Format of the input document(s). Available input formats are: ...,
rdfa, application/xhtml+xml, rdfa1.0, rdfa1.1, text/html, html

The manual page is dated 2013, and a lot has changed since then. Still, RDFa in HTML appears to be meant to keep working, even though upstream has been substantially restructured and this support now seems to be offloaded to a plugin. To reproduce:

$ rdfpipe https://johnscott.me/index.xhtml
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/rdflib/plugin.py", line 134, in get
    p: Plugin[PluginT] = _plugins[(name, kind)]
                         ~~~~~~~~^^^^^^^^^^^^^^
KeyError: ('rdfa', <class 'rdflib.parser.Parser'>)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/rdflib/graph.py", line 1497, in parse
    parser = plugin.get(format, Parser)()
             ~~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/rdflib/plugin.py", line 136, in get
    raise PluginException("No plugin registered for (%s, %s)" % (name, kind))
rdflib.plugin.PluginException: No plugin registered for (rdfa, <class 'rdflib.parser.Parser'>)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/rdflib/plugin.py", line 134, in get
    p: Plugin[PluginT] = _plugins[(name, kind)]
                         ~~~~~~~~^^^^^^^^^^^^^^
KeyError: ('rdfa', <class 'rdflib.parser.Parser'>)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/bin/rdfpipe", line 8, in <module>
    sys.exit(main())
             ~~~~^^
  File "/usr/lib/python3/dist-packages/rdflib/tools/rdfpipe.py", line 199, in main
    parse_and_serialize(
    ~~~~~~~~~~~~~~~~~~~^
        args, opts.input_format, opts.guess, outfile, opts.output_format, ns_bindings
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/usr/lib/python3/dist-packages/rdflib/tools/rdfpipe.py", line 53, in parse_and_serialize
    graph.parse(fpath, format=use_format, **kws)
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/rdflib/graph.py", line 2295, in parse
    context.parse(source, publicID=publicID, format=format, **args)
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/rdflib/graph.py", line 1507, in parse
    parser = plugin.get(format, Parser)()
             ~~~~~~~~~~^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/rdflib/plugin.py", line 136, in get
    raise PluginException("No plugin registered for (%s, %s)" % (name, kind))
rdflib.plugin.PluginException: No plugin registered for (rdfa, <class 'rdflib.parser.Parser'>)

Interestingly, rdfpipe still inferred from the 'xhtml' file extension that the input should be parsed as RDFa; it simply found no parser registered for that format.

The history is confusing: at some point RDFa support was split out into python3-pyrdfa, which then began to provide a plugin for python3-rdflib to load.

https://github.com/RDFLib/pyrdfa3/issues/33#issuecomment-689465980
> [September 2020] Apologies for not testing compatibility with PyRdfa3 before
> releasing RDFlib 5.0.0!

https://github.com/RDFLib/rdflib/discussions/1582#discussioncomment-1879756
> [December 2021] Actually, splitting [RDFa] out as a separate plugin is fairly
> trivial, given that RDFLib has a plugin interface.

https://github.com/RDFLib/rdflib/commit/638a867168f05e2d3903f4a6e4ba9fa63807db6a
> [October 2024] Replace html5lib with html5rdf, make it an optional dependency
> Revert previous commit that made html support non-optional.
> html support is now optional again, and it uses html5rdf rather than
> html5lib/html5lib-modern.

Perhaps something prevents the plugin from being discovered even when it is installed?
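To narrow this down, a small Python sketch may help. It rests on two assumptions that I have not verified against the Debian packages: that rdflib fills the _plugins table seen in the traceback from the "rdf.plugins.parser" entry-point group, and that python3-pyrdfa is meant to advertise its RDFa parser in that group. It lists what is advertised there and, if rdflib is importable, which parser names rdflib actually registered. The function names advertised_plugins and rdflib_parser_names are made up for this example.

```python
"""Check whether an RDFa parser plugin is visible to rdflib.

Assumptions (not confirmed by the Debian packaging): rdflib discovers
third-party parsers via the "rdf.plugins.parser" entry-point group, and
python3-pyrdfa is expected to advertise its RDFa parser there.
"""
from importlib.metadata import entry_points
from typing import List, Optional, Tuple

# Assumed entry-point group name used by rdflib for parser plugins.
PARSER_GROUP = "rdf.plugins.parser"


def advertised_plugins(group: str = PARSER_GROUP) -> List[Tuple[str, str, str]]:
    """Return sorted (name, target, distribution) triples for entry points in *group*."""
    eps = entry_points()
    if hasattr(eps, "select"):   # Python 3.10+: filter with .select()
        selected = eps.select(group=group)
    else:                        # Python 3.8/3.9: a plain dict keyed by group
        selected = eps.get(group, [])
    rows = []
    for ep in selected:
        # EntryPoint.dist only exists on Python 3.10+; fall back to "?".
        dist = getattr(ep, "dist", None)
        dist_name = getattr(dist, "name", None) or "?"
        rows.append((ep.name, ep.value, dist_name))
    return sorted(rows)


def rdflib_parser_names() -> Optional[List[str]]:
    """Return the parser names rdflib has registered, or None if rdflib is absent."""
    try:
        from rdflib import plugin
        from rdflib.parser import Parser
    except ImportError:
        return None
    # plugin.plugins(name=None, kind=Parser) yields every registered parser plugin.
    return sorted({p.name for p in plugin.plugins(None, Parser)})


if __name__ == "__main__":
    rows = advertised_plugins()
    if not rows:
        print(f"no installed package advertises entry points in {PARSER_GROUP!r}")
    for name, target, dist in rows:
        print(f"{name:10} {target:45} from {dist}")

    names = rdflib_parser_names()
    if names is None:
        print("rdflib is not importable")
    else:
        print("rdflib parsers:", ", ".join(names))
        print("'rdfa' registered:", "rdfa" in names)
```

Run it with python3 on the affected system. If 'rdfa' does not appear in the first listing, that would point at python3-pyrdfa's packaging metadata rather than at rdflib itself.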
Apparently there is a packaging mechanism that is supposed to take care of this:
https://packaging.python.org/en/latest/guides/creating-and-discovering-plugins/

I also notice that python3-pyrdfa depends on python3-html5lib, whereas python3-rdflib build-depends on, and recommends, the python3-html5rdf fork. See also:

https://github.com/RDFLib/rdflib/issues/2099#issue-1352359511
> [August 2022, fixed February 2024] The html5lib library is required under
> certain execution paths but is not included as a dependency in pyproject.toml.

https://github.com/pangaea-data-publisher/fuji/issues/243
A user experience report; someone there said rdflib version 7 should have these quirks sorted out.

This is as far as my search has taken me, and I don't understand most of it, but I hope these leads help someone get started on a fix.

-- System Information:
Debian Release: 13.3
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'proposed-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.12.63+deb13-amd64 (SMP w/2 CPU threads; PREEMPT)
Kernel taint flags: TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

