paulmillar opened a new issue, #1663:
URL: https://github.com/apache/jena/issues/1663

   ### Version
   
   4.6.0
   
   ### Feature
   
   I don't think this is a bug per se, but I seem to have hit a limitation 
on what `riot` can do.
   
   Since v1.4, PDF has supported embedding an RDF graph as metadata.  This has 
been standardised as 
[XMP](https://en.wikipedia.org/wiki/Extensible_Metadata_Platform).  
   
   I believe that, by convention, XMP uses the empty IRI to indicate that the 
subject of triples is the PDF file itself. [The Wikipedia 
example](https://en.wikipedia.org/wiki/Extensible_Metadata_Platform#Example) 
suggests this; however, I haven't verified this by checking the XMP 
specification.
   
   I wrote some simple metadata in Turtle to illustrate the problem/limitation:
   
   ```turtle
   @prefix dc:  <http://purl.org/dc/elements/1.1/> .
   @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
   
   <>
       dc:description  "An example that demonstrates a problem."@en;
       dc:title        "An example title"@en;
       dc:creator      "Jane Doe";
       dc:date         "2022-12-04";
       dc:language     "en-GB";
       .
   ```
   
   I am able to use the `riot` command to convert this Turtle data into a 
corresponding RDF/XML file, as needed by XMP.
   ```console
   paul@sprocket:~/Riot problem$ riot --formatted=RDF/XML  example.ttl
   <rdf:RDF
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:dc="http://purl.org/dc/elements/1.1/">
     <rdf:Description rdf:about="file:///home/paul/Riot%20problem/example.ttl">
       <dc:language>en-GB</dc:language>
       <dc:date>2022-12-04</dc:date>
       <dc:creator>Jane Doe</dc:creator>
       <dc:title xml:lang="en">An example title</dc:title>
       <dc:description xml:lang="en">An example that demonstrates a problem.</dc:description>
     </rdf:Description>
   </rdf:RDF>
   paul@sprocket:~/Riot problem$ 
   ```
   
   The problem here is that `riot` "helpfully" expands the empty IRI into a 
corresponding `file:` IRI.  Note that the `rdf:Description` element contains 
the `rdf:about` attribute with a value 
`file:///home/paul/Riot%20problem/example.ttl`.
   
   This is a problem for two reasons: (1) the subject is now the Turtle file 
rather than the PDF file, and (2) the IRI is absolute, whereas the PDF file may 
be renamed or copied onto a different system.
   
   I was hoping for `riot` to generate the following XML:
   
   ```xml
   <rdf:RDF
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:dc="http://purl.org/dc/elements/1.1/">
     <rdf:Description rdf:about="">
       <dc:language>en-GB</dc:language>
       <dc:date>2022-12-04</dc:date>
       <dc:creator>Jane Doe</dc:creator>
       <dc:title xml:lang="en">An example title</dc:title>
       <dc:description xml:lang="en">An example that demonstrates a problem.</dc:description>
     </rdf:Description>
   </rdf:RDF>
   ```
   
   As far as I'm aware, the output from `riot` is correct, as the empty IRI is 
a relative reference that resolves to the expanded IRI (again, I haven't 
checked this against the RDF spec). Therefore, I wouldn't classify this as a bug.
   
   However, the output isn't what I need, and I haven't found a `riot` option 
that produces the desired output; i.e., with `rdf:about=""`.
   
   A simple solution might be to add an option that suppresses riot/Jena's 
ability to expand an empty IRI.  A more sophisticated solution would identify 
IRIs that are the input file itself and replace them with the empty IRI.
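Until such an option exists, the second idea can be approximated outside Jena by post-processing the text that `riot` emits. The following is only a rough sketch of that workaround, not a `riot` feature: the function name `relativize_self` is hypothetical, and it assumes the output uses the literal `rdf:` prefix and double-quoted attributes, as in the output shown above.

```python
import re
from pathlib import PurePosixPath


def relativize_self(rdfxml: str, base_iri: str) -> str:
    """Replace rdf:about / rdf:resource values equal to base_iri with "".

    Assumption: the serializer writes the attributes as rdf:about="..." or
    rdf:resource="..." with double quotes; other spellings are left untouched.
    """
    pattern = re.compile(
        r'(rdf:(?:about|resource))="' + re.escape(base_iri) + '"'
    )
    return pattern.sub(r'\1=""', rdfxml)


# The base IRI is the file: IRI of the Turtle input; as_uri() percent-encodes
# the space in "Riot problem" the same way the riot output above does.
base = PurePosixPath("/home/paul/Riot problem/example.ttl").as_uri()

line = '<rdf:Description rdf:about="' + base + '">'
print(relativize_self(line, base))  # prints <rdf:Description rdf:about="">
```

A plain text substitution is used here rather than an XML parser so that the namespace prefixes and layout produced by `riot` are left exactly as they were.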
   
   Just as a side note: embedding the above RDF/XML infoset under a `<x:xmpmeta 
xmlns:x="adobe:ns:meta/">` element allows 
[`podofoxmp`](https://github.com/podofo/podofo) to create a new PDF file that 
includes the desired RDF graph.
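The wrapping step itself is simple enough to script. Below is a minimal sketch, assuming the RDF/XML is available as a string; the helper name `wrap_in_xmpmeta` is hypothetical, and only the `x:xmpmeta` element mentioned above is added (no other XMP packet framing is attempted).

```python
import xml.etree.ElementTree as ET

XMPMETA_OPEN = '<x:xmpmeta xmlns:x="adobe:ns:meta/">'
XMPMETA_CLOSE = "</x:xmpmeta>"


def wrap_in_xmpmeta(rdfxml: str) -> str:
    """Place an RDF/XML document under an x:xmpmeta root element."""
    body = rdfxml.strip()
    # An XML declaration is only legal at the very start of a document,
    # so drop it before nesting the RDF/XML inside another element.
    if body.startswith("<?xml"):
        body = body[body.index("?>") + 2:].lstrip()
    wrapped = f"{XMPMETA_OPEN}\n{body}\n{XMPMETA_CLOSE}\n"
    # Sanity check: raises xml.etree.ElementTree.ParseError if the result
    # is not well-formed XML.
    ET.fromstring(wrapped)
    return wrapped
```

The result could then be written to a file and handed to a tool such as `podofoxmp`; how that tool is invoked is not covered here.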
   
   ### Are you interested in contributing a solution yourself?
   
   None


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

