Hans Brende created ANY23-417:
---------------------------------
Summary: Inherent problems with mimetype detection
Key: ANY23-417
URL: https://issues.apache.org/jira/browse/ANY23-417
Project: Apache Any23
Issue Type: Bug
Components: mime
Affects Versions: 2.3
Reporter: Hans Brende
Fix For: 2.3
N-Triples is a subset of Turtle, and it is also a subset of N-Quads. Turtle is
a subset of TriG.
But when we perform mimetype detection on a plain text file, we only
sniff the first few kilobytes of data. Therefore, something we initially detect
as N-Triples may in fact be a Turtle, TriG, or N-Quads document. Something we
initially detect as Turtle may in fact be a TriG document.
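
To illustrate the sniffing problem, here is a minimal sketch. It is not
Any23's actual detector: the class name, the constants, and the crude
line-shape check are all hypothetical. It only shows how a sniffed prefix can
look like N-Triples while the full document is really N-Quads.

```java
public class PrefixSniffSketch {
    // Hypothetical sniffed prefix: two lines that are valid N-Triples.
    static final String HEAD =
        "<http://example.org/a> <http://example.org/p> <http://example.org/b> .\n"
      + "<http://example.org/b> <http://example.org/p> <http://example.org/c> .\n";

    // Hypothetical remainder, beyond the sniffed prefix: a quad with a graph term.
    static final String TAIL =
        "<http://example.org/c> <http://example.org/p> <http://example.org/d> "
      + "<http://example.org/g> .\n";

    // Crude stand-in for a detector (an assumption, not a real parser):
    // every non-blank line must have the shape "<s> <p> <o> ." with IRI terms only.
    static boolean looksLikeNTriples(String text) {
        for (String line : text.split("\n")) {
            String t = line.trim();
            if (t.isEmpty()) {
                continue;
            }
            if (!t.matches("<[^>]*>\\s+<[^>]*>\\s+<[^>]*>\\s*\\.")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The prefix alone passes the N-Triples check ...
        System.out.println(looksLikeNTriples(HEAD));
        // ... but the whole document does not, because of the trailing quad.
        System.out.println(looksLikeNTriples(HEAD + TAIL));
    }
}
```

Any detector that only sees the prefix cannot distinguish these two cases,
however precise its check is.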
Therefore, if we detect that the document is Turtle, in the absence of a
declared Content-Type, we should probably assume that it is actually TriG, just
in case.
If we can only detect that the document is N-Triples, that presents a problem,
because it could also be either Turtle or N-Quads. Which do we choose?
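
The subset relations above can be sketched as a small graph. This is only an
illustration under stated assumptions: the class, the map, and the method
name are hypothetical, not part of Any23, and the code needs Java 9+ for
Map.of and List.of. Widening a sniffed format to its most general supersets
gives one answer for Turtle but two for N-Triples.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SupersetSketch {
    // Direct "is a subset of" relations as stated in this issue.
    static final Map<String, List<String>> DIRECT_SUPERSETS = Map.of(
        "N-Triples", List.of("Turtle", "N-Quads"),
        "Turtle", List.of("TriG"),
        "TriG", List.of(),
        "N-Quads", List.of());

    // Returns the most general formats that a sniffed format could turn out to be,
    // i.e. the formats reachable from it that have no superset of their own.
    static Set<String> widestCandidates(String detected) {
        Set<String> result = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.push(detected);
        while (!pending.isEmpty()) {
            String format = pending.pop();
            List<String> supersets = DIRECT_SUPERSETS.getOrDefault(format, List.of());
            if (supersets.isEmpty()) {
                result.add(format);
            } else {
                supersets.forEach(pending::push);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(widestCandidates("Turtle"));    // [TriG]
        System.out.println(widestCandidates("N-Triples")); // [N-Quads, TriG]
    }
}
```

For Turtle the widening is unambiguous. For N-Triples it yields both N-Quads
and TriG, neither of which is a subset of the other, so the sketch shows the
ambiguity rather than resolving it.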
Another problem I see is that we are detecting both N3 and Turtle in two
separate steps. However, as I understand it, for the purposes of RDF, N3 is
essentially a synonym for Turtle. So it doesn't really make sense to use two
different detection steps for this. It appears that our N3 detection step is
actually detecting N-Triples, which is not at all the same thing.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)