[
https://issues.apache.org/jira/browse/DAFFODIL-1808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16393673#comment-16393673
]
Michael Beckerle commented on DAFFODIL-1808:
--------------------------------------------
This ticket isn't about Daffodil itself; it is about the JPEG DFDL schema.
It is replaced by https://github.com/DFDLSchemas/JPEG/issues/2.
> JPEG schema accepts too many non-JPEG data files
> ------------------------------------------------
>
> Key: DAFFODIL-1808
> URL: https://issues.apache.org/jira/browse/DAFFODIL-1808
> Project: Daffodil
> Issue Type: New Feature
> Components: DFDL Schemas
> Reporter: Michael Beckerle
> Priority: Major
> Fix For: 2.2.0
>
>
> The JPEG DFDL schema is much too permissive: arbitrary blobs of binary data
> are often accepted. To date, the schema only checks whether the file is some
> sequence of JPEG segments. Unfortunately, one segment type is effectively
> just a data blob, so many data blobs are accepted.
> To overcome this, additional constraint checking is needed. It can be
> expressed with dfdl:assert statements in the DFDL schema. Two are already
> there: they enforce that the first segment is SOI (start of image) and the
> last is EOI (end of image). However, a blob of bytes between SOI and EOI
> would still be accepted even though it is clearly not a JPEG image.
> In some cases the constraint rules will need more expressive power than
> dfdl:assert provides, namely true XPath query capability.
> The Schematron rule language could be used for those cases. See also
> DFDL-1807, which covers Schematron, in case it proves to be needed.
> Note that this is not "validation" of the data. It uses what we normally
> think of as a validation language, but for checking whether the data is
> well-formed.
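
The gap described in the quoted issue can be illustrated outside DFDL with a
minimal Python sketch. It is not the schema's method and not Daffodil code.
The function names (list_markers, looks_like_jpeg) and the extra rule "at
least one frame header (SOFn) and one scan header (SOS) must be present" are
assumptions chosen for illustration. The marker codes are the standard JPEG
ones. The application segment (APP0) in the example stands in for "a segment
that carries arbitrary bytes"; the ticket does not say which segment type it
has in mind.

```python
import struct

SOI, EOI, SOS = 0xD8, 0xD9, 0xDA
# Frame headers SOF0..SOF15, excluding DHT (0xC4), JPG (0xC8) and DAC (0xCC).
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}
# TEM and RST0..RST7 are stand-alone markers with no length field.
STANDALONE = {0x01} | set(range(0xD0, 0xD8))


def list_markers(data: bytes) -> list:
    """Walk the segment structure; return marker codes or raise ValueError."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("first segment is not SOI")
    markers = [SOI]
    pos = 2
    while pos < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        while pos < len(data) and data[pos] == 0xFF:  # skip fill bytes
            pos += 1
        if pos >= len(data):
            raise ValueError("truncated marker")
        marker = data[pos]
        pos += 1
        if marker == 0x00:
            raise ValueError("0xFF00 is not a marker")
        markers.append(marker)
        if marker == EOI:
            if pos != len(data):
                raise ValueError("data after EOI")
            return markers
        if marker in STANDALONE:
            continue
        if pos + 2 > len(data):
            raise ValueError("truncated segment length")
        (length,) = struct.unpack(">H", data[pos:pos + 2])
        if length < 2 or pos + length > len(data):
            raise ValueError("bad segment length")
        pos += length  # the length field counts its own two bytes
        if marker == SOS:
            # Entropy-coded data runs until the next real marker; 0xFF00
            # (stuffed byte) and RSTn markers belong to the scan.
            while pos + 1 < len(data):
                nxt = data[pos + 1]
                if data[pos] == 0xFF and nxt != 0x00 and not 0xD0 <= nxt <= 0xD7:
                    break
                pos += 1
            else:
                raise ValueError("scan data is not terminated by a marker")
    raise ValueError("last segment is not EOI")


def looks_like_jpeg(data: bytes) -> bool:
    """Structural check plus the assumed extra rule: a SOFn and a SOS exist."""
    try:
        markers = list_markers(data)
    except ValueError:
        return False
    return any(m in SOF_MARKERS for m in markers) and SOS in markers


# SOI, an APP0 segment holding arbitrary bytes, EOI: structurally a
# "collection of JPEG segments", but clearly not an image.
blob = b"\xff\xd8" + b"\xff\xe0\x00\x06ABCD" + b"\xff\xd9"
# SOI, an (empty) SOF0, an (empty) SOS followed by scan bytes, EOI.
good = b"\xff\xd8\xff\xc0\x00\x02\xff\xda\x00\x02\x12\x34\xff\x00\x56\xff\xd9"

print(looks_like_jpeg(blob))  # False
print(looks_like_jpeg(good))  # True
```

The SOI-first and EOI-last checks alone accept the blob example; only the
added cross-segment rule rejects it. Rules of this "somewhere in the file
there must be an X" kind are the sort that may need more query power than a
local assertion.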