[
https://issues.apache.org/jira/browse/DAFFODIL-2896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17886200#comment-17886200
]
Steve Lawrence commented on DAFFODIL-2896:
------------------------------------------
I'm in favor of deprecating the "full" behavior. I'm not sure we really need to
do double validation. And I think it actually makes testing harder because it's
not clear where your validation messages are coming from. If we are testing
validation, we probably want to explicitly test that a message came from
Daffodil or Xerces, and not just that something found a problem somewhere.
I wonder if we should also deprecate "limited" and change it to "daffodil"? I
feel like we always have to describe what the difference between "limited" and
"full" is. If they were called "daffodil" and "xerces" it would probably be
more clear.
Or maybe we should also avoid using specific library names so that we can
change them in the future? "daffodil" is probably fine instead of "limited",
but maybe "xerces" wants to be "xsd"?
Regarding pluggability, it would be really nice if we could just do
{code}
daffodil parse --validate xsd -s foo.dfdl.xsd
{code}
And SPI would look up and find the "xsd" validator. One issue with this is we
need to provide the main schema to Xerces. Right now we do that inside Daffodil
by hard coding things. If we wanted to go full SPI and not special case
anything, we would need a way to provide that to Xerces. Or require something
similar to what we do for schematron, e.g.
{code}
daffodil parse --validate xsd=foo.dfdl.xsd -s foo.dfdl.xsd
{code}
That's convenient if you want to validate with a different schema than the one
used to parse (needed for things like stringAsXml), but it's kind of a pain for
most other cases since you have to duplicate the DFDL schema in the args. Maybe
all validation factories (even schematron) are passed the root DFDL schema
and they can just choose to use it or ignore it? That could actually be useful
for DFDL files with embedded schematron. Right now, schematron embedded in DFDL
must be done like
{code}
daffodil parse --validate schematron=foo.dfdl.xsd -s foo.dfdl.xsd
{code}
Maybe
{code}
daffodil parse --validate schematron -s foo.dfdl.xsd
{code}
Defaults to assuming it's embedded schematron rules.
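A rough sketch of the "name or name=argument" idea, in Java since that is
where an SPI lookup would live. Every name here (ValidatorProvider,
ValidatorSpec, parse, find) is hypothetical and made up for illustration; none
of it is an existing Daffodil API. The only real API it leans on is that
java.util.ServiceLoader is an Iterable, so the result of ServiceLoader.load
could be passed to find.

```java
import java.util.Optional;

// Hypothetical stand-in for a validator factory discovered via SPI.
interface ValidatorProvider {
    String name(); // e.g. "xsd", "schematron", "daffodil"
}

// Hypothetical holder for one parsed --validate option.
final class ValidatorSpec {
    final String name;     // validator name used for the lookup
    final String argument; // schema or rules file the validator should use

    private ValidatorSpec(String name, String argument) {
        this.name = name;
        this.argument = argument;
    }

    // Accepts "name" or "name=argument". With no "=argument", the root DFDL
    // schema given to -s is used, so it does not have to be repeated.
    static ValidatorSpec parse(String option, String rootSchema) {
        int eq = option.indexOf('=');
        String name = eq < 0 ? option : option.substring(0, eq);
        String argument = eq < 0 ? rootSchema : option.substring(eq + 1);
        if (name.isEmpty() || argument.isEmpty()) {
            throw new IllegalArgumentException(
                "expected NAME or NAME=ARGUMENT but got: " + option);
        }
        return new ValidatorSpec(name, argument);
    }

    // Returns the first provider whose name matches, if any.
    static Optional<ValidatorProvider> find(
            Iterable<ValidatorProvider> providers, String name) {
        for (ValidatorProvider p : providers) {
            if (p.name().equals(name)) {
                return Optional.of(p);
            }
        }
        return Optional.empty();
    }
}
```

Under these assumptions, "--validate xsd -s foo.dfdl.xsd" and
"--validate xsd=foo.dfdl.xsd -s foo.dfdl.xsd" parse to the same spec, and a
validator is free to ignore the argument if it does not need it.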
Another thought, if we really want support for "full", we could always extend
things to support a list of validators, for example, "--validate full" could
be done like:
{code}
daffodil parse --validate xsd --validate daffodil -s foo.dfdl.xsd
{code}
It's a bit verbose, but it's probably rare so probably fine. And I'm not sure
we'd ever need it, but the current CLI syntax could extend to that pretty
easily.
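If we did support a list, a minimal sketch of running the validators in order
could look like the following. NamedValidator and ValidatorChain are
hypothetical names, not Daffodil APIs, and the infoset is just a String here
to keep the example small. Each message is prefixed with the validator that
produced it, which is the part that would help tests check where a message
came from.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical validator interface for this sketch only.
interface NamedValidator {
    String name();
    List<String> validate(String infoset); // returns validation messages
}

final class ValidatorChain {
    // Runs every validator in the order given on the command line and tags
    // each message with the name of the validator that reported it.
    static List<String> run(List<NamedValidator> validators, String infoset) {
        List<String> messages = new ArrayList<>();
        for (NamedValidator v : validators) {
            for (String msg : v.validate(infoset)) {
                messages.add("[" + v.name() + "] " + msg);
            }
        }
        return messages;
    }
}
```

With this shape, "--validate xsd --validate daffodil" is just a two-element
list, and a single "--validate xsd" is the one-element case.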
> validationMode=full enables Daffodils limited validation
> --------------------------------------------------------
>
> Key: DAFFODIL-2896
> URL: https://issues.apache.org/jira/browse/DAFFODIL-2896
> Project: Daffodil
> Issue Type: Bug
> Components: Back End
> Reporter: Steve Lawrence
> Assignee: Olabusayo Kilo
> Priority: Major
> Fix For: 3.7.0
>
>
> Daffodil has three validation modes: off, limited, and full. "Limited"
> enables Daffodil's internal validation during parsing and "full" enables
> Xerces validation at the end of parsing. However, "full" also enables
> Daffodil's limited validation, so we incur extra overhead with full validation.
> We should change full validation so it only does Xerces validation. Note that
> this change is not backwards compatible and could potentially break tests
> that use full validation but specifically look for Daffodil validation
> messages.
> Alternatively, we may want to consider adding a new validation mode that only
> enables Xerces. Then "full" could keep its current behavior of validating
> with Daffodil and Xerces. This has the added benefit that we can get complete
> test coverage of both validation mechanisms without having to duplicate tests.