mbeckerle commented on a change in pull request #478:
URL: https://github.com/apache/daffodil/pull/478#discussion_r588612770
##########
File path: daffodil-lib/src/main/scala/org/apache/daffodil/xml/QNameBase.scala
##########
@@ -385,8 +387,8 @@ final case class LocalDeclQName(prefix: Option[String],
local: String, namespace
/**
* QName for a global declaration or definition (of element, type, group,
format, etc.)
*/
-final case class GlobalQName(prefix: Option[String], local: String, namespace:
NS)
- extends NamedQName(prefix, local, namespace) {
+final case class GlobalQName(lexicalPrefix: Option[String], local: String,
namespace: NS)
Review comment:
This feels improperly named to me. When you define something that is
named, you never specify a prefix. The namespace is the target namespace of the
schema, and there is no prefix specified. The target namespace doesn't even
have to have *any* prefix defined. Other schemas that import/include this
schema *may* supply a prefix most likely they do, but they could define it to
be the default namespace and have no other means of access in a RefQName to one
of the items in this schema.
Only RefQNames have lexicalPrefix. GlobalQNames never have a prefix, or if
they do, it's because we're choosing one of the prefixes for the target
namespace that we prefer.
So perhaps this paremter to GlobalQName should be called preferredTnsPrefix ?
When we create a local element decl, the namespace is either no-namespace,
or the TNS depending on the form of the element (or elementFormDefault of the
schema, if the element has no form attribute, which is typical. I've never
actually seen the form attribute used, and I'm not sure we support the form
attribute. We do support elementFormDefault on the xs:schema however.) Again
there is no explicit prefix. So I think the constructor for NamedQName should
call it's prefix parameter preferredTnsPrefix.
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DaffodilParseOutputStreamContentHandler.scala
##########
@@ -66,13 +88,74 @@ class DaffodilParseOutputStreamContentHandler(out:
OutputStream, pretty: Boolean
override def startPrefixMapping(prefix: String, uri: String): Unit = {
val _prefix = if (prefix == "") null else prefix
- prefixMapping = NamespaceBinding(_prefix, uri, prefixMapping)
+ globalPrefixMapping = NamespaceBinding(_prefix, uri, globalPrefixMapping)
Review comment:
As we call this startPrefixMapping repeatedly, the globalPrefixMapping
var is no longer really the global one is it?
Or is this only called at the top level before we start handling elements?
I expected this to be named currentPrefixMapping.
##########
File path: daffodil-lib/src/main/scala/org/apache/daffodil/xml/XMLUtils.scala
##########
@@ -812,6 +812,8 @@ object XMLUtils {
actual: Node,
ignoreProcInstr: Boolean = true,
checkPrefixes: Boolean = false,
+ // checkNamespaces isn't order specific; it only checks that they have the
same (element-wise)
Review comment:
What does "element wise" namespaces mean? Maybe just say same set of
namespace bindings (unordered).
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DaffodilParseOutputStreamContentHandler.scala
##########
@@ -24,15 +24,37 @@ import scala.xml.NamespaceBinding
import org.apache.daffodil.infoset.XMLInfosetOutputter
import org.apache.daffodil.util.Indentable
+import org.apache.daffodil.util.MStackOf
import org.apache.daffodil.util.MStackOfBoolean
+import org.apache.daffodil.util.Maybe
+import org.apache.daffodil.util.Maybe.Nope
+import org.apache.daffodil.xml.XMLUtils
import org.xml.sax.Attributes
import org.xml.sax.ContentHandler
import org.xml.sax.Locator
class DaffodilParseOutputStreamContentHandler(out: OutputStream, pretty:
Boolean = false)
Review comment:
Whole class Scala doc please. This is a SAX content handler for when you
want to write textualized XML to a java output stream from SAX events created
by Daffodil's parser.
Question. Why does this have Daffodil in its name. Isn't this a generic
thing that woudl work with any source of SAX events? If that's the case, isn't
there one of these in Xerces, or Apache Commons XML or something?
##########
File path: daffodil-lib/src/main/scala/org/apache/daffodil/xml/QNameBase.scala
##########
@@ -425,8 +427,8 @@ final case class RefQName(prefix: Option[String], local:
String, namespace: NS)
}
}
- def toStepQName = StepQName(prefix, local, namespace)
- def toGlobalQName = GlobalQName(prefix, local, namespace)
+ def toStepQName = StepQName(lexicalPrefix, local, namespace)
+ def toGlobalQName = GlobalQName(lexicalPrefix, local, namespace)
Review comment:
Worth a coment here that for GlobalQNames, comparison of a RefQName to a
GlobalQName to see if they match ignores the prefix. The only use of this
lexicalPrefix parameter should be in diagnostic messages.
There's an argument that this should always be supplied as None, not the
lexicalPrefix. But most schemas don't play subtle namespace games, so that may
lead to diagnostic messages that are more verbose (using namespaces) when the
prefix would have been more meaningful to the user. So I think keep the
lexicalPrefix here anyway.
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DaffodilParseOutputStreamContentHandler.scala
##########
@@ -66,13 +88,74 @@ class DaffodilParseOutputStreamContentHandler(out:
OutputStream, pretty: Boolean
override def startPrefixMapping(prefix: String, uri: String): Unit = {
val _prefix = if (prefix == "") null else prefix
- prefixMapping = NamespaceBinding(_prefix, uri, prefixMapping)
+ globalPrefixMapping = NamespaceBinding(_prefix, uri, globalPrefixMapping)
+ currentElementPrefixMapping = NamespaceBinding(_prefix, uri,
currentElementPrefixMapping)
}
override def endPrefixMapping(prefix: String): Unit = {
// do nothing
}
+ /**
+ * Uses Attributes, which is passed in to the startElement callback, to
extract prefix mappings
+ * and other attributes and return a tuple of an updated globalPrefixMapping
and a Seq of attribute=val
+ * String pairings
+ */
+ def processAttributes(atts: Attributes, prefixMapping: NamespaceBinding):
(NamespaceBinding, Seq[String]) = {
+ var pm = prefixMapping
+ var attrPairings: Seq[String] = Seq()
+ var i = 0
+ var mappingsList: Seq[(String, String)] = Seq()
+ while (i < atts.getLength) {
+ val qName = atts.getQName(i)
+ val attrVal = atts.getValue(i)
+ if (qName.nonEmpty) { //if qName is populated; as in when
namespacePrefixes == true
+ if (qName.startsWith("xmlns:") || qName == "xmlns") { // describing
namespace mapping
+ // get prefix
+ val pre = if (qName.startsWith("xmlns:")) {
+ qName.substring(6)
+ } else {
+ null
+ }
+ // we make this call to check if the prefix already exists. If it
doesn't exist, we get the
+ // npe so we can add it, but if it does exist, nothing happens and
it doesn't get re-added
+ // and we instead proceed to the next item in Attributes
+ val prefixUri = XMLUtils.getURIOption(pm, pre)
+ prefixUri match {
+ case Some(_) => // already exists; do nothing
+ case None => // add it
+ mappingsList +:= (pre, attrVal)
+ }
+ } else {
+ // regular attribute
+ attrPairings +:= s" $qName=" + "\"" + s"${attrVal}" + "\""
Review comment:
Can this happen? DFDL doesn't have attributes. Coverage says this isn't
tested. Can this even happen?
Maybe this is Assert.invariantFailed("DFDL Infoset XML shouldn't have
attributes.")
Maybe we need an example with both ns bindings on an element and
xsi:type="xs:string" or xsi:nil="true" so that this will be covered?
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/infoset/SAXInfosetOutputter.scala
##########
@@ -144,29 +149,25 @@ class SAXInfosetOutputter(xmlReader:
DFDL.DaffodilParseXMLReader)
}
private def doStartPrefixMapping(diElem: DIElement, contentHandler:
ContentHandler): Unit = {
- val nsbStart = diElem.erd.minimizedScope
- val nsbEnd = if (diElem.isRoot) {
- scala.xml.TopScope
- } else {
- diElem.diParent.erd.minimizedScope
- }
+ val (nsbStart: NamespaceBinding, nsbEnd: NamespaceBinding) =
getNsbStartAndEnd(diElem)
var n = nsbStart
+ var mappingsList: Seq[(String, String)] = Seq()
while (n != nsbEnd && n != null && n != scala.xml.TopScope) {
val prefix = if (n.prefix == null) "" else n.prefix
val uri = if (n.uri == null) "" else n.uri
- contentHandler.startPrefixMapping(prefix, uri)
+ // we generate a list here by prepending so can build the prefixMapping
in the
+ // same order as it is in minimizedScope when we call
startPrefixMapping; we do this because
+ // getPrefix in the contentHandler is order dependent
+ mappingsList +:= (prefix, uri)
n = n.parent
}
+ mappingsList.foreach{ case (prefix, uri) =>
+ contentHandler.startPrefixMapping(prefix, uri)
+ }
}
private def doEndPrefixMapping(diElem: DIElement, contentHandler:
ContentHandler): Unit = {
- val nsbStart = diElem.erd.minimizedScope
- val nsbEnd = if (diElem.isRoot) {
- scala.xml.TopScope
- } else {
- diElem.diParent.erd
- .minimizedScope
- }
+ val (nsbStart: NamespaceBinding, nsbEnd: NamespaceBinding) =
getNsbStartAndEnd(diElem)
var n = nsbStart
Review comment:
I think this code is depending on iterating down nsbStart's bindings
eventually gets you to nsbEnd. I.e., nsbEnd is the tail of the nsbStart chain.
I am not sure what == and != mean on NamespaceBinding objects. But they
might be doing elaborate comparisons to see if they define equivalent scopes
(all same bindings, just different order).
I think your while could be while ( (n ne nsbEnd && .... i.e., using pointer
equality here. This may actually be more correct than some sort of
binding-scope equality, given what your algorthm is trying to achieve here
which is to one by one remove the bindings that were added to the front.
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/infoset/SAXInfosetOutputter.scala
##########
@@ -175,26 +176,78 @@ class SAXInfosetOutputter(xmlReader:
DFDL.DaffodilParseXMLReader)
}
}
- private def doStartElement(diElem: DIElement, contentHandler:
ContentHandler): Unit = {
- val (ns: String, elemName: String, qName: String) =
getNameSpaceElemNameAndQName(diElem)
- doStartPrefixMapping(diElem, contentHandler)
+ /**
+ * Add the prefixes and uris from the element's NamespaceBinding to
Attributes,
+ * when namespacePrefixes feature is true
+ */
+ private def doAttributesPrefixMapping(diElem: DIElement, attrs:
AttributesImpl): AttributesImpl = {
+ val (nsbStart: NamespaceBinding, nsbEnd: NamespaceBinding) =
getNsbStartAndEnd(diElem)
+ var n = nsbStart
+ while (n != nsbEnd && n != null && n != scala.xml.TopScope) {
+ val prefix = if (n.prefix == null) "xmlns" else s"xmlns:${n.prefix}"
+ val uri = if (n.uri == null) "" else n.uri
+ // uri and localname are always empty for NamespaceBinding attributes
+ attrs.addAttribute("", "", prefix, "CDATA", uri)
+ n = n.parent
+ }
+ attrs
+ }
- val attrs = if (isNilled(diElem)) {
- createNilAttribute()
+ private def getNsbStartAndEnd(diElem: DIElement) = {
Review comment:
Is there any requirement about sharing between nsbStart and nsbEnd here?
Seems to me we're depending on these being somehow one an extension of the
other. Is that the case?
then we should state that in a comment, and maybe check it as an
Assert.invariant.
Alternatively, where minimized scope is computed, the invariant that each
element's minimized scope is an extension of its parents minimized scope could
be checked there. (maybe it is checked there?)
I think your code is depending on this for correctness, so just let's be
explicit about this sharing.
What I'm wondering about is if you compare two NamespaceBinding objects with
==, does it check for "equivalence" of the namespace bindings? I.e, are they
the same list, then true else sort both sets of bindings, etc. etc.
Algorrithmically here, I think you are treating NamespaceBindings as linked
chains, and when comparing them I think you basically always want eq or ne, not
== or !=.
The Scala XML library really made a big mistake with these things. Namespace
bindings should have been an ordinary List of Binding objects where a binding
is just a single prefix + NS pair. So many idioms would have worked out better
that way. Ordinary mapping, finding, etc. Instead we have these collection like
things that are not standard scala collections.
##########
File path: daffodil-lib/src/main/scala/org/apache/daffodil/xml/QNameBase.scala
##########
@@ -224,8 +224,7 @@ trait QNameBase extends Serializable {
* must show up in diagnostic messages. Mistakes by having the wrong prefix
* or by omitting one, are very common.
*/
- def prefix: Option[String]
- def prefixOrNull: String = prefix.orNull
+ def lexicalPrefix: Option[String]
Review comment:
I think your scaladoc isn't complete here.
I think you mean by lexical prefix, the prefix that was taken off of a QName
reference to a named thing.
A QName is a:b where Some("a") is the prefix, or it's just b, in which case
None is the lexicalPrefix.
One more ambiguity to clarify.
If a schema defines several prefixes for a namespace, which will be the
lexicalPrefix?
I've seen lots of things like
```
targetNamespace="http://pcap.com"
xmlns:pcap="http://pcap.com"
xmlns:tns="http://pcap.com"
xmlns="http://pcap.com"
```
So the default namespace is the same as tns is the same as pcap. So a QName
reference given this default such as
```
<xs:element ref="foobar"/>
```
will be to the pcap.com URL, but what will be the lexical prefix in this
case? None, or Some("pcap") or Some("tns"), or undetermined, but one of those?
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DaffodilParseXMLReader.scala
##########
@@ -43,35 +42,28 @@ class DaffodilParseXMLReader(dp: DataProcessor) extends
DFDL.DaffodilParseXMLRea
private var errorHandler: ErrorHandler = _
private var dtdHandler: DTDHandler = _
private var entityResolver: EntityResolver = _
- var saxParseResultPropertyValue: ParseResult = _
- var saxBlobDirectoryPropertyValue: Path =
Paths.get(System.getProperty("java.io.tmpdir"))
- var saxBlobPrefixPropertyValue: String = "daffodil-sax-"
- var saxBlobSuffixPropertyValue: String = ".blob"
-
- private val featureMap = mutable.Map[String, Boolean](
- XMLUtils.SAX_NAMESPACES_FEATURE -> false,
- XMLUtils.SAX_NAMESPACE_PREFIXES_FEATURE -> false
- )
+ private var saxParseResultPropertyValue: ParseResult = _
Review comment:
Class needs scaladoc. What is a DaffodilParseXMLReader? It reads data,
creates SAX events from it, and calls the supplied content handler with the
events yes?
I am somehow now thinking this class is used only for testing, because it
constructs and uses (when parse is called) a SAXInfosetOutputter which we use
to test unparsing. Is that right?
I'm sort of losing track of what is test rig and what is core SAX algorithm.
If this used for testing only, can it live in src/test ?
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DaffodilParseOutputStreamContentHandler.scala
##########
@@ -66,13 +88,74 @@ class DaffodilParseOutputStreamContentHandler(out:
OutputStream, pretty: Boolean
override def startPrefixMapping(prefix: String, uri: String): Unit = {
val _prefix = if (prefix == "") null else prefix
- prefixMapping = NamespaceBinding(_prefix, uri, prefixMapping)
+ globalPrefixMapping = NamespaceBinding(_prefix, uri, globalPrefixMapping)
+ currentElementPrefixMapping = NamespaceBinding(_prefix, uri,
currentElementPrefixMapping)
}
override def endPrefixMapping(prefix: String): Unit = {
// do nothing
}
+ /**
+ * Uses Attributes, which is passed in to the startElement callback, to
extract prefix mappings
+ * and other attributes and return a tuple of an updated globalPrefixMapping
and a Seq of attribute=val
+ * String pairings
+ */
+ def processAttributes(atts: Attributes, prefixMapping: NamespaceBinding):
(NamespaceBinding, Seq[String]) = {
+ var pm = prefixMapping
+ var attrPairings: Seq[String] = Seq()
+ var i = 0
+ var mappingsList: Seq[(String, String)] = Seq()
+ while (i < atts.getLength) {
+ val qName = atts.getQName(i)
+ val attrVal = atts.getValue(i)
+ if (qName.nonEmpty) { //if qName is populated; as in when
namespacePrefixes == true
+ if (qName.startsWith("xmlns:") || qName == "xmlns") { // describing
namespace mapping
+ // get prefix
+ val pre = if (qName.startsWith("xmlns:")) {
+ qName.substring(6)
+ } else {
+ null
+ }
+ // we make this call to check if the prefix already exists. If it
doesn't exist, we get the
+ // npe so we can add it, but if it does exist, nothing happens and
it doesn't get re-added
Review comment:
Type npe here does not mean null pointer exception it is supposed to be
None I think.
Unless this really does throw an NPE which is a really bad API, and we
should change it.
##########
File path:
daffodil-runtime1/src/main/scala/org/apache/daffodil/processors/DaffodilParseOutputStreamContentHandler.scala
##########
@@ -116,9 +247,16 @@ class DaffodilParseOutputStreamContentHandler(out:
OutputStream, pretty: Boolean
}
}
writer.write("</")
- writer.write(qName)
+ outputTagName(uri, localName, qName)
writer.write(">")
outputNewlineStack.pop()
+
+ if (!globalPrefixMappingContextStack.isEmpty)
globalPrefixMappingContextStack.pop
+ if (globalPrefixMappingContextStack.isEmpty) {
+ globalPrefixMapping = null
+ } else {
+ globalPrefixMapping = globalPrefixMappingContextStack.top
+ }
Review comment:
Ok. I didn't see a single thing in that class that has anything to do
with DFDL/Daffodil. It's a generic SAX even to text content handler. Can't we
just use one and not have to maintain this code?
If not, what did I miss?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]