Re: [PR] Draft: Refactoring to expose clean metadata and infoset walkers [daffodil]

via GitHub Thu, 09 Nov 2023 03:18:33 -0800


mbeckerle commented on code in PR #1112:
URL: https://github.com/apache/daffodil/pull/1112#discussion_r1387859396



##########
daffodil-japi/src/main/scala/org/apache/daffodil/japi/Daffodil.scala:
##########
@@ -514,6 +508,13 @@ class DataProcessor private[japi] (private var dp: SDataProcessor)
    */
   def save(output: WritableByteChannel): Unit = dp.save(output)
 
+  /**
+   * Walks the handler over the runtime metadata structures
+   *
+   * @param handler - the handler is called-back during the walk as each metadata structure is encountered.
+   */
+  def walkMetadata(handler: MetadataHandler) = dp.walkMetadata(handler)

Review Comment:
   Actually I thought of a better argument for why walking the runtime1 
metadata is better than walking the DSOM objects.
   
   You don't need the schema source. The walk works on a pre-compiled binary 
schema just as well as on a freshly compiled one. So Daffodil's schema 
compiler need not be involved at all when interfacing to, say, Apache Drill or 
other data fabrics. A pre-compiled DFDL schema is all that is needed.
   
   There's a further advantage. At runtime, when you are actually parsing data, 
the metadata attached to the infoset objects is the runtime1 metadata objects. 
If you walk the metadata first in a preparation/compilation step, those are 
the exact same objects you walked. That is helpful in two ways. One is 
built-in self-checking: you can make sure the metadata at runtime is the 
expected metadata. The other is that at runtime you sometimes actually need 
the metadata.
   
   The example I can think of where this is useful is a DFDL schema with a 
choice where two branches both contain an element named "message", but of 
different complex types. We actually have this in a number of large message 
format schemas. Now, the system we are interfacing with may not allow 
different choice branches to have children with overlapping names like this, 
so it may need to call these message1, message2, and so on. At runtime, when 
Daffodil dispatches a startComplex event call to the InfosetOutputter, we need 
to look at the actual runtime metadata (ERD) of the message element to 
determine which one it is, so we can populate the correct one of message1, 
message2, etc.
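   
   A minimal sketch of that lookup, to make the idea concrete. It does not use 
the Daffodil API: `ElementMeta` is a hypothetical stand-in for the runtime1 
ERD, and `TargetNames`, `register`, and `targetFor` are made-up names. It 
assumes only what is described above, namely that the objects seen during the 
metadata walk are the same objects attached to infoset elements at runtime, so 
an identity-keyed map is enough. For simplicity it numbers every name, not 
just the overlapping ones.

```scala
import java.util.IdentityHashMap
import scala.collection.mutable

// Hypothetical stand-in for a runtime1 element metadata object (ERD).
// A plain class, not a case class, so that two "message" elements are
// distinguished by object identity rather than by structural equality.
final class ElementMeta(val name: String, val typeName: String)

final class TargetNames {
  // Keyed by identity: same-named elements from different choice branches
  // get separate entries.
  private val byMeta = new IdentityHashMap[ElementMeta, String]()
  private val seen = mutable.Map.empty[String, Int].withDefaultValue(0)

  // Preparation step: call once per element metadata object during the walk.
  def register(meta: ElementMeta): Unit = {
    if (!byMeta.containsKey(meta)) {
      val n = seen(meta.name) + 1
      seen(meta.name) = n
      byMeta.put(meta, s"${meta.name}$n")
    }
  }

  // Runtime step: called from something like a startComplex handler.
  // None means this metadata object was never walked, which doubles as the
  // self-check mentioned earlier.
  def targetFor(meta: ElementMeta): Option[String] = Option(byMeta.get(meta))
}

object DisambiguationDemo {
  def main(args: Array[String]): Unit = {
    val branchA = new ElementMeta("message", "TypeA")
    val branchB = new ElementMeta("message", "TypeB")

    val names = new TargetNames
    names.register(branchA)
    names.register(branchB)

    println(names.targetFor(branchA))
    println(names.targetFor(branchB))
  }
}
```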
   
   This is very much like the next-element resolver that is in the runtime1 
metadata. (Exposing that so it can be reused meaningfully in this case is the 
obvious next thought.)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
