Re: The future of the daffodil DFDL schema debugger?

Adam Rosien Mon, 24 May 2021 09:56:56 -0700

Your message is extremely helpful! I'll spend some time working through it
and follow up.


On Mon, May 24, 2021 at 9:48 AM Beckerle, Mike <
mbecke...@owlcyberdefense.com> wrote:

> Some thoughts re: data format debugger
>
> I suggest we enumerate
>
>   *   every single piece of state of the parser,
>   *   every single piece of state of the unparser,
>   *   each action/step of the parser (every parse combinator or
> primitive, and their subactions),
>   *   and each action/step of the unparser (every unparse combinator,
> primitive, suspension, ...)
>
> and wire-frame/mock-up some display for each piece of state, and how, if
> changed by a step, the change to that piece of state would be displayed.
>
> We can write down the nuances associated with these data items/actions
> that impact debugger display.
>
> Some of these states/actions will be analogous to things in conventional
> debuggers (e.g., looking at the values of variables). Others will be
> specific to DFDL needs (e.g., looking at layers in the data stream,
> visualizing delimiter scanning success/failure, backtracking).
>
> Core concepts a debugger needs are framing vs. content vs. value, and the
> "regions" in the data stream that make these up. The framing includes
> initiators, terminators, separators, alignment regions, prefix-length
> regions, leading/trailing skip regions, unused regions. Those surround the
> content region, and when padding/filling is involved (for simple types that
> are textual) the content region contains leading pad and trailing pad
> regions, surrounding the value region.
>
> An example of graphical nested box representation of these regions is here
> in a design note about Daffodil:
>
>
> https://daffodil.apache.org/dev/design-notes/term-sharing-in-schema-compiler/
> (see section "Details of Unique and Shared Regions")
>
> The way to start this effort is to look at the UState and PState classes.
> These are the state blocks. Every piece of these is potentially important
> to the debugger.
>
> Lastly, an important aspect of Daffodil is the streaming behavior of the
> parser and unparser. While I believe it is more important to get something
> working than for it to cover every feature, this is an area where failing
> to anticipate how it needs to work is likely to lock us out of a future
> design that accommodates it.
>
> So the parser doesn't produce an infoset. It produces a stream of infoset
> events, or call-backs to be exact.
> Due to backtracking in the parser, these events can be hung-up for
> substantial time while the parser continues. So we can't assume that there
> is any sort of correlation between parser activity and the producing of
> events.
>
> The unparser doesn't consume an infoset; it consumes a stream of infoset
> events. Specifically, the unparser is the callback-handler for unparse
> infoset events.
>
> The infoset gets trimmed so that we needn't build up the complete infoset
> tree in memory. As parse-events are produced, no-longer necessary parts of
> the infoset are pruned away. Similarly, when unparsing, once a part of the
> infoset has been unparsed, that part of the infoset tree is pruned away if
> no longer needed.
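
The delay between parser activity and infoset events described above can be sketched as follows. This is a minimal illustration only: `EventBuffer`, `mark`, `commit`, and `backtrack` are hypothetical names, not Daffodil's API. Events emitted while a point of uncertainty is open are held back, and are released only once no open point remains.

```python
from typing import Callable, List


class EventBuffer:
    """Holds infoset events while any point of uncertainty (PoU) is open.

    Hypothetical illustration; not Daffodil's actual implementation.
    """

    def __init__(self, sink: Callable[[str], None]) -> None:
        self._sink = sink              # receives events that can no longer be undone
        self._pending: List[str] = []  # events held back while a PoU is open
        self._marks: List[int] = []    # len(_pending) recorded at each open PoU

    def emit(self, event: str) -> None:
        """Deliver the event now, or hold it if a PoU is open."""
        if self._marks:
            self._pending.append(event)
        else:
            self._sink(event)

    def mark(self) -> None:
        """Open a new PoU."""
        self._marks.append(len(self._pending))

    def commit(self) -> None:
        """Resolve the innermost PoU; its events are now safe."""
        self._marks.pop()
        self._flush_if_settled()

    def backtrack(self) -> None:
        """Discard every event produced since the innermost PoU was opened."""
        start = self._marks.pop()
        del self._pending[start:]
        self._flush_if_settled()

    def _flush_if_settled(self) -> None:
        # Held events may only be released once no PoU remains open.
        if not self._marks:
            for event in self._pending:
                self._sink(event)
            self._pending.clear()
```

A debugger front end attached to the sink would therefore see nothing for as long as an outer point of uncertainty stays open, which is why a display cannot assume events arrive in step with parser actions.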
>
>
> ________________________________
> From: Steve Lawrence <slawre...@apache.org>
> Sent: Thursday, April 22, 2021 9:32 AM
> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
> Subject: Re: The future of the daffodil DFDL schema debugger?
>
> Some thoughts related to showing the infoset as if it were a variable as
> this is prototyped
>
> 1) How do DAP/IDE's represent very large hierarchical data? Infosets can
> be huge, and most of the time a user only cares about the most recent
> infoset item. So some way to follow and show just the most recent part of
> the infoset is important. The current Daffodil debugger has an
> "infosetLines" setting so that it only shows the most recent X
> lines, which is mostly all a user cares about when stepping through a parse.
>
> 2) Infoset items are added and removed very frequently during a parse.
> Currently, when the Daffodil debugger shows the infoset it just converts
> the entire thing to XML and displays that. This doesn't work at all for
> large infosets since this can take a long time. I was hoping this issue
> would get resolved with this new debugging infrastructure. When the
> infoset is modified, we ideally want a way to specify via DAP that parts
> of the variable hierarchy were added/removed rather than having to send
> the entire infoset during every variable update.
>
> 3) I can imagine a feature where a user would want to select an infoset
> item and jump to the associated schema element, or query information
> about that infoset item (e.g., what bit position did it start at, what
> was the length). We don't have this right now, but it would be really nice
> to have. This suggests that we need metadata associated with each of the
> variables. Does DAP have a concept of that and do IDE's have a way to
> show it?
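
Point 2 above asks for a way to report only what was added or removed. A minimal sketch of the backend half of that idea, assuming the infoset snapshots are available as nested dictionaries (a simplification: arrays and repeated element names are not modeled, and `flatten` and `infoset_delta` are hypothetical names). Whether DAP can carry such deltas to the IDE is still the open question.

```python
from typing import Any, Dict, List, Tuple

Infoset = Dict[str, Any]  # element name -> simple value, or nested dict


def flatten(node: Infoset, prefix: str = "") -> Dict[str, Any]:
    """Map every element to a slash-separated path.

    Complex elements are recorded with the value None so that an empty
    complex element still shows up as added or removed.
    """
    out: Dict[str, Any] = {}
    for name, value in node.items():
        path = f"{prefix}/{name}"
        if isinstance(value, dict):
            out[path] = None
            out.update(flatten(value, path))
        else:
            out[path] = value
    return out


def infoset_delta(
    before: Infoset, after: Infoset
) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, removed, changed) paths between two snapshots."""
    old, new = flatten(before), flatten(after)
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    return added, removed, changed
```

Calling `infoset_delta(previous, current)` after a step gives the paths a view would need to touch; calling it with the arguments swapped describes what a backtrack undid.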
>
> On 4/21/21 7:52 PM, Adam Rosien wrote:
> > I've been reading up on DAP and wanted to share...
> >
> >> There are many areas though that are unique to Daffodil that have no
> > representation in the spec.  These things (like InputStream, Infoset,
> PoU,
> > different variable types, backtracking, etc) will need an extension to
> > DAP.  This really boils down to defining these things to fit under the
> DAP
> > BaseProtocol and enabling handling of those objects on both the front and
> > back ends.
> >
> > To me, much of the current state exposed by the (Daffodil) Debugger
> > translates directly to a DAP Variable[1]. DAP Variables can be
> > nested/hierarchical, so they could (potentially) model larger data like
> the
> > infoset. I can imagine shoving all the current state into Variables as a
> > proof-of-concept.
> >
> > It also seems like the processing stack maintained by the Daffodil
> PState,
> > where each item references the relevant schema element, could translate
> to
> > the DAP StackFrame type [2]. That is, the path from the schema root to
> the
> > currently processing schema element becomes the "call stack". (Apologies
> if
> > I don't have all the Daffodil terms lined up correctly.)
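
The mapping described above can be sketched as follows. The field names (`name`, `value`, `variablesReference` for a Variable; `id`, `name`, `source`, `line`, `column` for a StackFrame) come from the DAP specification. Everything else is an assumption for illustration: the state keys, the schema path entries, and the file name `example.dfdl.xsd` are made up and do not reflect the real PState.

```python
from typing import Any, Dict, List


def to_dap_variable(name: str, value: Any, ref: int = 0) -> Dict[str, Any]:
    """Build a DAP Variable; ref > 0 would mark a node with children."""
    return {"name": name, "value": str(value), "variablesReference": ref}


def to_dap_stack(
    schema_path: List[Dict[str, Any]], schema_file: str
) -> List[Dict[str, Any]]:
    """Turn the path from schema root to current element into StackFrames.

    The innermost element comes first, since clients conventionally treat
    the first frame as the current one.
    """
    frames = []
    for frame_id, element in enumerate(reversed(schema_path)):
        frames.append(
            {
                "id": frame_id,
                "name": element["name"],
                "source": {"path": schema_file},
                "line": element["line"],
                "column": 1,
            }
        )
    return frames


# Hypothetical debugger state and schema path, for illustration only.
state = {"bitPosition": 64, "childIndex": 2}
variables = [to_dap_variable(k, v) for k, v in state.items()]
path = [
    {"name": "root", "line": 10},
    {"name": "record", "line": 25},
    {"name": "field", "line": 31},
]
frames = to_dap_stack(path, "example.dfdl.xsd")
```

A nested infoset would use a non-zero `variablesReference` per complex element, so that the client fetches children only when the user expands a node.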
> >
> > For displaying the input data and processing progress, I looked at a few
> > existing VS Code extensions that provided non-builtin views, some of
> which
> > interact with their DAP debugger code [3] [4] [5] [6].
> >
> > Finally, I took a cursory look at scala-debug-adapter [7], which, for
> > reference, wraps Microsoft's java-debug [8] implementation of DAP. I was
> > curious about the set of request/response and event types. Additionally,
> > the Typescript API to VS Code offers custom DAP requests and responses,
> but
> > I couldn't find the equivalent notion in the java-debug project.
> >
> > .. Adam
> >
> > [1]
> >
> https://microsoft.github.io/debug-adapter-protocol/specification#Types_Variable
> > [2]
> >
> https://microsoft.github.io/debug-adapter-protocol/specification#Types_StackFrame
> > [3] https://github.com/scalameta/metals-vscode (provides a debugger and
> > non-debugger custom UI)
> > [4] https://github.com/microsoft/vscode-cpptools (debugger + memory
> view)
> > [5]
> https://marketplace.visualstudio.com/items?itemName=marus25.cortex-debug
> > (debugger + memory view,
> >
> https://github.com/Marus/cortex-debug/blob/master/src/frontend/memory_content_provider.ts
> > )
> > [6]
> >
> https://marketplace.visualstudio.com/items?itemName=slevesque.vscode-hexdump
> > (extension for hexdumps that could be controlled by other extensions)
> > [7] https://github.com/scalacenter/scala-debug-adapter
> > [8] https://github.com/microsoft/java-debug
> >
> > On Tue, Apr 20, 2021 at 7:08 AM John Wass <jwa...@gmail.com> wrote:
> >
> >>> Going to look deeper into how DAP might fit with Daffodil
> >>
> >> Have been looking over DAP and getting a good feeling about it. The
> >> specification [1] seems general enough that it could be applied to
> Daffodil
> >> and cover a swath of common operations (like start, stop, break,
> continue,
> >> code locations, variables, etc).
> >>
> >> There are many areas though that are unique to Daffodil that have no
> >> representation in the spec.  These things (like InputStream, Infoset,
> PoU,
> >> different variable types, backtracking, etc) will need an extension to
> >> DAP.  This really boils down to defining these things to fit under the
> DAP
> >> BaseProtocol and enabling handling of those objects on both the front
> and
> >> back ends.
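
For reference, fitting under the DAP base protocol mostly means using its framing: each message is a JSON body preceded by a `Content-Length` header and a blank line. A minimal sketch of that framing, where the event name `daffodil.infoset` and its body are hypothetical and only stand in for whatever extension gets defined:

```python
import json
from typing import Any, Dict, Tuple


def encode_message(message: Dict[str, Any]) -> bytes:
    """Frame one message: Content-Length header, blank line, JSON body."""
    body = json.dumps(message).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(data: bytes) -> Tuple[Dict[str, Any], bytes]:
    """Decode the first framed message; return it and the unread bytes."""
    header, _, rest = data.partition(b"\r\n\r\n")
    length = None
    for line in header.split(b"\r\n"):
        key, _, value = line.partition(b":")
        if key.strip().lower() == b"content-length":
            length = int(value.decode("ascii").strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    return json.loads(rest[:length].decode("utf-8")), rest[length:]


# Hypothetical custom event carrying Daffodil-specific state.
custom_event = {
    "seq": 1,
    "type": "event",
    "event": "daffodil.infoset",
    "body": {"bitPosition": 64},
}
wire = encode_message(custom_event)
```

Because the framing is this small, a Daffodil-specific server does not need the JDI-oriented implementations just to speak the protocol.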
> >>
> >> On the backend we need a Daffodil DAP protocol server.  Existing JVM
> >> implementations (like Java [2], Scala [3]) are tied closely to JDI and
> >> would bring a lot of extra baggage to work around that.  Developing a
> >> Daffodil specific implementation is no small task, but feasible.  There
> are
> >> several existing implementations on the JVM that are close and can be
> >> looked at for reference.
> >>
> >> The backend implementation would look similar to what was described in
> an
> >> earlier post.  We could use ZIO/Akka/etc to implement the backend
> Protocol
> >> Server to enable the IO between the Daffodil process and the DAP
> clients.
> >> This implementation would now be guided by the DAP specification.
> >>
> >> With the protocol and backend extended to fit Daffodil that leaves the
> >> frontend.  In theory an existing IDE plugin should get pretty close to
> >> being able to perform the common debug operations mentioned above.  To
> >> support the Daffodil extensions there will need to be handling of the
> >> extended protocol into whatever views are desired/applicable.
> >>
> >>> Also looking into the Java Debug Interface (JDI) for comparison.
> >>
> >> JDI appears to be the wrong level of abstraction for what we are talking
> >> about in debugging Daffodil for schema development.  While DAP does do
> JVM
> >> debugging (through a JDI DAP impl) it also generalizes to many other
> >> debugging scenarios.  JDI on the other hand is very tied to the JVM.
> >>
> >> Extending the JDI appears to be more complex than dealing with DAP, and
> >> even though the JDI API is mostly defined with interfaces, there are
> choke
> >> points that limit to JVM concepts.  For example jdi.Value has a finite
> set
> >> of JVM types that it works with; it's not clear where Daffodil types
> >> would plug in, if that is even possible.
> >>
> >> The final note is that unique Daffodil features wouldn’t get to IDE
> support
> >> any faster with JDI.  In some cases, like VS Code, you would still need an
> >> extended DAP to support these features.
> >>
> >>> and depending on how it shakes out will update the example to show
> >> integration
> >>
> >> It would appear wise to investigate DAP further.  Next step is to refine
> >> these thoughts with a prototype. I started an implementation in the
> example
> >> debugger project [4] to try to run the current example on a _minimal_
> DAP
> >> implementation.
> >>
> >>
> >> [1] https://microsoft.github.io/debug-adapter-protocol/specification
> >> [2] https://github.com/Microsoft/java-debug
> >> [3] https://github.com/scalacenter/scala-debug-adapter
> >> [4] https://github.com/jw3/example-daffodil-debug
> >>
> >>
> >> On Mon, Apr 12, 2021 at 9:58 AM John Wass <jwa...@gmail.com> wrote:
> >>
> >>>> the code is here https://github.com/jw3/example-daffodil-debug
> >>>
> >>> There is now a complete console based example for Zio that demonstrates
> >>> controlling the debug flow while distributing the current state to
> three
> >>> "displays".
> >>> 1. infoset at current step
> >>> 2. diff of infoset against previous step
> >>> 3. bit position and value of data.
> >>>
> >>> These displays are very rudimentary but demonstrate the ability to
> >>> asynchronously populate multiple views while synchronously controlling
> >> the
> >>> debug loop.
> >>>
> >>>> - The new protocol being informed by existing debugger and DAP is key
> >>>
> >>> Going to look deeper into how DAP might fit with Daffodil, and
> depending
> >>> on how it shakes out will update the example to show integration.
> >>>
> >>> Some interesting links to start with
> >>> - https://github.com/scalacenter/scala-debug-adapter
> >>> -
> >>>
> >>
> https://scalameta.org/metals/docs/integrations/debug-adapter-protocol.html
> >>> - https://github.com/microsoft/java-debug
> >>>
> >>> Also looking into the Java Debug Interface (JDI) for comparison.
> >>>
> >>>
> >>> On Thu, Apr 8, 2021 at 12:36 PM John Wass <jwa...@gmail.com> wrote:
> >>>
> >>>> Revisiting this post after doing some debugger related work and
> thinking
> >>>> about debug protocol/adapters to connect external tooling to the debug
> >>>> process.
> >>>>
> >>>> This comment is good
> >>>>
> >>>>> This also makes me wonder if an approach worth taking for the future
> >> of
> >>>> Daffodil schema debugging is developing a sort of "Daffodil Debug
> >>>> Protocol". I imagine it would be loosely based on DAP (which is
> >>>> essentially JSON message based) but could be targeted to the things
> >> that a
> >>>> DFDL schema debugger would really need. An added benefit with some
> >> sort of
> >>>> protocol is the debugger interface can be uncoupled from Daffodil
> >>>> itself, so we could implement a TUI/GUI/whatever in any language/GUI
> >>>> framework and just have it communicate the protocol over some form of
> >>>> IPC. Another benefit is that any future backends could implement this
> >>>> protocol and so a single debugger could hook into different backends
> >>>> without much issue. Unfortunately, defining such a protocol might be a
> >>>> large task, but we do have our existing debug infrastructure and
> things
> >>>> like DAP to guide its development/design.
> >>>>
> >>>> Some thoughts on this
> >>>> - Defining the protocol will be a large task, but a minimal version
> >>>> should get up and round tripping quickly with a minimal subset of the
> >>>> protocol.
> >>>> - The new protocol being informed by existing debugger and DAP is key
> >>>> - Uncoupling from Daffodil is key
> >>>> - Adapt the Daffodil protocol to produce DAP after the fact so as not
> to
> >>>> constrain Daffodil debugging capability
> >>>> - We don't need to tie the protocol or adapters to a single framework,
> >>>> implementations of the IO layer should be simple enough to support
> >> multiple
> >>>> things (eg Akka, Zio, "basic" ...)
> >>>> - The current debugger lives in runtime1, but can we make an abstract
> >> API
> >>>> that any runtime would implement?
> >>>>
> >>>> Maybe a solution is structured like this
> >>>> - daffodil-debug-api:
> >>>>   - protocol model
> >>>>   - interfaces: debugger / IO adapter / etc
> >>>>   - lives in daffodil repo (new subproject?)
> >>>> - daffodil-debug-io-NAME
> >>>>   - provides implementation of a specific IO adapter
> >>>>   - multiple projects possible (daffodil-debugger-akka,
> >>>> daffodil-debugger-zio, etc)
> >>>>   - supported ones live in their own subprojects, but others can be
> >>>> plugged in from external sources
> >>>>   - ability to support multiple implementations reduces risk of
> lock-in
> >>>> - debugger applications
> >>>>   - maintained in external repositories
> >>>>   - depending on the IO implementation these could execute in a
> >>>> separate process or on a separate machine
> >>>>   - like Steve said, could be any language / framework
> >>>>
> >>>> Three types of reference implementations / sample applications could
> >> also
> >>>> guide the development of the API
> >>>>   1. a replacement for the existing TUI debugger, expected to end up
> >> with
> >>>> at minimum the same functionality as the current one.
> >>>>   2. a standalone GUI (JavaFX, Scala.js, ..) debugger
> >>>>   3. an IDE integration
> >>>>
> >>>> Thoughts?
> >>>>
> >>>> Also I'm working on some reference implementations of these concepts
> >>>> using Akka and Zio.  Not quite ready to talk through it yet, but the
> >> code
> >>>> is here https://github.com/jw3/example-daffodil-debug
> >>>>
> >>>>
> >>>>
> >>>> On Wed, Jan 6, 2021 at 1:42 PM Steve Lawrence <slawre...@apache.org>
> >>>> wrote:
> >>>>
> >>>>> Yep, something like that seems very reasonable for dealing with large
> >>>>> infosets. But it still feels like we run into usability issues.
> >>>>> For example, what if a user wants to see more? We need some
> >>>>> configuration options to increase what we've elided. It's not big,
> but
> >>>>> every new thing that needs configuration adds complexity and
> decreases
> >>>>> usability.
> >>>>>
> >>>>> And I think the only reason we are trying to spend effort eliding
> >>>>> things is because we're limited to this gdb-like interface where you
> >> can
> >>>>> only print out a little information at a time.
> >>>>>
> >>>>> I think what would really help is to dump this gdb interface and instead
> use
> >>>>> multiple windows/views. As a really close example to what I imagine,
> I
> >>>>> recently came across this hex editor:
> >>>>>
> >>>>> https://www.synalysis.net/
> >>>>>
> >>>>> The screenshots are a bit small so it's not super clear, but this
> tool
> >>>>> has one view for the data in hex, and one view for a tree of parsed
> >>>>> results (which is very similar to our infoset). The "infoset" view
> has
> >>>>> information like offset/length/value, and can be related back to the
> >>>>> data view to find the actual bits.
> >>>>>
> >>>>> I imagine the "next generation daffodil debugger" to look much like
> >>>>> this. As data is parsed, the infoset view fills up. This view could
> act
> >>>>> like a standard GUI tree so you could collapse sections or scroll
> >> around
> >>>>> to show just the parts you care about, and have search capabilities
> to
> >>>>> quickly jump around. The advantage here is you no longer really need
> >>>>> automated eliding or heuristics for what the user *might* care about.
> >>>>> You just show the whole thing and let user scroll around. As daffodil
> >>>>> parses and backtracks, this tree grows or shrinks.
> >>>>>
> >>>>> I also imagine you could have a cursor moving around the hex view, so
> >> as
> >>>>> daffodil moves around (e.g. scanning for delimiters, extracting
> >>>>> integers), one could update this data view to show what daffodil is
> >>>>> doing and where it is.
> >>>>>
> >>>>> I also imagine there could be other views as well. For example, a
> schema
> >>>>> view to show where in the schema daffodil is, and to add/remove
> >>>>> breakpoints. And an information view for things like variables,
> >> in-scope
> >>>>> delimiters, PoU's, etc.
> >>>>>
> >>>>> The only reason I mention a debug protocol is that it would allow this
> >>>>> GUI to be more easily written in something other than Java/Scala to
> >>>>> take advantage of other GUI toolkits. It's been a long while since
> >>>>> I've done anything with Java GUIs, but they seemed pretty poor the
> >>>>> last time I looked at them. It would even allow for a TUI, which Java
> >>>>> has little/no support for. It also enables things like remote
> >>>>> debugging if a socket IPC was
> >>>>> used. Though I'm not sure all of that is necessary. Just thinking
> what
> >>>>> would be ideal, and it can always be pared back.
> >>>>>
> >>>>>
> >>>>> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
> >>>>>> I don't think of it as a daffodil debug protocol, but just a
> >>>>> separation of concerns between display of information and the
> >> behaviors of
> >>>>> parse/unparse that need to be points where users can pause, and data
> >>>>> structures available to display.
> >>>>>>
> >>>>>> E.g., it is 100% a display issue that the infoset (shown as XML) is
> >>>>> clumsy, too big, etc.  The infoset is available in the processor
> >> state, and
> >>>>> one can examine the current node, enclosing node, prior sibling(s),
> >>>>> following sibling(s), etc. One can elide contents that are too big
> for
> >>>>> hexBinary, etc.
> >>>>>>
> >>>>>> I think this problem, how to display the infoset with sensible
> limits
> >>>>> on sizing, is fairly easy to come up with some design for, that will
> at
> >>>>> least be (1) always fairly small (2) much more useful in more cases.
> It
> >>>>> won't be perfect but can be much better than what we do now.
> >>>>>>
> >>>>>> One sensible display "mode" should be that displaying the context
> >>>>> surrounding the current element (when parsing or unparsing) displays
> at
> >>>>> most N-lines. (N/2 before, N/2 after) with a maximum length of L
> >> characters
> >>>>> (settable within reason ?)
> >>>>>>
> >>>>>> Sibling and enclosing nodes would be displayed eliding their
> contents
> >>>>> to at most 1 line.
> >>>>>>
> >>>>>> Here's an example of what I mean. Displaying up to N=10 lines total:
> >>>>>>
> >>>>>> ...
> >>>>>> <enclosingParent1>
> >>>>>>    ...
> >>>>>>    <priorSibling2>89ab782 ...</...>
> >>>>>>    <priorSibling1>some text is here and some more text</...>
> >>>>>>    <currentNode>value might be some big thing which needs to be
> >> elided
> >>>>> ...</...>
> >>>>>>    <followingSibling1> ... </...>
> >>>>>>    ???
> >>>>>> </enclosingParent1>
> >>>>>> ???
> >>>>>>
> >>>>>> The </...> is just an idea to reduce XML matching end-tag clutter.
> >>>>>>
> >>>>>> The ... on a line alone or where element content would appear
> >>>>> generally means 1 or more other siblings. The way the display above
> >> starts
> >>>>> with ... means that this is a relative inner nest, not starting from
> >> the
> >>>>> absolute root.
> >>>>>>
> >>>>>> The ... within simple content means that content is elided to fit on
> >>>>> one line. Always follows some text characters to differentiate from
> the
> >>>>> child-element context.
> >>>>>>
> >>>>>> The ??? means zero or more other siblings.
> >>>>>>
> >>>>>> I used bold italic above to point out that the current node would be
> >>>>> highlighted somehow. Probably a way to do this that doesn't require
> >> display
> >>>>> modes would be useful. E.g., a text marker like ">>>" as in:
> >>>>>>
> >>>>>>>>> <currentNode>value .... </...>
> >>>>>>
> >>>>>> might be better, particularly for a trace output being dumped to a
> >>>>> text file.
> >>>>>>
> >>>>>> I made the above example an unparser kind of example by showing a
> >>>>> following sibling that exists that is after the current node.
> >>>>>>
> >>>>>> I think the key concept is that any sibling node is displayed in a
> >> way
> >>>>> that fits on one line.
> >>>>>> E.g., even if the element name was really long, I'd suggest:
> >>>>>>
> >>>>>>   <hereIsAnElementWithASuperLongName...>abcd ... </...>
> >>>>>>
> >>>>>> Where the element name itself gets elided because it is too long.
> >>>>>>
> >>>>>> A thought. Note that the above presentation is shown as quasi-XML,
> >> but
> >>>>> there's nothing XML-specific about it. A JSON-friendly equivalent
> >> could be
> >>>>> done as well:
> >>>>>>
> >>>>>> enclosingParent1 = {
> >>>>>>    ...
> >>>>>>    priorSibling2 = "89ab782..."
> >>>>>>    priorSibling1 = "some text is here and some more text"
> >>>>>>    currentNode = "value might be some big thing which needs to be
> >>>>> elided ..."
> >>>>>>    followingSibling1 = { ... }
> >>>>>>    ???
> >>>>>> }
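
The one-line-per-sibling rule above can be sketched as follows. This is a minimal illustration under assumptions: `elide` and `render_context` are hypothetical names, siblings are given as (name, value) pairs, and enclosing parents are left out. It shows at most N/2 siblings on each side of the current node and cuts each line at a fixed width.

```python
from typing import List, Tuple


def elide(text: str, limit: int) -> str:
    """Shorten text to at most `limit` characters (limit >= 4 assumed),
    marking the cut with ' ...'."""
    if len(text) <= limit:
        return text
    return text[: max(limit - 4, 0)] + " ..."


def render_context(
    siblings: List[Tuple[str, str]],
    current: int,
    n_lines: int = 10,
    width: int = 40,
) -> List[str]:
    """Render the current node with at most n_lines // 2 siblings per side."""
    half = n_lines // 2
    lo = max(current - half, 0)
    hi = min(current + half + 1, len(siblings))
    lines = []
    if lo > 0:
        lines.append("    ...")  # earlier siblings not shown
    for i in range(lo, hi):
        name, value = siblings[i]
        marker = ">>> " if i == current else "    "
        lines.append(marker + elide(f"<{name}>{value}", width) + "</...>")
    if hi < len(siblings):
        lines.append("    ...")  # later siblings not shown
    return lines
```

Since the start tag and the value are elided together, a very long element name is cut by the same rule as a very long value.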
> >>>>>>
> >>>>>> That's enough for 1 email thread on this debug topic.
> >>>>>>
> >>>>>>
> >>>>>> ________________________________
> >>>>>> From: Steve Lawrence <slawre...@apache.org>
> >>>>>> Sent: Tuesday, January 5, 2021 2:26 PM
> >>>>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
> >>>>>> Subject: The future of the daffodil DFDL schema debugger?
> >>>>>>
> >>>>>>
> >>>>>> Now that we're in a new year, I'd like to start a discussion about
> >> the
> >>>>>> Daffodil DFDL Schema debugger and how it might be improved to be
> more
> >>>>>> useful.
> >>>>>>
> >>>>>> Note that this is not about the capability to debug Daffodil itself in
> >>>>>> something like Eclipse/IntelliJ, but the ability for Daffodil to
> >>>>> provide
> >>>>>> enough extra information during a parse/unparse so that a schema
> >>>>>> developer can get an idea of what Daffodil is doing. This makes it
> >>>>>> easier for users (rather than developers) to determine why a schema
> >>>>>> isn't giving the expected parse/unparse result (either because of bad
> >>>>>> data or a faulty schema).
> >>>>>>
> >>>>>> The current state of the debugger is enabled by providing the
> --debug
> >>>>> or
> >>>>>> --trace flags in the CLI. More information about that here:
> >>>>>>
> >>>>>> https://daffodil.apache.org/debugger/
> >>>>>>
> >>>>>> This enables a TUI and commands somewhat similar to GDB, providing
> >>>>>> things like breakpoints, steps, displaying the current infoset,
> >>>>>> displaying a dump of the data, etc.
> >>>>>>
> >>>>>> Although I find this tool pretty useful, it definitely has some
> >> glaring
> >>>>>> issues.
> >>>>>>
> >>>>>> The most glaring to me is that it really isn't useful at all for
> >>>>>> debugging unparse. The data dumps only include the main output
> >>>>>> stream, so determining things like suspensions and buffered output
> >>>>>> is impossible.
> >>>>>>
> >>>>>> Another issue is the infoset output. When outputting the infoset,
> the
> >>>>>> debugger currently just walks the entire thing and converts it to
> XML
> >>>>>> and displays the XML. For large infosets, this is excessive and can
> >>>>>> make it impossible to use, even with the configuration options that
> >>>>>> limit how much of that infoset is actually printed to the screen.
> >>>>>> Also, things like large hex binary blobs create excessive and
> >>>>>> unusable output.
> >>>>>>
> >>>>>> Another thing I feel is missing is a schema view. Right now it's
> very
> >>>>>> difficult to know where in the schema Daffodil actually is.
> >>>>>>
> >>>>>> I think these issues just need some thought and improvement. One could
> >>>>>> imagine a better way to stringify our unparse buffers for debug. One
> >>>>>> could imagine a way to receive infoset state changes so the debugger
> >>>>>> can track things like backtracks and removed infoset items. One could
> >>>>>> imagine a way to display the schema.
> >>>>>>
> >>>>>> We just need a better way to stringify the current state of the
> >> unparse
> >>>>>> data including buffers, and we need a way for the debugger to
> >>>>>> receive state change information about the infoset so it can update
> >>>>>> displays rather than just constantly printing the entire infoset.
> >>>>>>
> >>>>>> However, I think another big issue is just usability in general. I
> >>>>>> think the CLI usage is reasonable, but it's not always user friendly,
> >>>>>> and it is difficult to view multiple things at the same time. I think
> >>>>>> because of this very few people even use this tool. So this is
> >>>>>> perhaps something worth focusing on.
> >>>>>>
> >>>>>> My first thought to improving this usability issue would be to
> >>>>> implement
> >>>>>> the Debug Adapter Protocol (DAP)
> >>>>>> (https://microsoft.github.io/debug-adapter-protocol/) for Daffodil,
> >>>>>> which many IDE's implement. With this implemented, Daffodil could be
> >>>>>> plugged in to any IDE that supports it and essentially get debugging
> >>>>> for
> >>>>>> free, without the need to worry about the GUI elements.
> >>>>>>
> >>>>>> I do have concerns that this just wouldn't have enough functionality
> >>>>>> that we'd really need. For example, DAP really only has the ability to show
> >>>>>> code (Daffodil's equivalent is the DFDL schema). There isn't a way
> to
> >>>>>> show a live view of the infoset or data. Most DAP IDE's do have a
> >>>>>> console output, so we could potentially make it so the console
> output
> >>>>> is
> >>>>>> a live view of infoset/data. But I'm not even sure most DAP friendly
> >>>>>> IDE's could support this kind of console output. Does anyone have
> >>>>>> familiarity with DAP IDE's and what kinds of console capabilities
> >>>>>> are available?
> >>>>>>
> >>>>>> I also looked into TUI libraries with the idea that we could just
> >>>>> extend
> >>>>>> our current debugger user interface to be a bit friendlier.
> >>>>>> Unfortunately, there aren't too many Java/Scala TUI libraries and
> >> those
> >>>>>> that do exist don't have Apache friendly licenses. We also want to
> be
> >>>>>> careful about increasing dependencies just for a debugger that many
> >>>>> people
> >>>>>> might not use, so large graphics libraries are probably out of the
> >>>>> question.
> >>>>>>
> >>>>>> This also makes me wonder if an approach worth taking for the future
> >> of
> >>>>>> Daffodil schema debugging is developing a sort of "Daffodil Debug
> >>>>>> Protocol". I imagine it would be loosely based on DAP (which is
> >>>>>> essentially JSON message based) but could be targeted to the things
> >>>>> that
> >>>>>> a DFDL schema debugger would really need. An added benefit with some
> >>>>>> sort of protocol is the debugger interface can be uncoupled from
> >>>>>> Daffodil itself, so we could implement a TUI/GUI/whatever in any
> >>>>>> language/GUI framework and just have it communicate the protocol
> over
> >>>>>> some form of IPC. Another benefit is that any future backends could
> >>>>>> implement this protocol and so a single debugger could hook into
> >>>>>> different backends without much issue. Unfortunately, defining such
> a
> >>>>>> protocol might be a large task, but we do have our existing debug
> >>>>>> infrastructure and things like DAP to guide its development/design.
> >>>>>>
> >>>>>> Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps
> >> we
> >>>>>> really just need the few improvements mentioned to the existing
> >>>>>> debugger. Is that enough to make it usable? Or is an entirely
> >> different
> >>>>>> approach needed to debugging schemas?
> >>>>>>
> >>>>>
> >>>>>
> >>
> >
>
>
