Re: The future of the daffodil DFDL schema debugger?

Adam Rosien Mon, 19 Apr 2021 08:22:08 -0700

Hi everybody, I've recently started working on Daffodil with some other
folks and will be helping where I can with the debugger.


I've been writing Scala since ~2011 and recently wrote a book about Cats
Effect, which has a similar scope to ZIO (effects, concurrency, etc.). If
anybody has any questions about the approach and techniques, I'm happy to
help.

.. Adam

On Fri, Apr 16, 2021 at 2:49 PM Beckerle, Mike <
[email protected]> wrote:

> This is actually very cool using ZIO for this. I have to learn more about
> ZIO.
>
>
> ________________________________
> From: John Wass <[email protected]>
> Sent: Monday, April 12, 2021 9:58 AM
> To: [email protected] <[email protected]>
> Subject: Re: The future of the daffodil DFDL schema debugger?
>
> > the code is here https://github.com/jw3/example-daffodil-debug
>
> There is now a complete console based example for Zio that demonstrates
> controlling the debug flow while distributing the current state to three
> "displays".
> 1. infoset at current step
> 2. diff of infoset against previous step
> 3. bit position and value of data.
>
> These displays are very rudimentary but demonstrate the ability to
> asynchronously populate multiple views while synchronously controlling the
> debug loop.
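
A minimal sketch of that pattern, not taken from the linked repository: it uses plain Python threads and queues rather than ZIO, and the names DebugState, run_display and debug_loop are hypothetical. The loop waits for a (stubbed) user command before each step, then hands the state to every display without waiting for any of them to render.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class DebugState:
    """Hypothetical snapshot published after each debug step."""
    step: int
    infoset: str       # assumed: serialized infoset at this step
    bit_position: int  # assumed: current bit offset in the data


def run_display(name, inbox, sink):
    """Consume states from the inbox until the None sentinel arrives."""
    while True:
        state = inbox.get()
        if state is None:
            break
        sink.append(f"{name}: step {state.step} at bit {state.bit_position}")


def debug_loop(states, inboxes, wait_for_user):
    """Advance one step per user command and fan each state out."""
    for state in states:
        wait_for_user()       # synchronous control point
        for inbox in inboxes:
            inbox.put(state)  # hand-off only; displays render on their own
    for inbox in inboxes:
        inbox.put(None)       # tell every display to stop


def demo():
    states = [DebugState(i, f"<a>{i}</a>", i * 8) for i in range(3)]
    names = ["infoset", "diff", "data"]
    inboxes = [queue.Queue() for _ in names]
    sinks = [[] for _ in names]
    threads = [
        threading.Thread(target=run_display, args=(n, q, s))
        for n, q, s in zip(names, inboxes, sinks)
    ]
    for t in threads:
        t.start()
    debug_loop(states, inboxes, wait_for_user=lambda: None)
    for t in threads:
        t.join()
    return sinks
```

In a real debugger wait_for_user would block on console input or a protocol request; here it is a no-op so the sketch runs unattended.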
>
> > - The new protocol being informed by the existing debugger and DAP is key
>
> Going to look deeper into how DAP might fit with Daffodil, and depending on
> how it shakes out will update the example to show integration.
>
> Some interesting links to start with
> - https://github.com/scalacenter/scala-debug-adapter
> -
> https://scalameta.org/metals/docs/integrations/debug-adapter-protocol.html
> - https://github.com/microsoft/java-debug
>
> Also looking into the Java Debug Interface (JDI) for comparison.
>
>
> On Thu, Apr 8, 2021 at 12:36 PM John Wass <[email protected]> wrote:
>
> > Revisiting this post after doing some debugger related work and thinking
> > about debug protocol/adapters to connect external tooling to the debug
> > process.
> >
> > This comment is good
> >
> > > This also makes me wonder if an approach worth taking for the future of
> > > Daffodil schema debugging is developing a sort of "Daffodil Debug
> > > Protocol". I imagine it would be loosely based on DAP (which is
> > > essentially JSON message based) but could be targeted to the things
> > > that a DFDL schema debugger would really need. An added benefit with
> > > some sort of protocol is the debugger interface can be uncoupled from
> > > Daffodil itself, so we could implement a TUI/GUI/whatever in any
> > > language/GUI framework and just have it communicate the protocol over
> > > some form of IPC. Another benefit is that any future backends could
> > > implement this protocol and so a single debugger could hook into
> > > different backends without much issue. Unfortunately, defining such a
> > > protocol might be a large task, but we do have our existing debug
> > > infrastructure and things like DAP to guide its development/design.
> >
> > Some thoughts on this
> > - Defining the protocol will be a large task, but a minimal version
> > covering a small subset of the protocol should be up and round-tripping
> > quickly.
> > - The new protocol being informed by the existing debugger and DAP is key
> > - Uncoupling from Daffodil is key
> > - Adapt the Daffodil protocol to produce DAP after the fact so as not to
> > constrain Daffodil debugging capability
> > - We don't need to tie the protocol or adapters to a single framework;
> > implementations of the IO layer should be simple enough to support
> > multiple things (e.g. Akka, Zio, "basic" ...)
> > - The current debugger lives in runtime1, but can we make an abstract API
> > that any runtime would implement?
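
To illustrate how small a round-tripping subset could be, here is a sketch of one message crossing a byte stream. It assumes the framing DAP's base protocol uses (a Content-Length header, a blank line, then a UTF-8 JSON body); the "step" command and the function names are hypothetical and not part of any existing Daffodil API.

```python
import json


def encode_message(message):
    """Frame a JSON-serializable dict as 'Content-Length: N\\r\\n\\r\\n<body>'."""
    body = json.dumps(message, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(data):
    """Decode one framed message; return (message, remaining_bytes)."""
    header, sep, rest = data.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete header")
    name, _, value = header.decode("ascii").partition(":")
    if name.strip().lower() != "content-length":
        raise ValueError("missing Content-Length header")
    length = int(value)
    if len(rest) < length:
        raise ValueError("incomplete body")
    return json.loads(rest[:length].decode("utf-8")), rest[length:]


# Hypothetical request: "step" is an illustrative command name only.
request = {"seq": 1, "type": "request", "command": "step"}
decoded, leftover = decode_message(encode_message(request))
```

Because the framing is independent of the transport, the same two functions would work over a socket, a pipe, or an in-memory buffer, which is the kind of IO-layer independence the list above asks for.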
> >
> > Maybe a solution is structured like this
> > - daffodil-debug-api:
> >   - protocol model
> >   - interfaces: debugger / IO adapter / etc
> >   - lives in daffodil repo (new subproject?)
> > - daffodil-debug-io-NAME
> >   - provides implementation of a specific IO adapter
> >   - multiple projects possible (daffodil-debugger-akka,
> > daffodil-debugger-zio, etc)
> >   - supported ones live in their own subprojects, but others can be
> > plugged in from external sources
> >   - ability to support multiple implementations reduces risk of lock-in
> > - debugger applications
> >   - maintained in external repositories
> >   - depending on the IO implementation these could execute in a separate
> > process or on a separate machine
> >   - like Steve said, could be any language / framework
> >
> > Three types of reference implementations / sample applications could also
> > guide the development of the API
> >   1. a replacement for the existing TUI debugger, expected to end up with
> > at minimum the same functionality as the current one.
> >   2. a standalone GUI (JavaFX, Scala.js, ..) debugger
> >   3. an IDE integration
> >
> > Thoughts?
> >
> > Also I'm working on some reference implementations of these concepts
> > using Akka and Zio.  Not quite ready to talk through it yet, but the
> > code is here https://github.com/jw3/example-daffodil-debug
> >
> >
> >
> > On Wed, Jan 6, 2021 at 1:42 PM Steve Lawrence <[email protected]>
> > wrote:
> >
> >> Yep, something like that seems very reasonable for dealing with large
> >> infosets. But it feels like we still run into usability issues.
> >> For example, what if a user wants to see more? We need some
> >> configuration options to increase what we've elided. It's not big, but
> >> every new thing that needs configuration adds complexity and decreases
> >> usability.
> >>
> >> And I think the only reason we are trying to spend effort eliding
> >> things is because we're limited to this gdb-like interface where you can
> >> only print out a little information at a time.
> >>
> >> I think what would really help is to dump this gdb interface and instead use
> >> multiple windows/views. As a really close example to what I imagine, I
> >> recently came across this hex editor:
> >>
> >> https://www.synalysis.net/
> >>
> >> The screenshots are a bit small so it's not super clear, but this tool
> >> has one view for the data in hex, and one view for a tree of parsed
> >> results (which is very similar to our infoset). The "infoset" view has
> >> information like offset/length/value, and can be related back to the
> >> data view to find the actual bits.
> >>
> >> I imagine the "next generation daffodil debugger" to look much like
> >> this. As data is parsed, the infoset view fills up. This view could act
> >> like a standard GUI tree so you could collapse sections or scroll around
> >> to show just the parts you care about, and have search capabilities to
> >> quickly jump around. The advantage here is you no longer really need
> >> automated eliding or heuristics for what the user *might* care about.
> >> You just show the whole thing and let the user scroll around. As daffodil
> >> parses and backtracks, this tree grows or shrinks.
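
A rough sketch of such a growing and shrinking tree, assuming the debugger receives start, end and backtrack events from the parser. Those event names and the InfosetView class are hypothetical; the current Daffodil debugger does not expose this interface.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    value: str = ""
    children: list = field(default_factory=list)


class InfosetView:
    """Tree model that a GUI tree widget could observe (hypothetical)."""

    def __init__(self, root_name):
        self.root = Node(root_name)
        self._open = [self.root]  # path from the root to the open element

    def start_element(self, name):
        node = Node(name)
        self._open[-1].children.append(node)
        self._open.append(node)

    def end_element(self, value=""):
        self._open.pop().value = value

    def backtrack(self):
        """Discard the element being parsed, together with its subtree."""
        if len(self._open) == 1:
            raise ValueError("cannot backtrack past the root")
        self._open.pop()
        # The open element is always the last child of its parent.
        self._open[-1].children.pop()

    def render(self, node=None, depth=0):
        node = node or self.root
        label = node.name if not node.value else f"{node.name} = {node.value}"
        lines = ["  " * depth + label]
        for child in node.children:
            lines.extend(self.render(child, depth + 1))
        return lines


view = InfosetView("root")
view.start_element("a")
view.end_element("1")
view.start_element("b")   # speculative branch
view.start_element("c")
view.end_element("x")
view.backtrack()          # parser abandons <b>, so <b> and <c> disappear
view.start_element("d")
view.end_element("2")
```

After these events view.render() lists only root, a and d, so a display bound to the model never has to reprint the whole infoset.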
> >>
> >> I also imagine you could have a cursor moving around the hex view, so as
> >> daffodil moves around (e.g. scanning for delimiters, extracting
> >> integers), one could update this data view to show what daffodil is
> >> doing and where it is.
> >>
> >> I also imagine there could be other views as well. For example, a schema
> >> view to show where in the schema daffodil is, and to add/remove
> >> breakpoints. And an information view for things like variables, in-scope
> >> delimiters, PoU's, etc.
> >>
> >> The only reason I mention a debug protocol is that it would allow this GUI
> >> to be more easily written in something other than Java/Scala to take
> >> advantage of other GUI toolkits. It's been a long while since I've done
> >> anything with Java GUIs, but they seemed pretty poor the last I looked
> >> at them. It would even allow for a TUI, which Java has little/no support
> >> for. It also enables things like remote debugging if a socket IPC was
> >> used. Though I'm not sure all of that is necessary. Just thinking what
> >> would be ideal, and it can always be pared back.
> >>
> >>
> >> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
> >> > I don't think of it as a daffodil debug protocol, but just a
> >> > separation of concerns between display of information and the
> >> > behaviors of parse/unparse that need to be points where users can
> >> > pause, and data structures available to display.
> >> >
> >> > E.g., it is 100% a display issue that the infoset (shown as XML) is
> >> > clumsy, too big, etc.  The infoset is available in the processor
> >> > state, and one can examine the current node, enclosing node, prior
> >> > sibling(s), following sibling(s), etc. One can elide contents that are
> >> > too big for hexBinary, etc.
> >> >
> >> > I think it is fairly easy to come up with some design for this
> >> > problem, how to display the infoset with sensible limits on sizing,
> >> > that will at least be (1) always fairly small and (2) much more useful
> >> > in more cases. It won't be perfect but can be much better than what we
> >> > do now.
> >> >
> >> > One sensible display "mode" should be that displaying the context
> >> > surrounding the current element (when parsing or unparsing) displays
> >> > at most N lines (N/2 before, N/2 after) with a maximum length of L
> >> > characters (settable within reason?)
> >> >
> >> > Sibling and enclosing nodes would be displayed eliding their contents
> >> > to at most 1 line.
> >> >
> >> > Here's an example of what I mean. Displaying up to N=10 lines total:
> >> >
> >> > ...
> >> > <enclosingParent1>
> >> >    ...
> >> >    <priorSibling2>89ab782 ...</...>
> >> >    <priorSibling1>some text is here and some more text</...>
> >> >    <currentNode>value might be some big thing which needs to be elided ...</...>
> >> >    <followingSibling1> ... </...>
> >> >    ???
> >> > </enclosingParent1>
> >> > ???
> >> >
> >> > The </...> is just an idea to reduce XML matching end-tag clutter.
> >> >
> >> > The ... on a line alone or where element content would appear
> >> > generally means 1 or more other siblings. The way the display above
> >> > starts with ... means that this is a relative inner nest, not starting
> >> > from the absolute root.
> >> >
> >> > The ... within simple content means that content is elided to fit on
> >> > one line. It always follows some text characters to differentiate it
> >> > from the child-element context.
> >> >
> >> > The ??? means zero or more other siblings.
> >> >
> >> > I used bold italic above to point out that the current node would be
> >> > highlighted somehow. Probably a way to do this that doesn't require
> >> > display modes would be useful. E.g., a text marker like ">>>" as in:
> >> >
> >> > >>> <currentNode>value ... </...>
> >> >
> >> > might be better, particularly for a trace output being dumped to a
> >> > text file.
> >> >
> >> > I made the above example an unparser kind of example by showing a
> >> > following sibling that exists after the current node.
> >> >
> >> > I think the key concept is that any sibling node is displayed in a way
> >> > that fits on one line.
> >> > E.g., even if the element name was really long, I'd suggest:
> >> >
> >> >   <hereIsAnElementWithASuperLongName...>abcd ... </...>
> >> >
> >> > Where the element name itself gets elided because it is too long.
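
A small sketch of this one-line elision rule. The function names and the default limits are hypothetical, chosen only to illustrate the quasi-XML format above; limits are assumed to be at least 3 so the "..." marker fits.

```python
def elide(text, limit):
    """Shorten text to at most `limit` characters, ending in '...' if cut.

    Assumes limit >= 3 so the marker itself fits.
    """
    if len(text) <= limit:
        return text
    return text[:limit - 3] + "..."


def render_sibling(name, value, name_limit=20, value_limit=24):
    """Render a sibling element on one line with a generic end tag."""
    return f"<{elide(name, name_limit)}>{elide(value, value_limit)}</...>"


short = render_sibling("priorSibling1", "abc")
long_name = render_sibling("hereIsAnElementWithASuperLongName", "abcd",
                           name_limit=12)
```

Bounding the name and the value separately keeps every sibling on one line no matter which of the two is oversized.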
> >> >
> >> > A thought. Note that the above presentation is shown as quasi-XML, but
> >> > there's nothing XML-specific about it. A JSON-friendly equivalent
> >> > could be done as well:
> >> >
> >> > enclosingParent1 = {
> >> >    ...
> >> >    priorSibling2 = "89ab782..."
> >> >    priorSibling1 = "some text is here and some more text"
> >> >    currentNode = "value might be some big thing which needs to be elided ..."
> >> >    followingSibling1 = { ... }
> >> >    ???
> >> > }
> >> >
> >> > That's enough for 1 email thread on this debug topic.
> >> >
> >> >
> >> > ________________________________
> >> > From: Steve Lawrence <[email protected]>
> >> > Sent: Tuesday, January 5, 2021 2:26 PM
> >> > To: [email protected] <[email protected]>
> >> > Subject: The future of the daffodil DFDL schema debugger?
> >> >
> >> >
> >> > Now that we're in a new year, I'd like to start a discussion about the
> >> > Daffodil DFDL Schema debugger and how it might be improved to be more
> >> > useful.
> >> >
> >> > Note that this is not the capability to debug Daffodil itself in
> >> > something like Eclipse/IntelliJ, but the ability for Daffodil to
> >> > provide enough extra information during a parse/unparse so that a
> >> > schema developer can get an idea of what Daffodil is doing. This makes
> >> > it easier for users (rather than developers) to determine why a schema
> >> > isn't giving the expected parse/unparse result (either because of bad
> >> > data or a faulty schema).
> >> >
> >> > The current debugger is enabled by providing the --debug or --trace
> >> > flags in the CLI. More information about that here:
> >> >
> >> > https://daffodil.apache.org/debugger/
> >> >
> >> > This enables a TUI and commands somewhat similar to GDB, providing
> >> > things like breakpoints, stepping, displaying the current infoset,
> >> > displaying a dump of the data, etc.
> >> >
> >> > Although I find this tool pretty useful, it definitely has some
> >> > glaring issues.
> >> >
> >> > The most glaring to me is that it really isn't useful at all for
> >> > debugging unparse. The data dumps only include the main output stream,
> >> > so determining things like suspensions and buffered output is
> >> > impossible.
> >> >
> >> > Another issue is the infoset output. When outputting the infoset, the
> >> > debugger currently just walks the entire thing, converts it to XML,
> >> > and displays the XML. For large infosets, this is excessive and can
> >> > make it impossible to use, even with some configurations that limit
> >> > how much of that infoset is actually printed to the screen. Also,
> >> > things like large hex binary blobs create excessive and unusable
> >> > output.
> >> >
> >> > Another thing I feel is missing is a schema view. Right now it's very
> >> > difficult to know where in the schema Daffodil actually is.
> >> >
> >> > I think these issues just need some thought and improvement. One could
> >> > imagine a better way to stringify our unparse buffers for debug. One
> >> > could imagine a way to receive infoset state changes so the debugger
> >> > can track things like backtracks and remove infosets. One could
> >> > imagine a way to display the schema.
> >> >
> >> > We just need a better way to stringify the current state of the
> >> > unparse data including buffers, and we need a way for the debugger to
> >> > receive state change information about the infoset so it can update
> >> > displays rather than just constantly printing the entire infoset.
> >> >
> >> > However, I think another big issue is just usability in general. I
> >> > think the CLI usage is reasonable, but it's not always user friendly,
> >> > and it is difficult to view multiple things at the same time. I think
> >> > because of this very few people even use this tool. So this is
> >> > perhaps something worth focusing on.
> >> >
> >> > My first thought for improving this usability issue would be to
> >> > implement the Debug Adapter Protocol (DAP)
> >> > (https://microsoft.github.io/debug-adapter-protocol/) for Daffodil,
> >> > which many IDEs implement. With this implemented, Daffodil could be
> >> > plugged in to any IDE that supports it and essentially get debugging
> >> > for free, without the need to worry about the GUI elements.
> >> >
> >> > I do have concerns that this just wouldn't have enough of the
> >> > functionality that we'd really need. For example, DAP really only has
> >> > the ability to show code (Daffodil's equivalent is the DFDL schema).
> >> > There isn't a way to show a live view of the infoset or data. Most DAP
> >> > IDEs do have a console output, so we could potentially make it so the
> >> > console output is a live view of infoset/data. But I'm not even sure
> >> > most DAP friendly IDEs could support this kind of console output. Does
> >> > anyone have familiarity with DAP IDEs and what kinds of console
> >> > capabilities are available?
> >> >
> >> > I also looked into TUI libraries with the idea that we could just
> >> > extend our current debugger user interface to be a bit friendlier.
> >> > Unfortunately, there aren't too many Java/Scala TUI libraries and
> >> > those that do exist don't have Apache friendly licenses. We also want
> >> > to be careful about increasing dependencies just for a debugger that
> >> > many people might not use, so large graphics libraries are probably
> >> > out of the question.
> >> >
> >> > This also makes me wonder if an approach worth taking for the future
> >> > of Daffodil schema debugging is developing a sort of "Daffodil Debug
> >> > Protocol". I imagine it would be loosely based on DAP (which is
> >> > essentially JSON message based) but could be targeted to the things
> >> > that a DFDL schema debugger would really need. An added benefit with
> >> > some sort of protocol is the debugger interface can be uncoupled from
> >> > Daffodil itself, so we could implement a TUI/GUI/whatever in any
> >> > language/GUI framework and just have it communicate the protocol over
> >> > some form of IPC. Another benefit is that any future backends could
> >> > implement this protocol and so a single debugger could hook into
> >> > different backends without much issue. Unfortunately, defining such a
> >> > protocol might be a large task, but we do have our existing debug
> >> > infrastructure and things like DAP to guide its development/design.
> >> >
> >> > Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps
> >> > we really just need the few improvements mentioned to the existing
> >> > debugger. Is that enough to make it usable? Or is an entirely
> >> > different approach needed to debugging schemas?
> >> >
> >>
> >>
>
