Re: The future of the daffodil DFDL schema debugger?

John Wass Tue, 20 Apr 2021 07:31:05 -0700

> Next step is to refine these thoughts with a prototype.

Another next step is to collect feedback on this research and proposed
approach.  Any discussion is appreciated.




On Tue, Apr 20, 2021 at 10:00 AM John Wass <[email protected]> wrote:

> > Going to look deeper into how DAP might fit with Daffodil
>
> Have been looking over DAP and getting a good feeling about it. The
> specification [1] seems general enough that it could be applied to Daffodil
> and cover a swath of common operations (like start, stop, break, continue,
> code locations, variables, etc).
>
> There are many areas though that are unique to Daffodil that have no
> representation in the spec.  These things (like InputStream, Infoset, PoU,
> different variable types, backtracking, etc) will need an extension to
> DAP.  This really boils down to defining these things to fit under the DAP
> BaseProtocol and enabling handling of those objects on both the front and
> back ends.
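As a minimal sketch of what such an extension could look like on the wire: the DAP base protocol frames each JSON message with a Content-Length header, and an event carries `seq`, `type`, `event` and `body` fields. The event name `daffodil.infoset` and its body fields below are hypothetical, chosen only for illustration; they are not part of DAP or Daffodil.

```python
import json


def encode_dap_message(message: dict) -> bytes:
    """Frame a JSON message with the DAP base protocol header."""
    body = json.dumps(message).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_dap_message(data: bytes) -> dict:
    """Parse one framed message (assumes `data` holds exactly one message
    with a single Content-Length header field)."""
    header, _, body = data.partition(b"\r\n\r\n")
    length = int(header.decode("ascii").split(":", 1)[1].strip())
    return json.loads(body[:length].decode("utf-8"))


# Hypothetical custom event: the name "daffodil.infoset" and the body
# fields are assumptions for illustration only.
infoset_event = {
    "seq": 1,
    "type": "event",
    "event": "daffodil.infoset",
    "body": {"bitPosition": 64, "infoset": "<record><id>7</id></record>"},
}

framed = encode_dap_message(infoset_event)
roundtrip = decode_dap_message(framed)
```

A front end that does not know the custom event could simply ignore it, which is what would let a stock DAP client keep handling the common operations.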
>
> On the backend we need a Daffodil DAP protocol server.  Existing JVM
> implementations (like Java [2], Scala [3]) are tied closely to JDI and
> would bring a lot of extra baggage to work around that.  Developing a
> Daffodil-specific implementation is no small task, but feasible.  There are
> several existing implementations on the JVM that are close and can be
> looked at for reference.
>
> The backend implementation would look similar to what was described in an
> earlier post.  We could use ZIO/Akka/etc to implement the backend Protocol
> Server to enable the IO between the Daffodil process and the DAP clients.
> This implementation would now be guided by the DAP specification.
>
> With the protocol and backend extended to fit Daffodil that leaves the
> frontend.  In theory an existing IDE plugin should get pretty close to
> being able to perform the common debug operations mentioned above.  To
> support the Daffodil extensions there will need to be handling of the
> extended protocol into whatever views are desired/applicable.
>
> > Also looking into the Java Debug Interface (JDI) for comparison.
>
> JDI appears to be the wrong level of abstraction for what we are talking
> about in debugging Daffodil for schema development.  While DAP does do JVM
> debugging (through a JDI DAP impl) it also generalizes to many other
> debugging scenarios.  JDI on the other hand is very tied to the JVM.
>
> Extending the JDI appears to be more complex than dealing with DAP, and
> even though the JDI API is mostly defined with interfaces, there are choke
> points that limit it to JVM concepts.  For example, jdi.Value has a finite
> set of JVM types that it works with; it's not clear where Daffodil types
> would plug in, if that is even possible.
>
> The final note is that unique Daffodil features wouldn’t get to IDE
> support any faster with JDI.  In some cases, like VS Code, you would still
> need an extended DAP to support these features.
>
> > and depending on how it shakes out will update the example to show
> integration
>
> It would appear wise to investigate DAP further.  Next step is to refine
> these thoughts with a prototype. I started an implementation in the example
> debugger project [4] to try to run the current example on a _minimal_ DAP
> implementation.
>
>
> [1] https://microsoft.github.io/debug-adapter-protocol/specification
> [2] https://github.com/Microsoft/java-debug
> [3] https://github.com/scalacenter/scala-debug-adapter
> [4] https://github.com/jw3/example-daffodil-debug
>
>
> On Mon, Apr 12, 2021 at 9:58 AM John Wass <[email protected]> wrote:
>
>> > the code is here https://github.com/jw3/example-daffodil-debug
>>
>> There is now a complete console based example for Zio that demonstrates
>> controlling the debug flow while distributing the current state to three
>> "displays".
>> 1. infoset at current step
>> 2. diff of infoset against previous step
>> 3. bit position and value of data.
>>
>> These displays are very rudimentary but demonstrate the ability to
>> asynchronously populate multiple views while synchronously controlling the
>> debug loop.
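A rough sketch of that shape, with Python's asyncio standing in for Zio (this is not the code from the example project): the debug loop advances one step at a time and fans each state out to per-display queues, and each display renders on its own. The state fields `infoset`, `bit_pos` and `byte` are assumptions made for the sketch.

```python
import asyncio
import difflib


async def display(name, queue, render, out):
    """Consume states from a queue until a None sentinel arrives."""
    prev = None
    while (state := await queue.get()) is not None:
        out.append((name, render(prev, state)))
        prev = state


def render_infoset(prev, state):
    return state["infoset"]


def render_diff(prev, state):
    before = prev["infoset"].splitlines() if prev else []
    after = state["infoset"].splitlines()
    # Keep only added/removed lines from the line-by-line diff.
    return [l for l in difflib.ndiff(before, after) if l[0] in "+-"]


def render_position(prev, state):
    return f"bit {state['bit_pos']}: {state['byte']:#04x}"


async def debug_loop(steps, queues):
    for state in steps:              # one iteration per debugger step
        for q in queues:
            q.put_nowait(state)      # fan out without waiting on any view
        await asyncio.sleep(0)       # let the displays run
    for q in queues:
        q.put_nowait(None)           # tell every display to stop


async def main(steps):
    out = []
    renders = {"infoset": render_infoset, "diff": render_diff,
               "position": render_position}
    queues = [asyncio.Queue() for _ in renders]
    views = [display(n, q, r, out)
             for (n, r), q in zip(renders.items(), queues)]
    await asyncio.gather(debug_loop(steps, queues), *views)
    return out


steps = [
    {"infoset": "<a>1</a>", "bit_pos": 0, "byte": 0x31},
    {"infoset": "<a>1</a>\n<b>2</b>", "bit_pos": 8, "byte": 0x32},
]
output = asyncio.run(main(steps))
```

Because the loop never waits on a display, a slow view cannot stall stepping; the price is that a view may lag behind the current step.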
>>
>> > - The new protocol being informed by existing debugger and DAP is key
>>
>> Going to look deeper into how DAP might fit with Daffodil, and depending
>> on how it shakes out will update the example to show integration.
>>
>> Some interesting links to start with
>> - https://github.com/scalacenter/scala-debug-adapter
>> -
>> https://scalameta.org/metals/docs/integrations/debug-adapter-protocol.html
>> - https://github.com/microsoft/java-debug
>>
>> Also looking into the Java Debug Interface (JDI) for comparison.
>>
>>
>> On Thu, Apr 8, 2021 at 12:36 PM John Wass <[email protected]> wrote:
>>
>>> Revisiting this post after doing some debugger related work and thinking
>>> about debug protocol/adapters to connect external tooling to the debug
>>> process.
>>>
>>> This comment is good
>>>
>>> > This also makes me wonder if an approach worth taking for the future
>>> of Daffodil schema debugging is developing a sort of "Daffodil Debug
>>> Protocol". I imagine it would be loosely based on DAP (which is
>>> essentially JSON message based) but could be targeted to the things that a
>>> DFDL schema debugger would really need. An added benefit with some  sort of
>>> protocol is the debugger interface can be uncoupled from Daffodil
>>> itself, so we could implement a TUI/GUI/whatever in any  language/GUI
>>> framework and just have it communicate the protocol over some form of
>>> IPC. Another benefit is that any future backends could implement this
>>> protocol and so a single debugger could hook into different backends
>>> without much issue. Unfortunately, defining such a protocol might be a
>>> large task, but we do have our existing debug infrastructure and things
>>> like DAP to guide its development/design.
>>>
>>> Some thoughts on this
>>> - Defining the protocol will be a large task, but a minimal version
>>> should get up and round tripping quickly with a minimal subset of the
>>> protocol.
>>> - The new protocol being informed by existing debugger and DAP is key
>>> - Uncoupling from Daffodil is key
>>> - Adapt the Daffodil protocol to produce DAP after the fact so as not to
>>> constrain Daffodil debugging capability
>>> - We don't need to tie the protocol or adapters to a single framework,
>>> implementations of the IO layer should be simple enough to support multiple
>>> things (eg Akka, Zio, "basic" ...)
>>> - The current debugger lives in runtime1, but can we make an abstract
>>> API that any runtime would implement?
>>>
>>> Maybe a solution is structured like this
>>> - daffodil-debug-api:
>>>   - protocol model
>>>   - interfaces: debugger / IO adapter / etc
>>>   - lives in daffodil repo (new subproject?)
>>> - daffodil-debug-io-NAME
>>>   - provides implementation of a specific IO adapter
>>>   - multiple projects possible (daffodil-debugger-akka,
>>> daffodil-debugger-zio, etc)
>>>   - supported ones live in their own subprojects, but others can be
>>> plugged in from external sources
>>>   - ability to support multiple implementations reduces risk of lock-in
>>> - debugger applications
>>>   - maintained in external repositories
>>>   - depending on the IO implementation these could execute in a
>>> separate process or on a separate machine
>>>   - like Steve said, could be any language / framework
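To make the proposed split concrete, here is a minimal sketch of the API layer and one pluggable IO adapter. Every name in it (`DebugEvent`, `IOAdapter`, `Debugger`, `InMemoryAdapter`, `FakeRuntimeDebugger`) is hypothetical; none of this is an existing Daffodil interface.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class DebugEvent:
    """Protocol model: one state update emitted by a runtime (hypothetical)."""
    kind: str        # e.g. "step", "infoset", "data"
    payload: dict


class IOAdapter(ABC):
    """Carries protocol messages to debugger applications."""

    @abstractmethod
    def send(self, event: DebugEvent) -> None: ...


class Debugger(ABC):
    """What any runtime would implement to be debuggable."""

    @abstractmethod
    def step(self) -> None: ...


class InMemoryAdapter(IOAdapter):
    """Stand-in for an Akka/Zio/socket adapter; just records events."""

    def __init__(self) -> None:
        self.sent = []

    def send(self, event: DebugEvent) -> None:
        self.sent.append(event)


class FakeRuntimeDebugger(Debugger):
    """Toy runtime that emits one event per step."""

    def __init__(self, adapter: IOAdapter) -> None:
        self.adapter = adapter
        self.count = 0

    def step(self) -> None:
        self.count += 1
        self.adapter.send(DebugEvent("step", {"n": self.count}))


adapter = InMemoryAdapter()
runtime = FakeRuntimeDebugger(adapter)
runtime.step()
runtime.step()
```

The runtime only sees the `IOAdapter` interface, so swapping the transport would not touch debugger logic, which is the lock-in point made in the list above.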
>>>
>>> Three types of reference implementations / sample applications could
>>> also guide the development of the API
>>>   1. a replacement for the existing TUI debugger, expected to end up
>>> with at minimum the same functionality as the current one.
>>>   2. a standalone GUI (JavaFX, Scala.js, ..) debugger
>>>   3. an IDE integration
>>>
>>> Thoughts?
>>>
>>> Also I'm working on some reference implementations of these concepts
>>> using Akka and Zio.  Not quite ready to talk through it yet, but the code
>>> is here https://github.com/jw3/example-daffodil-debug
>>>
>>>
>>>
>>> On Wed, Jan 6, 2021 at 1:42 PM Steve Lawrence <[email protected]>
>>> wrote:
>>>
>>>> Yep, something like that seems very reasonable for dealing with large
>>>> infosets. But it still feels like we still run into usability issues.
>>>> For example, what if a user wants to see more? We need some
>>>> configuration options to increase what we've elided. It's not big, but
>>>> every new thing that needs configuration adds complexity and decreases
>>>> usability.
>>>>
>>>> And I think the only reason we are trying to spend effort eliding
>>>> things is because we're limited to this gdb-like interface where you can
>>>> only print out a little information at a time.
>>>>
>>>> I think what would really help is to dump this gdb interface and instead use
>>>> multiple windows/views. As a really close example to what I imagine, I
>>>> recently came across this hex editor:
>>>>
>>>> https://www.synalysis.net/
>>>>
>>>> The screenshots are a bit small so it's not super clear, but this tool
>>>> has one view for the data in hex, and one view for a tree of parsed
>>>> results (which is very similar to our infoset). The "infoset" view has
>>>> information like offset/length/value, and can be related back to the
>>>> data view to find the actual bits.
>>>>
>>>> I imagine the "next generation daffodil debugger" to look much like
>>>> this. As data is parsed, the infoset view fills up. This view could act
>>>> like a standard GUI tree so you could collapse sections or scroll around
>>>> to show just the parts you care about, and have search capabilities to
>>>> quickly jump around. The advantage here is you no longer really need
>>>> automated eliding or heuristics for what the user *might* care about.
>>>> You just show the whole thing and let user scroll around. As daffodil
>>>> parses and backtracks, this tree grows or shrinks.
>>>>
>>>> I also imagine you could have a cursor moving around the hex view, so as
>>>> daffodil moves around (e.g. scanning for delimiters, extracting
>>>> integers), one could update this data view to show what daffodil is
>>>> doing and where it is.
>>>>
>>>> I also imagine there could be other views as well. For example, a schema
>>>> view to show where in the schema daffodil is, and to add/remove
>>>> breakpoints. And an information view for things like variables, in-scope
>>>> delimiters, PoU's, etc.
>>>>
>>>> The only reason I mention a debug protocol is that it would allow this GUI
>>>> to be more easily written in something other than Java/Scala to take
>>>> advantage of other GUI toolkits. It's been a long while since I've done
>>>> anything with Java GUIs, but they seemed pretty poor the last I looked
>>>> at them. It would even allow for a TUI, which Java has little/no support
>>>> for. It also enables things like remote debugging if a socket IPC was
>>>> used. Though I'm not sure all of that is necessary. Just thinking what
>>>> would be ideal, and it can always be pared back.
>>>>
>>>>
>>>> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
>>>> > I don't think of it as a daffodil debug protocol, but just a
>>>> separation of concerns between display of information and the behaviors of
>>>> parse/unparse that need to be points where users can pause, and data
>>>> structures available to display.
>>>> >
>>>> > E.g., it is 100% a display issue that the infoset (shown as XML) is
>>>> clumsy, too big, etc.  The infoset is available in the processor state, and
>>>> one can examine the current node, enclosing node, prior sibling(s),
>>>> following sibling(s), etc. One can elide contents that are too big for
>>>> hexBinary, etc.
>>>> >
>>>> > I think this problem, how to display the infoset with sensible limits
>>>> on sizing, is fairly easy to come up with some design for, that will at
>>>> least be (1) always fairly small (2) much more useful in more cases. It
>>>> won't be perfect but can be much better than what we do now.
>>>> >
>>>> > One sensible display "mode" should be that displaying the context
>>>> surrounding the current element (when parsing or unparsing) displays at
>>>> most N-lines. (N/2 before, N/2 after) with a maximum length of L characters
>>>> (settable within reason ?)
>>>> >
>>>> > Sibling and enclosing nodes would be displayed eliding their contents
>>>> to at most 1 line.
>>>> >
>>>> > Here's an example of what I mean. Displaying up to N=10 lines total:
>>>> >
>>>> > ...
>>>> > <enclosingParent1>
>>>> >    ...
>>>> >    <priorSibling2>89ab782 ...</...>
>>>> >    <priorSibling1>some text is here and some more text</...>
>>>> >    <currentNode>value might be some big thing which needs to be
>>>> elided ...</...>
>>>> >    <followingSibling1> ... </...>
>>>> >    ???
>>>> > </enclosingParent1>
>>>> > ???
>>>> >
>>>> > The </...> is just an idea to reduce XML matching end-tag clutter.
>>>> >
>>>> > The ... on a line alone or where element content would appear
>>>> generally means 1 or more other siblings. The way the display above starts
>>>> with ... means that this is a relative inner nest, not starting from the
>>>> absolute root.
>>>> >
>>>> > The ... within simple content means that content is elided to fit on
>>>> one line. Always follows some text characters to differentiate from the
>>>> child-element context.
>>>> >
>>>> > The ??? means zero or more other siblings.
>>>> >
>>>> > I used bold italic above to point out that the current node would be
>>>> highlighted somehow. Probably a way to do this that doesn't require display
>>>> modes would be useful. E.g., a text marker like ">>>" as in:
>>>> >
>>>> > >>> <currentNode>value ... </...>
>>>> >
>>>> > might be better, particularly for a trace output being dumped to a
>>>> text file.
>>>> >
>>>> > I made the above example an unparser kind of example by showing a
>>>> following sibling that exists that is after the current node.
>>>> >
>>>> > I think the key concept is that any sibling node is displayed in a
>>>> way that fits on one line.
>>>> > E.g., even if the element name was really long, I'd suggest:
>>>> >
>>>> >   <hereIsAnElementWithASuperLongName...>abcd ... </...>
>>>> >
>>>> > Where the element name itself gets elided because it is too long.
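The one-line rule above can be sketched as follows. The function names, the default limits, and the exact placement of the "..." markers are assumptions made for illustration, not an agreed design.

```python
def elide(text: str, max_len: int) -> str:
    """Cut text to at most max_len characters, marking the cut with ' ...'."""
    marker = " ..."
    if len(text) <= max_len:
        return text
    return text[: max(0, max_len - len(marker))] + marker


def render_sibling(name: str, value: str,
                   max_name: int = 24, max_value: int = 20) -> str:
    """Render one simple-content sibling on a single quasi-XML line."""
    short_name = name if len(name) <= max_name else name[:max_name] + "..."
    return f"<{short_name}>{elide(value, max_value)}</...>"


def window(lines: list, current: int, n: int) -> list:
    """Keep at most n lines around index `current` (about n/2 before it)."""
    start = max(0, current - n // 2)
    return lines[start : start + n]
```

For example, `render_sibling("abcdefgh", "0123456789", max_name=4, max_value=8)` gives `<abcd...>0123 ...</...>`, and `window(list(range(10)), 5, 4)` keeps `[3, 4, 5, 6]`.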
>>>> >
>>>> > A thought. Note that the above presentation is shown as quasi-XML,
>>>> but there's nothing XML-specific about it. A JSON-friendly equivalent could
>>>> be done as well:
>>>> >
>>>> > enclosingParent1 = {
>>>> >    ...
>>>> >    priorSibling2 = "89ab782..."
>>>> >    priorSibling1 = "some text is here and some more text"
>>>> >    currentNode = "value might be some big thing which needs to be
>>>> elided ..."
>>>> >    followingSibling1 = { ... }
>>>> >    ???
>>>> > }
>>>> >
>>>> > That's enough for 1 email thread on this debug topic.
>>>> >
>>>> >
>>>> > ________________________________
>>>> > From: Steve Lawrence <[email protected]>
>>>> > Sent: Tuesday, January 5, 2021 2:26 PM
>>>> > To: [email protected] <[email protected]>
>>>> > Subject: The future of the daffodil DFDL schema debugger?
>>>> >
>>>> >
>>>> > Now that we're in a new year, I'd like to start a discussion about the
>>>> > Daffodil DFDL Schema debugger and how it might be improved to be more
>>>> > useful.
>>>> >
>>>> > Note that this is not the capabilities to debug Daffodil itself in
>>>> > something like Eclipse/IntelliJ, but the ability for Daffodil to
>>>> provide
>>>> > enough extra information during a parse/unparse so that a schema
>>>> > developer can get an idea of what Daffodil is doing. This makes it
>>>> > easier for users (rather than developers) to determine why a schema
>>>> > isn't giving the expected parse/unparse result (either because of bad
>>>> data
>>>> > or a faulty schema).
>>>> >
>>>> > The current state of the debugger is enabled by providing the --debug
>>>> or
>>>> > --trace flags in the CLI. More information about that here:
>>>> >
>>>> > https://daffodil.apache.org/debugger/
>>>> >
>>>> > This enables a TUI and commands somewhat similar to GDB, providing
>>>> things
>>>> > like breakpoints, steps, displaying the current infoset, displaying a
>>>> dump
>>>> > of the data, etc.
>>>> >
>>>> > Although I find this tool pretty useful, it definitely has some
>>>> glaring
>>>> > issues.
>>>> >
>>>> > The most glaring to me is that it really isn't useful at all for
>>>> > debugging unparse. The data dumps only include the main output stream,
>>>> > so determining things like suspensions and buffered output is
>>>> impossible.
>>>> >
>>>> > Another issue is the infoset output. When outputting the infoset, the
>>>> > debugger currently just walks the entire thing and converts it to XML
>>>> > and displays the XML. For large infosets, this is excessive and can make
>>>> it
>>>> > impossible to use, even with some configurations that limit how much of
>>>> > that infoset is actually printed to the screen. Also things like large
>>>> > hex binary blobs create excessive and unusable output.
>>>> >
>>>> > Another thing I feel is missing is a schema view. Right now it's very
>>>> > difficult to know where in the schema Daffodil actually is.
>>>> >
>>>> > I think these issues just need some thought and improvement. One could
>>>> > imagine a better way to stringify our unparse buffers for debug. One
>>>> > could imagine a way to receive infoset state changes so the debugger can
>>>> > track things like backtracks and remove infosets. One could imagine a
>>>> way
>>>> > to display the schema.
>>>> >
>>>> > We just need a better way to stringify the current state of the
>>>> unparse
>>>> > data including buffers, and we need a way for the debugger to
>>>> receive
>>>> > state change information about infoset so it can update displays
>>>> rather
>>>> > than just constantly printing the entire infoset.
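One way to picture such state change information is as a stream of small add/remove messages that a display applies to its own copy of the infoset. The message shape (`op`, `path`, `value`) and the path-keyed view are hypothetical, chosen only to keep the sketch short.

```python
def apply_change(view: dict, change: dict) -> None:
    """Apply one infoset state change in place to a path -> value view."""
    path = change["path"]
    if change["op"] == "add":
        view[path] = change.get("value")
    elif change["op"] == "remove":
        # A backtrack discards the node and everything beneath it.
        doomed = [p for p in view if p == path or p.startswith(path + "/")]
        for p in doomed:
            del view[p]
    else:
        raise ValueError(f"unknown op: {change['op']}")


changes = [
    {"op": "add", "path": "/r"},
    {"op": "add", "path": "/r/a"},
    {"op": "add", "path": "/r/a/x", "value": "1"},
    {"op": "add", "path": "/r/ab", "value": "2"},
    {"op": "remove", "path": "/r/a"},   # parser backtracked past /r/a
]

view = {}
for change in changes:
    apply_change(view, change)
```

After the final change the view holds only `/r` and `/r/ab`, so the display can update just the affected subtree instead of reprinting the whole infoset.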
>>>> >
>>>> > However, I think another big issue is just usability in
>>>> general. I
>>>> > think the CLI usage is reasonable, but it's not always user friendly,
>>>> > and it is difficult to view multiple things at the same time. I think
>>>> > because of this very few people even use this tool. So this is
>>>> > perhaps something worth focusing on.
>>>> >
>>>> > My first thought to improving this usability issue would be to
>>>> implement
>>>> > the Debug Adapter Protocol (DAP)
>>>> > (https://microsoft.github.io/debug-adapter-protocol/) for Daffodil,
>>>> > which many IDE's implement. With this implemented, Daffodil could be
>>>> > plugged in to any IDE that supports it and essentially get debugging
>>>> for
>>>> > free, without the need to worry about the GUI elements.
>>>> >
>>>> > I do have concerns that this just wouldn't have enough functionality
>>>> > that we'd really need. For example, DAP really only has the ability to show
>>>> > code (Daffodil's equivalent is the DFDL schema). There isn't a way to
>>>> > show a live view of the infoset or data. Most DAP IDE's do have a
>>>> > console output, so we could potentially make it so the console output
>>>> is
>>>> > a live view of infoset/data. But I'm not even sure most DAP friendly
>>>> > IDE's could support this kind of console output. Does anyone have
>>>> > familiarity with DAP IDE's and what kinds of console capabilities
>>>> are
>>>> > available?
>>>> >
>>>> > I also looked into TUI libraries with the idea that we could just
>>>> extend
>>>> > our current debugger user interface to be a bit friendlier.
>>>> > Unfortunately, there aren't too many Java/Scala TUI libraries and
>>>> those
>>>> > that do exist don't have Apache friendly licenses. We also want to be
>>>> > careful about increasing dependencies just for a debugger that many
>>>> people
>>>> > might not use, so large graphics libraries are probably out of the
>>>> question.
>>>> >
>>>> > This also makes me wonder if an approach worth taking for the future
>>>> of
>>>> > Daffodil schema debugging is developing a sort of "Daffodil Debug
>>>> > Protocol". I imagine it would be loosely based on DAP (which is
>>>> > essentially JSON message based) but could be targeted to the things
>>>> that
>>>> > a DFDL schema debugger would really need. An added benefit with some
>>>> > sort of protocol is the debugger interface can be uncoupled from
>>>> > Daffodil itself, so we could implement a TUI/GUI/whatever in any
>>>> > language/GUI framework and just have it communicate the protocol over
>>>> > some form of IPC. Another benefit is that any future backends could
>>>> > implement this protocol and so a single debugger could hook into
>>>> > different backends without much issue. Unfortunately, defining such a
>>>> > protocol might be a large task, but we do have our existing debug
>>>> > infrastructure and things like DAP to guide its development/design.
>>>> >
>>>> > Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps
>>>> we
>>>> > really just need the few improvements mentioned to the existing
>>>> > debugger. Is that enough to make it usable? Or is an entirely
>>>> different
>>>> > approach needed to debugging schemas?
>>>> >
>>>>
>>>>
