Hi everybody, I've recently started working on Daffodil with some other folks and will be helping where I can with the debugger.
I've been writing Scala since ~2011 and recently wrote a book about Cats Effect, which has a similar scope to ZIO (effects, concurrency, etc.). If anybody has any questions about the approach and techniques, I'm happy to help.

.. Adam

On Fri, Apr 16, 2021 at 2:49 PM Beckerle, Mike <mbecke...@owlcyberdefense.com> wrote:

> This is actually very cool using ZIO for this. I have to learn more about
> ZIO.
>
> ________________________________
> From: John Wass <jwa...@gmail.com>
> Sent: Monday, April 12, 2021 9:58 AM
> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
> Subject: Re: The future of the daffodil DFDL schema debugger?
>
> > the code is here https://github.com/jw3/example-daffodil-debug
>
> There is now a complete console-based example for ZIO that demonstrates
> controlling the debug flow while distributing the current state to three
> "displays".
> 1. infoset at current step
> 2. diff of infoset against previous step
> 3. bit position and value of data.
>
> These displays are very rudimentary but demonstrate the ability to
> asynchronously populate multiple views while synchronously controlling the
> debug loop.
>
> > - The new protocol being informed by existing debugger and DAP is key
>
> Going to look deeper into how DAP might fit with Daffodil, and depending on
> how it shakes out will update the example to show integration.
>
> Some interesting links to start with
> - https://github.com/scalacenter/scala-debug-adapter
> - https://scalameta.org/metals/docs/integrations/debug-adapter-protocol.html
> - https://github.com/microsoft/java-debug
>
> Also looking into the Java Debug Interface (JDI) for comparison.
>
> On Thu, Apr 8, 2021 at 12:36 PM John Wass <jwa...@gmail.com> wrote:
>
> > Revisiting this post after doing some debugger related work and thinking
> > about debug protocol/adapters to connect external tooling to the debug
> > process.
> >
> > This comment is good
> >
> > > This also makes me wonder if an approach worth taking for the future of
> > > Daffodil schema debugging is developing a sort of "Daffodil Debug
> > > Protocol". I imagine it would be loosely based on DAP (which is
> > > essentially JSON message based) but could be targeted to the things that
> > > a DFDL schema debugger would really need. An added benefit with some
> > > sort of protocol is the debugger interface can be uncoupled from
> > > Daffodil itself, so we could implement a TUI/GUI/whatever in any
> > > language/GUI framework and just have it communicate the protocol over
> > > some form of IPC. Another benefit is that any future backends could
> > > implement this protocol and so a single debugger could hook into
> > > different backends without much issue. Unfortunately, defining such a
> > > protocol might be a large task, but we do have our existing debug
> > > infrastructure and things like DAP to guide its development/design.
> >
> > Some thoughts on this
> > - Defining the protocol will be a large task, but a minimal version should
> >   get up and round tripping quickly with a minimal subset of the protocol.
> > - The new protocol being informed by existing debugger and DAP is key
> > - Uncoupling from Daffodil is key
> > - Adapt the Daffodil protocol to produce DAP after the fact so as not to
> >   constrain Daffodil debugging capability
> > - We don't need to tie the protocol or adapters to a single framework;
> >   implementations of the IO layer should be simple enough to support
> >   multiple things (e.g. Akka, ZIO, "basic" ...)
> > - The current debugger lives in runtime1, but can we make an abstract API
> >   that any runtime would implement?
> >
> > Maybe a solution is structured like this
> > - daffodil-debug-api:
> >   - protocol model
> >   - interfaces: debugger / IO adapter / etc
> >   - lives in daffodil repo (new subproject?)
> > - daffodil-debug-io-NAME
> >   - provides implementation of a specific IO adapter
> >   - multiple projects possible (daffodil-debugger-akka,
> >     daffodil-debugger-zio, etc)
> >   - supported ones live in their own subprojects, but others can be
> >     plugged in from external sources
> >   - ability to support multiple implementations reduces risk of lock-in
> > - debugger applications
> >   - maintained in external repositories
> >   - depending on the IO implementation these could execute in a separate
> >     process or on a separate machine
> >   - like Steve said, could be any language / framework
> >
> > Three types of reference implementations / sample applications could also
> > guide the development of the API
> > 1. a replacement for the existing TUI debugger, expected to end up with
> >    at minimum the same functionality as the current one.
> > 2. a standalone GUI (JavaFX, Scala.js, ..) debugger
> > 3. an IDE integration
> >
> > Thoughts?
> >
> > Also I'm working on some reference implementations of these concepts using
> > Akka and ZIO. Not quite ready to talk through it yet, but the code is here
> > https://github.com/jw3/example-daffodil-debug
> >
> > On Wed, Jan 6, 2021 at 1:42 PM Steve Lawrence <slawre...@apache.org>
> > wrote:
> >
> >> Yep, something like that seems very reasonable for dealing with large
> >> infosets. But it still feels like we run into usability issues.
> >> For example, what if a user wants to see more? We need some
> >> configuration options to increase what we've elided. It's not big, but
> >> every new thing that needs configuration adds complexity and decreases
> >> usability.
> >>
> >> And I think the only reason we are trying to spend effort eliding
> >> things is because we're limited to this gdb-like interface where you can
> >> only print out a little information at a time.
> >>
> >> I think what would really help is to dump this gdb interface and instead
> >> use multiple windows/views.
> >> As a really close example to what I imagine, I
> >> recently came across this hex editor:
> >>
> >> https://www.synalysis.net/
> >>
> >> The screenshots are a bit small so it's not super clear, but this tool
> >> has one view for the data in hex, and one view for a tree of parsed
> >> results (which is very similar to our infoset). The "infoset" view has
> >> information like offset/length/value, and can be related back to the
> >> data view to find the actual bits.
> >>
> >> I imagine the "next generation daffodil debugger" to look much like
> >> this. As data is parsed, the infoset view fills up. This view could act
> >> like a standard GUI tree so you could collapse sections or scroll around
> >> to show just the parts you care about, and have search capabilities to
> >> quickly jump around. The advantage here is you no longer really need
> >> automated eliding or heuristics for what the user *might* care about.
> >> You just show the whole thing and let the user scroll around. As daffodil
> >> parses and backtracks, this tree grows or shrinks.
> >>
> >> I also imagine you could have a cursor moving around the hex view, so as
> >> daffodil moves around (e.g. scanning for delimiters, extracting
> >> integers), one could update this data view to show what daffodil is
> >> doing and where it is.
> >>
> >> I also imagine there could be other views as well. For example, a schema
> >> view to show where in the schema daffodil is, and to add/remove
> >> breakpoints. And an information view for things like variables, in-scope
> >> delimiters, PoU's, etc.
> >>
> >> The only reason I mention a debug protocol is that it would allow this GUI
> >> to be more easily written in something other than Java/Scala to take
> >> advantage of other GUI toolkits. It's been a long while since I've done
> >> anything with Java GUIs, but they seemed pretty poor the last time I looked
> >> at them. It would even allow for a TUI, which Java has little/no support
> >> for.
> >> It also enables things like remote debugging if a socket IPC was
> >> used. Though I'm not sure all of that is necessary. Just thinking what
> >> would be ideal, and it can always be pared back.
> >>
> >> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
> >> > I don't think of it as a daffodil debug protocol, but just a separation
> >> > of concerns between display of information and the behaviors of
> >> > parse/unparse that need to be points where users can pause, and data
> >> > structures available to display.
> >> >
> >> > E.g., it is 100% a display issue that the infoset (shown as XML) is
> >> > clumsy, too big, etc. The infoset is available in the processor state, and
> >> > one can examine the current node, enclosing node, prior sibling(s),
> >> > following sibling(s), etc. One can elide contents that are too big for
> >> > hexBinary, etc.
> >> >
> >> > I think this problem, how to display the infoset with sensible limits
> >> > on sizing, is fairly easy to come up with some design for, that will at
> >> > least be (1) always fairly small (2) much more useful in more cases. It
> >> > won't be perfect but can be much better than what we do now.
> >> >
> >> > One sensible display "mode" should be that displaying the context
> >> > surrounding the current element (when parsing or unparsing) displays at
> >> > most N lines (N/2 before, N/2 after) with a maximum length of L characters
> >> > (settable within reason ?)
> >> >
> >> > Sibling and enclosing nodes would be displayed eliding their contents
> >> > to at most 1 line.
> >> >
> >> > Here's an example of what I mean. Displaying up to M=10 lines total:
> >> >
> >> > ...
> >> > <enclosingParent1>
> >> > ...
> >> > <priorSibling2>89ab782 ...</...>
> >> > <priorSibling1>some text is here and some more text</...>
> >> > <currentNode>value might be some big thing which needs to be elided ...</...>
> >> > <followingSibling1> ... </...>
> >> > ???
> >> > </enclosingParent1>
> >> > ???
> >> >
> >> > The </...> is just an idea to reduce XML matching end-tag clutter.
> >> >
> >> > The ... on a line alone or where element content would appear generally
> >> > means 1 or more other siblings. The way the display above starts with ...
> >> > means that this is a relative inner nest, not starting from the absolute
> >> > root.
> >> >
> >> > The ... within simple content means that content is elided to fit on
> >> > one line. Always follows some text characters to differentiate from the
> >> > child-element context.
> >> >
> >> > The ??? means zero or more other siblings.
> >> >
> >> > I used bold italic above to point out that the current node would be
> >> > highlighted somehow. Probably a way to do this that doesn't require display
> >> > modes would be useful. E.g., a text marker like ">>>" as in:
> >> >
> >> > >>> <currentNode>value .... </...>
> >> >
> >> > might be better, particularly for a trace output being dumped to a text
> >> > file.
> >> >
> >> > I made the above example an unparser kind of example by showing a
> >> > following sibling that exists that is after the current node.
> >> >
> >> > I think the key concept is that any sibling node is displayed in a way
> >> > that fits on one line.
> >> > E.g., even if the element name was really long, I'd suggest:
> >> >
> >> > <hereIsAnElementWithASuperLongName...>abcd ... </...>
> >> >
> >> > Where the element name itself gets elided because it is too long.
> >> >
> >> > A thought. Note that the above presentation is shown as quasi-XML, but
> >> > there's nothing XML-specific about it. A JSON-friendly equivalent could be
> >> > done as well:
> >> >
> >> > enclosingParent1 = {
> >> > ...
> >> > priorSibling2 = "89ab782..."
> >> > priorSibling1 = "some text is here and some more text"
> >> > currentNode = "value might be some big thing which needs to be elided ..."
> >> > followingSibling1 = { ... }
> >> > ???
> >> > }
> >> >
> >> > That's enough for 1 email thread on this debug topic.
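
The one-line-per-sibling display described in the message above can be sketched in a few lines. This is only an illustration in Python, not Daffodil code: the function names (`elide`, `render_node`, `render_context`), the default limits, and the choice to represent siblings as plain (name, value) pairs are all assumptions made for the example. It shows at most n sibling lines, marks the current node with ">>>", and adds a bare "..." line where siblings were left out.

```python
from typing import List, Tuple

ELLIPSIS = "..."


def elide(text: str, limit: int) -> str:
    """Return text unchanged if it fits in `limit`, else cut it and append ' ...'."""
    if len(text) <= limit:
        return text
    keep = max(limit - len(ELLIPSIS) - 1, 1)  # always keep at least one character
    return text[:keep].rstrip() + " " + ELLIPSIS


def render_node(name: str, value: str, max_name: int = 20, max_value: int = 30) -> str:
    """Render one simple element on a single line in the quasi-XML style."""
    # Over-long element names are cut too, as in <hereIsAnElementWith...>.
    shown_name = name if len(name) <= max_name else name[:max_name] + ELLIPSIS
    return f"<{shown_name}>{elide(value, max_value)}</...>"


def render_context(siblings: List[Tuple[str, str]], current: int, n: int = 6) -> List[str]:
    """Show at most n siblings: up to n // 2 before `current`, the rest from it onward."""
    half = n // 2
    start = max(current - half, 0)
    end = min(start + n, len(siblings))
    lines = []
    if start > 0:
        lines.append(ELLIPSIS)  # one or more earlier siblings were left out
    for i in range(start, end):
        marker = ">>> " if i == current else "    "
        lines.append(marker + render_node(*siblings[i]))
    if end < len(siblings):
        lines.append(ELLIPSIS)  # one or more later siblings were left out
    return lines
```

A caller would pass the children of the enclosing node and the index of the current one, then print the returned lines. Nested (complex) siblings are not handled here; they would need a `{ ... }` or `<name> ... </...>` placeholder of their own.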
> >> >
> >> > ________________________________
> >> > From: Steve Lawrence <slawre...@apache.org>
> >> > Sent: Tuesday, January 5, 2021 2:26 PM
> >> > To: dev@daffodil.apache.org <dev@daffodil.apache.org>
> >> > Subject: The future of the daffodil DFDL schema debugger?
> >> >
> >> > Now that we're in a new year, I'd like to start a discussion about the
> >> > Daffodil DFDL Schema debugger and how it might be improved to be more
> >> > useful.
> >> >
> >> > Note that this is not the capability to debug Daffodil itself in
> >> > something like Eclipse/IntelliJ, but the ability for Daffodil to provide
> >> > enough extra information during a parse/unparse so that a schema
> >> > developer can get an idea of what Daffodil is doing. This makes it
> >> > easier for users (rather than developers) to determine why a schema
> >> > isn't giving the expected parse/unparse result (either because of bad data
> >> > or a faulty schema).
> >> >
> >> > The current state of the debugger is enabled by providing the --debug or
> >> > --trace flags in the CLI. More information about that here:
> >> >
> >> > https://daffodil.apache.org/debugger/
> >> >
> >> > This enables a TUI and commands somewhat similar to GDB, providing things
> >> > like breakpoints, steps, displaying the current infoset, displaying a dump
> >> > of the data, etc.
> >> >
> >> > Although I find this tool pretty useful, it definitely has some glaring
> >> > issues.
> >> >
> >> > The most glaring to me is that it really isn't useful at all for
> >> > debugging unparse. The data dumps only include the main output stream,
> >> > so determining things like suspensions and buffered output is impossible.
> >> >
> >> > Another issue is the infoset output. When outputting the infoset, the
> >> > debugger currently just walks the entire thing and converts it to XML
> >> > and displays the XML.
> >> > For large infosets, this is excessive and can make it
> >> > impossible to use, even with some configurations that limit how much of
> >> > that infoset is actually printed to the screen. Also things like large
> >> > hex binary blobs create excessive and unusable output.
> >> >
> >> > Another thing I feel is missing is a schema view. Right now it's very
> >> > difficult to know where in the schema Daffodil actually is.
> >> >
> >> > I think these issues just need some thought and improvement. One could
> >> > imagine a better way to stringify our unparse buffers for debug. One
> >> > could imagine a way to receive infoset state changes so the debugger can
> >> > track things like backtracks and remove infosets. One could imagine a way
> >> > to display the schema.
> >> >
> >> > We just need a better way to stringify the current state of the unparse
> >> > data including buffers, and we need a way for the debugger to receive
> >> > state change information about the infoset so it can update displays rather
> >> > than just constantly printing the entire infoset.
> >> >
> >> > However, I think another big issue is just usability in general. I
> >> > think the CLI usage is reasonable, but it's not always user friendly,
> >> > and it is difficult to view multiple things at the same time. I think
> >> > because of this very few people even use this tool. So this seems like
> >> > perhaps something worth focusing on.
> >> >
> >> > My first thought to improving this usability issue would be to implement
> >> > the Debug Adapter Protocol (DAP)
> >> > (https://microsoft.github.io/debug-adapter-protocol/) for Daffodil,
> >> > which many IDEs implement. With this implemented, Daffodil could be
> >> > plugged in to any IDE that supports it and essentially get debugging for
> >> > free, without the need to worry about the GUI elements.
> >> >
> >> > I do have concerns that this just wouldn't have enough functionality
> >> > that we'd really need.
> >> > For example, DAP really only has the ability to show
> >> > code (Daffodil's equivalent is the DFDL schema). There isn't a way to
> >> > show a live view of the infoset or data. Most DAP IDEs do have a
> >> > console output, so we could potentially make it so the console output is
> >> > a live view of infoset/data. But I'm not even sure most DAP friendly
> >> > IDEs could support this kind of console output. Does anyone have
> >> > familiarity with DAP IDEs and what kinds of console capabilities are
> >> > available?
> >> >
> >> > I also looked into TUI libraries with the idea that we could just extend
> >> > our current debugger user interface to be a bit friendlier.
> >> > Unfortunately, there aren't too many Java/Scala TUI libraries and those
> >> > that do exist don't have Apache friendly licenses. We also want to be
> >> > careful about increasing dependencies just for a debugger that many people
> >> > might not use, so large graphics libraries are probably out of the
> >> > question.
> >> >
> >> > This also makes me wonder if an approach worth taking for the future of
> >> > Daffodil schema debugging is developing a sort of "Daffodil Debug
> >> > Protocol". I imagine it would be loosely based on DAP (which is
> >> > essentially JSON message based) but could be targeted to the things that
> >> > a DFDL schema debugger would really need. An added benefit with some
> >> > sort of protocol is the debugger interface can be uncoupled from
> >> > Daffodil itself, so we could implement a TUI/GUI/whatever in any
> >> > language/GUI framework and just have it communicate the protocol over
> >> > some form of IPC. Another benefit is that any future backends could
> >> > implement this protocol and so a single debugger could hook into
> >> > different backends without much issue. Unfortunately, defining such a
> >> > protocol might be a large task, but we do have our existing debug
> >> > infrastructure and things like DAP to guide its development/design.
> >> >
> >> > Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps we
> >> > really just need the few improvements mentioned to the existing
> >> > debugger. Is that enough to make it usable? Or is an entirely different
> >> > approach needed to debugging schemas?
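
To make the "JSON message based" protocol idea from this thread a little more concrete, here is a minimal sketch in Python of what a few messages might look like on the wire. Nothing here comes from Daffodil or from the DAP specification: the message names (`Step`, `InfosetChanged`, `DataPosition`), their fields, and the newline-delimited framing are all hypothetical, chosen only to show a command going to the debugger and state-change events coming back over some IPC channel.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Step:
    """Hypothetical command: advance the parse/unparse by one step."""
    seq: int
    type: str = "command"
    command: str = "step"


@dataclass
class InfosetChanged:
    """Hypothetical event: the infoset grew or shrank (e.g. after a backtrack)."""
    seq: int
    path: str    # location of the affected node; the path format is an assumption
    change: str  # "added" or "removed"
    type: str = "event"
    event: str = "infosetChanged"


@dataclass
class DataPosition:
    """Hypothetical event: where in the data the processor currently is."""
    seq: int
    bit_position: int
    type: str = "event"
    event: str = "dataPosition"


def encode(message) -> bytes:
    """Serialize one message as a single line of UTF-8 JSON (assumed framing)."""
    return (json.dumps(asdict(message), sort_keys=True) + "\n").encode("utf-8")


def decode(line: bytes) -> dict:
    """Parse one received line back into a plain dictionary."""
    return json.loads(line.decode("utf-8"))
```

A front end in any language would read lines from the socket or pipe, decode each one, and dispatch on the `type` and `event` fields to update its infoset, data, or schema view. Because the messages are plain JSON, the same stream could later be adapted to DAP messages without changing the debugger side.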