> Next step is to refine these thoughts with a prototype.

Another next step is to collect feedback on this research and proposed approach. Any discussion is appreciated.

On Tue, Apr 20, 2021 at 10:00 AM John Wass <jwa...@gmail.com> wrote:

> > Going to look deeper into how DAP might fit with Daffodil
>
> Have been looking over DAP and getting a good feeling about it. The specification [1] seems general enough that it could be applied to Daffodil and cover a swath of common operations (like start, stop, break, continue, code locations, variables, etc).
>
> There are many areas, though, that are unique to Daffodil and have no representation in the spec. These things (like InputStream, Infoset, PoU, different variable types, backtracking, etc) will need an extension to DAP. This really boils down to defining these things to fit under the DAP BaseProtocol and enabling handling of those objects on both the front and back ends.
>
> On the backend we need a Daffodil DAP protocol server. Existing JVM implementations (like Java [2], Scala [3]) are tied closely to JDI and would bring a lot of extra baggage to work around. Developing a Daffodil-specific implementation is no small task, but feasible. There are several existing implementations on the JVM that are close and can be looked at for reference.
>
> The backend implementation would look similar to what was described in an earlier post. We could use ZIO/Akka/etc to implement the backend Protocol Server to enable the IO between the Daffodil process and the DAP clients. This implementation would now be guided by the DAP specification.
>
> With the protocol and backend extended to fit Daffodil, that leaves the frontend. In theory an existing IDE plugin should get pretty close to being able to perform the common debug operations mentioned above. To support the Daffodil extensions there will need to be handling of the extended protocol in whatever views are desired/applicable.
>
> > Also looking into the Java Debug Interface (JDI) for comparison.
>
> JDI appears to be the wrong level of abstraction for what we are talking about in debugging Daffodil for schema development. While DAP does do JVM debugging (through a JDI DAP impl), it also generalizes to many other debugging scenarios. JDI, on the other hand, is very tied to the JVM.
>
> Extending the JDI appears to be more complex than dealing with DAP, and even though the JDI API is mostly defined with interfaces, there are choke points that limit it to JVM concepts. For example, jdi.Value has a finite set of JVM types that it works with; it's not clear where Daffodil types would plug in, if that is even possible.
>
> The final note is that unique Daffodil features wouldn't get to IDE support any faster with JDI. In some cases, like VS Code, you would still need an extended DAP to support these features.
>
> > and depending on how it shakes out will update the example to show integration
>
> It would appear wise to investigate DAP further. Next step is to refine these thoughts with a prototype. I started an implementation in the example debugger project [4] to try to run the current example on a _minimal_ DAP implementation.
>
> [1] https://microsoft.github.io/debug-adapter-protocol/specification
> [2] https://github.com/Microsoft/java-debug
> [3] https://github.com/scalacenter/scala-debug-adapter
> [4] https://github.com/jw3/example-daffodil-debug
>
> On Mon, Apr 12, 2021 at 9:58 AM John Wass <jwa...@gmail.com> wrote:
>
>> > the code is here https://github.com/jw3/example-daffodil-debug
>>
>> There is now a complete console based example for Zio that demonstrates controlling the debug flow while distributing the current state to three "displays".
>> 1. infoset at current step
>> 2. diff of infoset against previous step
>> 3. bit position and value of data.
>>
>> These displays are very rudimentary but demonstrate the ability to asynchronously populate multiple views while synchronously controlling the debug loop.
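
A minimal sketch of how the three displays listed above could be derived from one step of a debug loop. This is not the code from the example project: the snapshot shape (infoset text plus bit position), the function names, and the display keys are all assumptions made for illustration, and the displays are called synchronously here, whereas the Zio example populates them asynchronously.

```python
import difflib
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical snapshot shape: (infoset rendered as text, current bit position).
Snapshot = Tuple[str, int]


def infoset_diff(previous: str, current: str) -> List[str]:
    """Return only the added ("+ ") and removed ("- ") infoset lines."""
    lines = difflib.ndiff(previous.splitlines(), current.splitlines())
    return [line for line in lines if line.startswith(("+ ", "- "))]


def data_at(data: bytes, bit_pos: int) -> str:
    """Describe the byte that contains the given bit position."""
    byte_index, bit_offset = divmod(bit_pos, 8)
    if byte_index >= len(data):
        return f"bit {bit_pos}: <end of data>"
    return (f"bit {bit_pos}: byte {byte_index} = "
            f"0x{data[byte_index]:02x} (bit offset {bit_offset})")


def run_steps(data: bytes,
              snapshots: Iterable[Snapshot],
              displays: Dict[str, Callable[[object], None]]) -> None:
    """Walk the snapshots one step at a time and feed each display."""
    previous = ""
    for infoset, bit_pos in snapshots:
        displays["infoset"](infoset)                      # 1. infoset at current step
        displays["diff"](infoset_diff(previous, infoset)) # 2. diff against previous step
        displays["data"](data_at(data, bit_pos))          # 3. bit position and value
        previous = infoset


# Usage: collect what each display would have shown.
infosets, diffs, positions = [], [], []
displays = {"infoset": infosets.append, "diff": diffs.append, "data": positions.append}
run_steps(b"\x01\x02",
          [("<a>\n</a>", 0), ("<a>\n<b>1</b>\n</a>", 8)],
          displays)
```

Keeping each display behind a plain callback means the loop itself stays in control of stepping, while a real implementation could hand each update to a queue or stream instead of a list.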
>>
>> > - The new protocol being informed by existing debugger and DAP is key
>>
>> Going to look deeper into how DAP might fit with Daffodil, and depending on how it shakes out will update the example to show integration.
>>
>> Some interesting links to start with
>> - https://github.com/scalacenter/scala-debug-adapter
>> - https://scalameta.org/metals/docs/integrations/debug-adapter-protocol.html
>> - https://github.com/microsoft/java-debug
>>
>> Also looking into the Java Debug Interface (JDI) for comparison.
>>
>> On Thu, Apr 8, 2021 at 12:36 PM John Wass <jwa...@gmail.com> wrote:
>>
>>> Revisiting this post after doing some debugger related work and thinking about debug protocol/adapters to connect external tooling to the debug process.
>>>
>>> This comment is good
>>>
>>> > This also makes me wonder if an approach worth taking for the future of Daffodil schema debugging is developing a sort of "Daffodil Debug Protocol". I imagine it would be loosely based on DAP (which is essentially JSON message based) but could be targeted to the things that a DFDL schema debugger would really need. An added benefit with some sort of protocol is the debugger interface can be uncoupled from Daffodil itself, so we could implement a TUI/GUI/whatever in any language/GUI framework and just have it communicate the protocol over some form of IPC. Another benefit is that any future backends could implement this protocol and so a single debugger could hook into different backends without much issue. Unfortunately, defining such a protocol might be a large task, but we do have our existing debug infrastructure and things like DAP to guide its development/design.
>>>
>>> Some thoughts on this
>>> - Defining the protocol will be a large task, but a minimal version should get up and round tripping quickly with a minimal subset of the protocol.
>>> - The new protocol being informed by existing debugger and DAP is key
>>> - Uncoupling from Daffodil is key
>>> - Adapt the Daffodil protocol to produce DAP after the fact so as not to constrain Daffodil debugging capability
>>> - We don't need to tie the protocol or adapters to a single framework; implementations of the IO layer should be simple enough to support multiple things (eg Akka, Zio, "basic" ...)
>>> - The current debugger lives in runtime1, but can we make an abstract API that any runtime would implement?
>>>
>>> Maybe a solution is structured like this
>>> - daffodil-debug-api:
>>>   - protocol model
>>>   - interfaces: debugger / IO adapter / etc
>>>   - lives in daffodil repo (new subproject?)
>>> - daffodil-debug-io-NAME
>>>   - provides implementation of a specific IO adapter
>>>   - multiple projects possible (daffodil-debugger-akka, daffodil-debugger-zio, etc)
>>>   - supported ones live in their own subprojects, but others can be plugged in from external sources
>>>   - ability to support multiple implementations reduces risk of lock-in
>>> - debugger applications
>>>   - maintained in external repositories
>>>   - depending on the IO implementation these could execute in a separate process or on a separate machine
>>>   - like Steve said, could be any language / framework
>>>
>>> Three types of reference implementations / sample applications could also guide the development of the API
>>> 1. a replacement for the existing TUI debugger, expected to end up with at minimum the same functionality as the current one.
>>> 2. a standalone GUI (JavaFX, Scala.js, ..) debugger
>>> 3. an IDE integration
>>>
>>> Thoughts?
>>>
>>> Also I'm working on some reference implementations of these concepts using Akka and Zio.
>>> Not quite ready to talk through it yet, but the code is here https://github.com/jw3/example-daffodil-debug
>>>
>>> On Wed, Jan 6, 2021 at 1:42 PM Steve Lawrence <slawre...@apache.org> wrote:
>>>
>>>> Yep, something like that seems very reasonable for dealing with large infosets. But it still feels like we run into usability issues. For example, what if a user wants to see more? We need some configuration options to increase what we've elided. It's not big, but every new thing that needs configuration adds complexity and decreases usability.
>>>>
>>>> And I think the only reason we are trying to spend effort eliding things is because we're limited to this gdb-like interface where you can only print out a little information at a time.
>>>>
>>>> I think what would really help is to dump this gdb interface and instead use multiple windows/views. As a really close example to what I imagine, I recently came across this hex editor:
>>>>
>>>> https://www.synalysis.net/
>>>>
>>>> The screenshots are a bit small so it's not super clear, but this tool has one view for the data in hex, and one view for a tree of parsed results (which is very similar to our infoset). The "infoset" view has information like offset/length/value, and can be related back to the data view to find the actual bits.
>>>>
>>>> I imagine the "next generation daffodil debugger" to look much like this. As data is parsed, the infoset view fills up. This view could act like a standard GUI tree so you could collapse sections or scroll around to show just the parts you care about, and have search capabilities to quickly jump around. The advantage here is you no longer really need automated eliding or heuristics for what the user *might* care about. You just show the whole thing and let the user scroll around. As daffodil parses and backtracks, this tree grows or shrinks.
>>>>
>>>> I also imagine you could have a cursor moving around the hex view, so as daffodil moves around (e.g. scanning for delimiters, extracting integers), one could update this data view to show what daffodil is doing and where it is.
>>>>
>>>> I also imagine there could be other views as well. For example, a schema view to show where in the schema daffodil is, and to add/remove breakpoints. And an information view for things like variables, in-scope delimiters, PoU's, etc.
>>>>
>>>> The only reason I mention a debug protocol is that it would allow this GUI to be more easily written in something other than Java/Scala to take advantage of other GUI toolkits. It's been a long while since I've done anything with Java GUIs, but they seemed pretty poor the last I looked at them. It would even allow for a TUI, which Java has little/no support for. It also enables things like remote debugging if a socket IPC was used. Though I'm not sure all of that is necessary. Just thinking what would be ideal, and it can always be pared back.
>>>>
>>>> On 1/6/21 12:44 PM, Beckerle, Mike wrote:
>>>> > I don't think of it as a daffodil debug protocol, but just a separation of concerns between display of information and the behaviors of parse/unparse that need to be points where users can pause, and data structures available to display.
>>>> >
>>>> > E.g., it is 100% a display issue that the infoset (shown as XML) is clumsy, too big, etc. The infoset is available in the processor state, and one can examine the current node, enclosing node, prior sibling(s), following sibling(s), etc. One can elide contents that are too big for hexBinary, etc.
>>>> >
>>>> > I think this problem, how to display the infoset with sensible limits on sizing, is fairly easy to come up with some design for, that will at least be (1) always fairly small (2) much more useful in more cases.
>>>> > It won't be perfect but can be much better than what we do now.
>>>> >
>>>> > One sensible display "mode" should be that displaying the context surrounding the current element (when parsing or unparsing) displays at most N lines (N/2 before, N/2 after) with a maximum length of L characters (settable within reason?).
>>>> >
>>>> > Sibling and enclosing nodes would be displayed eliding their contents to at most 1 line.
>>>> >
>>>> > Here's an example of what I mean. Displaying up to N=10 lines total:
>>>> >
>>>> > ...
>>>> > <enclosingParent1>
>>>> >   ...
>>>> >   <priorSibling2>89ab782 ...</...>
>>>> >   <priorSibling1>some text is here and some more text</...>
>>>> >   <currentNode>value might be some big thing which needs to be elided ...</...>
>>>> >   <followingSibling1> ... </...>
>>>> >   ???
>>>> > </enclosingParent1>
>>>> > ???
>>>> >
>>>> > The </...> is just an idea to reduce XML matching end-tag clutter.
>>>> >
>>>> > The ... on a line alone or where element content would appear generally means 1 or more other siblings. The way the display above starts with ... means that this is a relative inner nest, not starting from the absolute root.
>>>> >
>>>> > The ... within simple content means that content is elided to fit on one line. It always follows some text characters to differentiate from the child-element context.
>>>> >
>>>> > The ??? means zero or more other siblings.
>>>> >
>>>> > I used bold italic above to point out that the current node would be highlighted somehow. Probably a way to do this that doesn't require display modes would be useful. E.g., a text marker like ">>>" as in:
>>>> >
>>>> > >>> <currentNode>value .... </...>
>>>> >
>>>> > might be better, particularly for a trace output being dumped to a text file.
>>>> >
>>>> > I made the above example an unparser kind of example by showing a following sibling that exists after the current node.
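
The one-line rule described above can be sketched as follows. This is only an illustration under assumptions: the helper names, the default length limit, and the exact placement of the " ..." marker are not taken from Daffodil; the `</...>` end tag and the `>>>` current-node marker follow the example in the message.

```python
MARKER = " ..."


def elide(text: str, max_len: int) -> str:
    """Shorten text to at most max_len characters, ending in ' ...' when cut."""
    if max_len < len(MARKER):
        raise ValueError("max_len is too small to hold the elision marker")
    if len(text) <= max_len:
        return text
    return text[: max_len - len(MARKER)] + MARKER


def one_line(name: str, value: str, max_len: int = 40, current: bool = False) -> str:
    """Render one simple element on a single line, quasi-XML style.

    The current node is flagged with a plain-text '>>>' marker so the
    output also works in a trace dumped to a text file.
    """
    prefix = ">>> " if current else "    "
    return f"{prefix}<{name}>{elide(value, max_len)}</...>"


# Usage: a short sibling is shown whole, a long current node is cut.
sibling = one_line("priorSibling1", "some text")
current = one_line("currentNode", "value might be some big thing",
                   max_len=12, current=True)
```

Because every node goes through the same `one_line` call, the total display height depends only on how many nodes are shown, which is what makes an N-line limit easy to enforce.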
>>>> >
>>>> > I think the key concept is that any sibling node is displayed in a way that fits on one line.
>>>> > E.g., even if the element name was really long, I'd suggest:
>>>> >
>>>> > <hereIsAnElementWithASuperLongName...>abcd ... </...>
>>>> >
>>>> > Where the element name itself gets elided because it is too long.
>>>> >
>>>> > A thought. Note that the above presentation is shown as quasi-XML, but there's nothing XML-specific about it. A JSON-friendly equivalent could be done as well:
>>>> >
>>>> > enclosingParent1 = {
>>>> >   ...
>>>> >   priorSibling2 = "89ab782..."
>>>> >   priorSibling1 = "some text is here and some more text"
>>>> >   currentNode = "value might be some big thing which needs to be elided ..."
>>>> >   followingSibling1 = { ... }
>>>> >   ???
>>>> > }
>>>> >
>>>> > That's enough for 1 email thread on this debug topic.
>>>> >
>>>> > ________________________________
>>>> > From: Steve Lawrence <slawre...@apache.org>
>>>> > Sent: Tuesday, January 5, 2021 2:26 PM
>>>> > To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>>> > Subject: The future of the daffodil DFDL schema debugger?
>>>> >
>>>> > Now that we're in a new year, I'd like to start a discussion about the Daffodil DFDL Schema debugger and how it might be improved to be more useful.
>>>> >
>>>> > Note that this is not the capability to debug Daffodil itself in something like Eclipse/IntelliJ, but the ability for Daffodil to provide enough extra information during a parse/unparse so that a schema developer can get an idea of what Daffodil is doing. This makes it easier for users (rather than developers) to determine why a schema isn't giving the expected parse/unparse result (either because of bad data or a faulty schema).
>>>> >
>>>> > The current state of the debugger is enabled by providing the --debug or --trace flags in the CLI.
>>>> > More information about that here:
>>>> >
>>>> > https://daffodil.apache.org/debugger/
>>>> >
>>>> > This enables a TUI and commands somewhat similar to GDB, providing things like breakpoints, steps, displaying the current infoset, displaying a dump of the data, etc.
>>>> >
>>>> > Although I find this tool pretty useful, it definitely has some glaring issues.
>>>> >
>>>> > The most glaring to me is that it really isn't useful at all for debugging unparse. The data dumps only include the main output stream, so determining things like suspensions and buffered output is impossible.
>>>> >
>>>> > Another issue is the infoset output. When outputting the infoset, the debugger currently just walks the entire thing and converts it to XML and displays the XML. For large infosets, this is excessive and can make it impossible to use, even with some configurations that limit how much of that infoset is actually printed to the screen. Also things like large hex binary blobs create excessive and unusable output.
>>>> >
>>>> > Another thing I feel is missing is a schema view. Right now it's very difficult to know where in the schema Daffodil actually is.
>>>> >
>>>> > I think these issues just need some thought and improvement. One could imagine a better way to stringify our unparse buffers for debug. One could imagine a way to receive infoset state changes so the debugger can track things like backtracks and remove infosets. One could imagine a way to display the schema.
>>>> >
>>>> > We just need a better way to stringify the current state of the unparse data including buffers, and we need a way for the debugger to receive state change information about the infoset so it can update displays rather than just constantly printing the entire infoset.
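
One way to picture "receiving infoset state changes" is a view that applies small events instead of re-rendering the whole infoset. The sketch below is purely illustrative: the event kinds ("start", "end", "backtrack") and the `InfosetView` class are hypothetical names, not part of Daffodil's existing debugger or any published protocol.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)


class InfosetView:
    """Keeps a tree in sync with a stream of (hypothetical) state-change events."""

    def __init__(self, root_name: str) -> None:
        self.root = Node(root_name)
        self._open: List[Node] = [self.root]  # elements currently being parsed

    def apply(self, event: Dict[str, str]) -> None:
        kind = event["kind"]
        if kind == "start":        # a new element begins under the open one
            node = Node(event["name"])
            self._open[-1].children.append(node)
            self._open.append(node)
        elif kind == "end":        # the open element completed successfully
            self._close()
        elif kind == "backtrack":  # the open element is discarded
            self._close()
            self._open[-1].children.pop()
        else:
            raise ValueError(f"unknown event kind: {kind}")

    def _close(self) -> None:
        if len(self._open) == 1:
            raise ValueError("no open element to close")
        self._open.pop()

    def render(self) -> List[str]:
        """Return the tree as indented lines, one element per line."""
        def walk(node: Node, depth: int) -> List[str]:
            lines = ["  " * depth + node.name]
            for child in node.children:
                lines.extend(walk(child, depth + 1))
            return lines
        return walk(self.root, 0)


# Usage: element "b" is tried and backtracked, so it never stays in the view.
view = InfosetView("root")
for ev in [{"kind": "start", "name": "a"}, {"kind": "end"},
           {"kind": "start", "name": "b"}, {"kind": "backtrack"},
           {"kind": "start", "name": "c"}, {"kind": "end"}]:
    view.apply(ev)
rendered = view.render()
```

With events like these, the tree grows and shrinks in step with the parse, and a display only has to redraw the part that changed.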
>>>> >
>>>> > However, I think another big issue is just usability in general. I think the CLI usage is reasonable, but it's not always user friendly, and it is difficult to view multiple things at the same time. I think because of this very few people even use this tool. So this is perhaps something worth focusing on.
>>>> >
>>>> > My first thought for improving this usability issue would be to implement the Debug Adapter Protocol (DAP) (https://microsoft.github.io/debug-adapter-protocol/) for Daffodil, which many IDEs implement. With this implemented, Daffodil could be plugged in to any IDE that supports it and essentially get debugging for free, without the need to worry about the GUI elements.
>>>> >
>>>> > I do have concerns that this just wouldn't have enough of the functionality that we'd really need. For example, DAP really only has the ability to show code (Daffodil's equivalent is the DFDL schema). There isn't a way to show a live view of the infoset or data. Most DAP IDEs do have a console output, so we could potentially make it so the console output is a live view of infoset/data. But I'm not even sure most DAP friendly IDEs could support this kind of console output. Does anyone have familiarity with DAP IDEs and what kinds of console capabilities are available?
>>>> >
>>>> > I also looked into TUI libraries with the idea that we could just extend our current debugger user interface to be a bit friendlier. Unfortunately, there aren't too many Java/Scala TUI libraries and those that do exist don't have Apache friendly licenses. We also want to be careful about increasing dependencies just for a debugger that many people might not use, so large graphics libraries are probably out of the question.
>>>> >
>>>> > This also makes me wonder if an approach worth taking for the future of Daffodil schema debugging is developing a sort of "Daffodil Debug Protocol". I imagine it would be loosely based on DAP (which is essentially JSON message based) but could be targeted to the things that a DFDL schema debugger would really need. An added benefit with some sort of protocol is the debugger interface can be uncoupled from Daffodil itself, so we could implement a TUI/GUI/whatever in any language/GUI framework and just have it communicate the protocol over some form of IPC. Another benefit is that any future backends could implement this protocol and so a single debugger could hook into different backends without much issue. Unfortunately, defining such a protocol might be a large task, but we do have our existing debug infrastructure and things like DAP to guide its development/design.
>>>> >
>>>> > Thoughts? Does such a Daffodil Debug Protocol seem worth it? Perhaps we really just need the few improvements mentioned to the existing debugger. Is that enough to make it usable? Or is an entirely different approach needed to debugging schemas?
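
For reference, DAP's base protocol frames each JSON message with a `Content-Length` header followed by a blank line. The sketch below shows that framing round-tripping a Daffodil-specific event. The framing and the `seq`/`type`/`event`/`body` fields follow the DAP specification; the event name `daffodil.infoset` and its body are hypothetical, chosen only to show where an extension could fit.

```python
import json
from typing import Tuple


def encode_message(message: dict) -> bytes:
    """Frame a JSON message the way the DAP base protocol does."""
    body = json.dumps(message, separators=(",", ":")).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(data: bytes) -> Tuple[dict, bytes]:
    """Decode one framed message; return it plus any bytes left over."""
    header, separator, rest = data.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("incomplete header")
    length = None
    for line in header.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("missing Content-Length header")
    if len(rest) < length:
        raise ValueError("incomplete body")
    return json.loads(rest[:length].decode("utf-8")), rest[length:]


# Hypothetical extension event: the name and body are assumptions, not part of DAP.
infoset_event = {
    "seq": 1,
    "type": "event",
    "event": "daffodil.infoset",
    "body": {"xml": "<a>1</a>"},
}

frame = encode_message(infoset_event)
decoded, remaining = decode_message(frame + b"extra")
```

Since the framing does not care what the event is called, a stock DAP client would typically ignore an event it does not know, while a Daffodil-aware front end could route it to an infoset view.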