Hi, Vojtech,

On Thu, Jun 28, 2012 at 06:24:44PM +0200, Vojtech Horky wrote:
> 2012/6/28 Sean Bartell <[email protected]>:
> > I've finished implementing the first iteration of the Bithenge script
> > language.
> I decided to test it with something more realistic - see the
> attachments, please.
> 
> My example is parsing of USB descriptors. I do not know how much you
> know USB internals, but briefly USB descriptors describe type and
> capabilities of the USB device and they form a tree. Current version
> of Bithenge is not able to parse "any" descriptor, but for a fixed
> configuration, it works well.
> 
> The attached example usbkbd.dat is a configuration (including
> endpoints) descriptor of USB keyboard in QEMU and usbcfgdesc.bh
> contains the description. The descriptors use mostly uint8 and uint16,
> so I added other built-in transformations. It was pretty easy
> (copy-paste mostly) - feel free to use them. After that, it worked and
> it displayed parsed USB descriptor correctly :-).

Cool, I'll add these to my branch.

> > Now that the basics of Bithenge are in place, there are several things I
> > could start working on. I'm inclined to keep working on the language,
> > specifically by adding support for parameters and expressions, because
> Can you, please, elaborate more what you mean by parameters? As a
> matter of fact, there are some features described on the wiki (Future
> features section) that I am not sure I understand correctly. Would
> you, please, describe them in more detail? Or, please, give examples
> of file-formats/file-systems/whatever where this functionality would
> be useful.

I've rewritten that wiki section to explain the ideas. Parameters are a
way of letting a transform depend on more than just its input data.
For instance, a Pascal string could look like this:

  .len <- uint8le;
  .str <- ascii <- known_length(.len);

Here, .len is passed as a parameter to known_length. At first, parameters
will be limited to literals (8) or references to previous fields (.len),
but more complicated expressions could be added. Users could also add
parameters to the transforms they define. Many of the ideas I listed
would involve parameters somehow.
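
To make the idea concrete, here is a rough model of that Pascal string
in Python rather than in Bithenge itself. It is only a sketch: the
function names mirror the script above, but the (value, rest) calling
convention and the error handling are assumptions made for the example,
not how Bithenge is implemented.

```python
def uint8le(data):
    """Consume one byte and return (value, remaining input)."""
    if len(data) < 1:
        raise ValueError("input too short for uint8le")
    return data[0], data[1:]


def known_length(length):
    """Build a transform that consumes exactly `length` bytes.

    `length` is the parameter: it does not come from the transform's
    own input but from a field that was parsed earlier.
    """
    def apply(data):
        if len(data) < length:
            raise ValueError("input shorter than the declared length")
        return data[:length], data[length:]
    return apply


def pascal_string(data):
    """Model of: .len <- uint8le;  .str <- ascii <- known_length(.len);"""
    fields = {}
    fields["len"], rest = uint8le(data)
    # The earlier field .len is handed to known_length as its parameter.
    raw, rest = known_length(fields["len"])(rest)
    fields["str"] = raw.decode("ascii")
    return fields, rest
```

Calling pascal_string on a length byte of 5 followed by "hello" and
some trailing bytes would give the two fields and leave the trailing
bytes unconsumed.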

> > other aspects of Bithenge won't be very useful until the language is
> > more powerful. Does this seem like a good idea?
> From my point of view, following features shall be on top of your list.
> 
> Repetition with some kind of "stop condition" (something like "until
> this field is zero" would be probably enough). Use case is iterating
> through program stack.
> 
> Repetition of fixed length, where the length would be determined by
> some previously known "field". Example could be reports in USB HID
> descriptor or color palette in GIF.
> 
> Switch based on some magic value. For example, USB descriptor type is
> determined from the second byte. Or GIF extension blocks.
> 
> Bitmap parsing.

I've included these in the wiki page. The first three would involve
parameters.

> Skip some data. Either of fixed length (e.g. to skip unknown USB
> descriptors) or until some pattern is found (e.g. parts of GIF
> header).

These could be done with a combination of parameters (for known_length
or similar), the "hidden fields" idea, and the "search" idea.

> Accessing the parsed data at run-time. Currently, you are able to
> print the data but I found no way to actually work with them. For
> example, how would I read the latitude from your example once the
> structure is parsed? This is a very important feature if we want to
> use Bithenge as a basis for other tools, such as debugger. Optimal
> solution would be to parse the data into a native C struct. Question
> is how to populate it. Generate a setter of some kind from the
> grammar, use some weird macros, ...

The primary way of accessing the data is to use the functions in tree.h
to access the output tree. As you mention, I will need to add a function
to get a specific child of an internal node. This approach is more
flexible because the script can be chosen at runtime, so it is the one a
debugger or interactive browser would use.
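
As a sketch of what such a lookup could look like, here is a toy output
tree in Python. Everything in it is hypothetical: the class, the method
names and the sample field values are made up for illustration and are
not the tree.h API.

```python
class InternalNode:
    """Toy internal node holding ordered (key, child) pairs."""

    def __init__(self, children):
        self._children = list(children)

    def for_each(self):
        """Iterate over all (key, child) pairs, e.g. for printing."""
        yield from self._children

    def get(self, key):
        """Return the child stored under `key` (hypothetical lookup)."""
        for k, child in self._children:
            if k == key:
                return child
        raise KeyError(key)


# Hypothetical parsed result; the field names and values are invented.
point = InternalNode([("latitude", 50), ("longitude", 14)])
parsed = InternalNode([("magic", 1234), ("point", point)])

# A tool chooses the path at runtime instead of compiling in a struct.
latitude = parsed.get("point").get("latitude")
```

A debugger or browser could walk for_each to display everything and use
get when it needs one particular field.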

Another way would be to parse the script file and create C source code
that declares structs and reads data. This would be an alternative to
writing the C code directly when you need to parse binary data in
HelenOS. I didn't include this in my GSoC plan because it wouldn't help
with the debugger or interactive browser.

> What is the "Try to read 5 bytes and fail if the blob is too long."
> comment? Could that happen if the length is explicitly given by the
> other function?

uint32le_apply needs to check whether the input node is a 4-byte blob
node. It does this by trying to read 5 bytes from the blob node. If it
succeeds and gets 5 bytes, it knows the blob node is too long for a
uint32le, so it returns EINVAL. If it gets 4 bytes, it knows the blob
node is exactly 4 bytes long.
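
The same check, modelled in Python instead of the actual C code. The
Blob class and its read method are stand-ins invented for this sketch;
the only assumption that matters is that a read may return fewer bytes
than requested. Rejecting blobs shorter than 4 bytes is also part of the
sketch rather than something described above.

```python
import errno


class Blob:
    """Toy blob whose length can only be discovered by reading it."""

    def __init__(self, data):
        self._data = bytes(data)

    def read(self, offset, size):
        """Return up to `size` bytes from `offset`; may return fewer."""
        return self._data[offset:offset + size]


def uint32le_apply(blob):
    # Ask for one byte more than a uint32 needs. Getting all 5 bytes
    # back means the blob is too long; getting 4 means it is exactly
    # 4 bytes long.
    buf = blob.read(0, 5)
    if len(buf) != 4:
        raise OSError(errno.EINVAL, "uint32le needs exactly 4 bytes")
    return int.from_bytes(buf, "little")
```

This avoids needing a separate "size" operation on blobs: one read
answers both "is there enough data" and "is there too much".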

prefix_length_4 says that uint32les only take 4-byte inputs, but it's
only a helper. If your script is:

  transform main = uint32le;

prefix_length_4 is never called; uint32le_apply is called directly on
the input, and it's responsible for checking whether the input has the
correct length.

> I see that you wrote the parser & lexer all by yourself. Was there any
> specific reason for not using Flex/bison/whatever? Currently, the
> language is pretty simple but this burden may strike back later...

No specific reason, although they might require libposix. Maybe this
will be a good reason to keep the language from becoming convoluted.

> Do not forget that result shall be a library. Currently, everything
> resides in app/.

I will move it.

> I had problems orienting myself in the dump, so I added a very
> primitive indenting - it is attached for inspiration how the output
> could look like. The implementation is a 2-minute hack that shall
> never be committed ;-).

I'll work on indentation support. Printing will also need improvement to
be used in the interactive browser.


> What are the plans for "write" support?

> Your progress looks okay to me, but do not forget that you are already
> in the middle of GSOC (assuming you won't work during your vacation).
> If you think that you will not be able to implement everything, it is
> time to discuss what features could be dropped. Otherwise, good job!

I'll try to work a little during my vacation, but this is still a good
point. In my proposal, I planned to finish these items by the GSoC
deadline:

- An interactive browser (7 days)

- A simple DWARF implementation to browse other tasks' memory (9 days)

- Editing support, including interactive editing (11 days)

I think I could do these, but with very limited functionality. For
instance, the editing support might only work with simple integers and
booleans. I propose not working on editing yet, and maybe not DWARF
either; instead, I could improve the language and spend more time
working on the interactive browser. Of course, the plan could be changed
later if necessary. What is your preference?

Thanks,
Sean

_______________________________________________
HelenOS-devel mailing list
[email protected]
http://lists.modry.cz/cgi-bin/listinfo/helenos-devel
