Jonathan Wilkes <[email protected]> writes:
> I cannot for the life of me understand the quote from djb starting,
> "Don't parse." What is it he doesn't like, and how does his text0
> format keep him from doing what he doesn't like?

I can only speculate about the reasons, but 'parsing', especially in
the sense it is used here, namely not 'parsing' proper (interpreting a
sequence of tokens according to the rules of some grammar) but 'lexical
analysis of some text', seems deceptively simple in C. It actually
isn't, and the standard C string support routines are virtually useless
for it. Because it seems so simple, people tend to write code which
handles all input they considered to be valid but fails, often in
serious ways, i.e., by causing invalid memory accesses, when
encountering 'crafted' invalid input.

If you want to do a lexical analyser in C, it will have to become an
often complicated[*] finite state machine analysing the input
character by character, and writing one is (IMHO) a very tedious
business as one proceeds in tiny steps towards a distant goal.
Considering that "doing it right" is a lot of work and shortcuts tend
to cause disaster, avoiding the problem completely seems like a smart
move.

OTOH, 'seems', because one purpose of a parser is to detect and reject
invalid input. This means "don't parse" implies "don't do input
validation", and while that's surely popular :->, it's usually not an
option. But "avoid writing parsers where feasible" is IMHO a sound
piece of advice.

E.g., in case some sort of 'config file format' is needed, it's often
possible to get by with a set of variable=value statements in Bourne
shell syntax and to replace the 'start the program' command with a
shell script which sources the config file and starts the real program
with command-line arguments corresponding to the values from the config
file. The shell already has a parser, and since typical shells were
written to perform adequately on computers with far less horsepower
than a current-day smartphone, it's even going to be a 'fast' parser.
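
To make the config file idea a bit more concrete, here's a minimal
sketch of such a wrapper in POSIX sh. Everything named in it
(run_with_config, LISTEN_PORT, LOG_DIR, the --port and --log-dir
options) is made up for illustration, and it's written as a function
only so that it can be tried out in isolation:

```shell
#!/bin/sh
# Sketch of a wrapper which lets the shell do the config 'parsing'.
# All names (run_with_config, LISTEN_PORT, LOG_DIR, --port, --log-dir)
# are hypothetical and only serve as illustration.

# usage: run_with_config <config file> <program> [<fixed args>...]
run_with_config() {
    conf=$1
    shift

    # '.' searches $PATH for names without a slash, so force a path.
    case $conf in
        */*) ;;
        *) conf=./$conf ;;
    esac

    # Check first: a failing '.' may terminate a non-interactive shell.
    [ -r "$conf" ] || {
        echo "cannot read config file: $conf" >&2
        return 1
    }

    # Defaults, used when the config file doesn't set a variable.
    LISTEN_PORT=8080
    LOG_DIR=/var/log/myprog

    # The config file is plain variable=value shell syntax, e.g.
    #     LISTEN_PORT=2525
    #     LOG_DIR="/tmp/my logs"
    # Sourcing it means the shell's own parser does all the work.
    . "$conf"

    # Start the real program with arguments built from the values.
    "$@" --port "$LISTEN_PORT" --log-dir "$LOG_DIR"
}
```

A standalone wrapper script would do the same and end with an 'exec'
of the real program instead of a plain call. Note that sourcing a file
executes whatever is in it, so this only suits config files which are
as trusted as the program itself.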

This, of course, means one first has to get over "OMG!!1 Fork and
exec!!2", which still serves as justification for writing a few
hundred thousand lines of C code in the quest for 'performance'. But
the already mentioned, relatively puny, large computers could fork and
exec all day without their usability being seriously impaired, so
that's IMNSHO a red herring.

[*] I once wrote a parser for SMTP headers which actually required an
    additional state-stack in order to be able to 'go back to where we
    were before encountering this'.

_______________________________________________
Dng mailing list
[email protected]
https://mailinglists.dyne.org/cgi-bin/mailman/listinfo/dng
