Hello Jeffrey,

On Tue, Dec 15, 2015 at 5:01 PM, Jeffrey Goldberg <jeff...@goldmark.org> wrote:
> Hello,
>
> I work for a company that produces a consumer product for a number of 
> different operating systems and environments as well as running a web service 
> (written in Go).  Although I’ve been telling people to not use ad-hoc parsers 
> and adding a new regex each time someone discovers a bug, it would be much 
> much easier to persuade/force developers to do things right if I can actually 
> offer them something concrete.
>
> Our needs are (mostly) textual. So we don’t need things like Hammer or Nail. 
> I’m really old and so I think of lex/yacc (or perhaps flex/bison), but I am 
> hoping for things that might be better suited to simple deterministic Context 
> Free Grammars. And I would make our developers happier (i.e., more willing to 
> comply) if the parser-generators produce code that they can link and use 
> easily.

I tend to favor the parser combinator approach, like Hammer, or my own
Rust library, nom ( https://github.com/Geal/nom ), because developers
take to it easily: you assemble a few small functions into a grammar.
Usually, when you talk about lex and yacc, developers either think
these tools are too complex, because they have never used them, or
they remember their compilers 101 course with dread. It's a shame,
but when arguing against handwritten parsers, parser combinators are,
in my experience, an easier sell than lex and similar tools.
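To make "a few functions that you assemble" concrete, here is a minimal
hand-rolled sketch in Go. It is NOT nom's or Hammer's API; every name
(Parser, Satisfy, Char, Seq, Many0, Many1, Complete, Address) is
hypothetical. The toy grammar is much stricter than RFC 2822: one or
more ASCII alphanumerics, "@", then two or more dot-separated labels.

```go
package main

import "fmt"

// Parser (hypothetical) consumes a prefix of the input and returns the
// remaining input, or ok=false if it does not match.
type Parser func(in []rune) (rest []rune, ok bool)

// Satisfy matches a single rune for which pred returns true.
func Satisfy(pred func(rune) bool) Parser {
	return func(in []rune) ([]rune, bool) {
		if len(in) > 0 && pred(in[0]) {
			return in[1:], true
		}
		return in, false
	}
}

// Char matches exactly the rune c.
func Char(c rune) Parser {
	return Satisfy(func(r rune) bool { return r == c })
}

// Seq runs parsers one after another; all of them must succeed.
func Seq(ps ...Parser) Parser {
	return func(in []rune) ([]rune, bool) {
		rest := in
		for _, p := range ps {
			next, ok := p(rest)
			if !ok {
				return in, false
			}
			rest = next
		}
		return rest, true
	}
}

// Many0 applies p zero or more times and always succeeds.
func Many0(p Parser) Parser {
	return func(in []rune) ([]rune, bool) {
		rest := in
		for {
			next, ok := p(rest)
			// Stop on failure, or on a match that consumed nothing.
			if !ok || len(next) == len(rest) {
				return rest, true
			}
			rest = next
		}
	}
}

// Many1 applies p one or more times.
func Many1(p Parser) Parser {
	return Seq(p, Many0(p))
}

// Complete turns a parser into a validator that accepts a string only
// if the parser consumes all of it.
func Complete(p Parser) func(string) bool {
	return func(s string) bool {
		rest, ok := p([]rune(s))
		return ok && len(rest) == 0
	}
}

func isAlnum(r rune) bool {
	return (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') || (r >= '0' && r <= '9')
}

// Address is a toy grammar: alnum+ "@" label ("." label)+
func Address() Parser {
	label := Many1(Satisfy(isAlnum))
	domain := Seq(label, Many1(Seq(Char('.'), label)))
	return Seq(label, Char('@'), domain)
}

func main() {
	valid := Complete(Address())
	for _, s := range []string{"jeff@goldmark.org", "jeff@localhost", "@example.com"} {
		fmt.Printf("%-20s %v\n", s, valid(s))
	}
}
```

The grammar reads almost like its BNF, and validation means "the
parser consumed the whole input", which is the point of the approach.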


>
> So what I need is help from the langsec community in selling doing things the 
> right way among the developers where I work. Me talking abstractly about 
> “write a grammar, generate a parser from that grammar, and validate with that 
> parser before doing anything else” would go much better if I could actually 
> show people how to do that and how it will make things easier for them. Also 
> me saying “I told you so” with every new input validation bug, is getting 
> tiresome.
>
> So I would like to write a grammar (of, say, the subset of RFC2822 email 
> addresses that we want to accept) once and offer a practical way to get from 
> that grammar to a validator for languages and development environments 
> including Go, JavaScript/TypeScript, C#, Objective-C, and (perhaps) Delphi.
>
> Go and JavaScript/Typescript would be the big sell. I realize that for some 
> environments we might just have to build from C and link from that.
>
> So what are the parser-generators y’all actually recommend to developers that 
> they can easily and practically use?
>
> Cheers,
>
> Jeffrey Goldberg
>


On the specific subject of email addresses, most validation attempts
fail, because they ignore a lot of legitimate use cases. This is the
process I usually follow:
- decode the input as UTF-8
- find the @, as Nils said
- look for common typos in the domain part, like gnail.com, and
warn the user (I assume you want to warn the user before sending the
email, since you require Go or JS)
- send the email
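The first three steps could be sketched in Go roughly as follows,
using only the standard library. Assumptions, all mine: CheckAddress
and commonTypos are hypothetical names, the typo table is a tiny
illustrative sample, and the address is split at the LAST "@" (quoted
local parts may contain "@"). The fourth step, actually sending the
email, is deliberately not implemented here.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"unicode/utf8"
)

// commonTypos is a hypothetical, illustrative table of misspelled domains.
var commonTypos = map[string]string{
	"gnail.com":   "gmail.com",
	"gmial.com":   "gmail.com",
	"hotmial.com": "hotmail.com",
}

// CheckAddress returns an error if the address is unusable, and a
// non-empty suggestion if its domain looks like a common typo.
func CheckAddress(addr string) (suggestion string, err error) {
	// Step 1: the input must be valid UTF-8.
	if !utf8.ValidString(addr) {
		return "", errors.New("address is not valid UTF-8")
	}
	// Step 2: find the last '@'; require text on both sides of it.
	at := strings.LastIndex(addr, "@")
	if at <= 0 || at == len(addr)-1 {
		return "", errors.New("address needs text on both sides of '@'")
	}
	local, domain := addr[:at], addr[at+1:]
	// Step 3: suggest a fix for known domain typos (domains are case-insensitive).
	if fixed, ok := commonTypos[strings.ToLower(domain)]; ok {
		return local + "@" + fixed, nil
	}
	return "", nil
}

func main() {
	for _, a := range []string{"jeff@gnail.com", "jeff@goldmark.org", "no-at-sign"} {
		s, err := CheckAddress(a)
		fmt.Printf("%q: suggestion=%q err=%v\n", a, s, err)
	}
}
```

Note that nothing here decides whether the address is deliverable; a
suggestion is only a prompt for the user to double-check.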

In any case, the last step is the only one that really validates the
address, since an address you deem valid might still be rejected by
the remote server.

If you do need to fully validate an email address (or anything else),
I can still recommend nom, since Rust is easy to call from Go (cf
https://github.com/medimatrix/rust-plus-golang ) or from most other
languages. Would that work for you?

Cheers,

Geoffroy
_______________________________________________
langsec-discuss mailing list
langsec-discuss@mail.langsec.org
https://mail.langsec.org/cgi-bin/mailman/listinfo/langsec-discuss
