I really haven't spent enough time looking through the object model, but in terms of usage, it seems like it would be nice if each source/sink/decorator had a consistent way of documenting its intended usage in the code. While it would be easy to propose adding a getUsage() method to the sink interface, I think that would be entirely inappropriate (the usage doc is more an aspect of command-line configurability than of the function of the sink itself). I'm not sure if there is an appropriate hook on something like a "configurable" interface that the sources/sinks/decorators all happen to be using, but if so, that would be a nice place to put a consistent usage method.
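One way to keep usage text beside each class without touching the sink interface is a class-level annotation that only the command-line configuration code reads, via reflection. This is a minimal sketch under assumptions: `@Usage`, `ExampleSink`, and `UsageDocs` are hypothetical names, and a Java annotation is swapped in here for the "configurable" interface hook, which may not exist in Flume's object model. The usage line marks the optional argument with its default rather than with square brackets.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Reads the hypothetical @Usage annotation so any configuration front end
// (an error message, a shell "usage" command) can print the same text.
class UsageDocs {
  static String describe(Class<?> cls) {
    Usage u = cls.getAnnotation(Usage.class);
    if (u == null) {
      return "No usage documented for " + cls.getSimpleName();
    }
    StringBuilder sb = new StringBuilder("Usage: ").append(u.value());
    for (String ex : u.examples()) {
      sb.append("\n  example: ").append(ex);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(describe(ExampleSink.class));
  }
}

// Hypothetical annotation: the usage text lives beside the class but
// outside the sink interface, since it only concerns configuration.
@Retention(RetentionPolicy.RUNTIME) // must survive to runtime for reflection
@Target(ElementType.TYPE)           // allowed on type declarations
@interface Usage {
  String value();                   // the call signature
  String[] examples() default {};   // concrete calls showing the notation
}

// Hypothetical sink; a real one would also extend or implement Flume's sink type.
@Usage(value = "exampleSink(dir, prefix=\"log\")",
       examples = {"exampleSink(\"/tmp/out\")",
                   "exampleSink(\"/tmp/out\", prefix=\"web\")"})
class ExampleSink {}
```

Because only configuration code reads the annotation, a class that omits it still works; `describe` simply falls back to a "No usage documented" message.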
For one thing, it would be nice if you could query the shell for usage information rather than having to cause an error in the hope of getting it -- imagine being able to call flume shell -e "usage collectorSink" for any function and having it spit back the appropriate usage information.

Of course, to do something like that you'd probably need a generic FunctionBuilder that was used across the board to build functions, sinks, sources, etc. (when they came from a function rather than some other syntax). In that case I could picture a config file that mapped function names to classes -- that same config file would be ideal for holding the usage docs, because you'd have one consolidated place containing docs for all the functions rather than having to look in each class. Then, as long as the FunctionBuilder passed that standardized usage information into the function it built, the function could spit out the consistent usage information as part of an exception message whenever the usage spec was violated -- ideally with a nice additional message describing what the particular violation was.

As for what notation to use in describing the usage spec... I imagine square brackets would be fine as long as they were accompanied by some examples demonstrating that the brackets are not actually part of the usage. For instance:

Usage: functionName[(arg1,arg2)]
  arg1 optionally specifies the number of times you want to....
  arg2 optionally specifies the type of ....
Example:
  functionName              // call the function without any arguments
  functionName(1)           // call the function with arg1 as 1, allowing arg2 to default
  functionName(arg2="bla")  // specify arg2, but allow arg1 to default

I realize this is kind of an ambitious proposal, but it sure would be nice =)

On Sat, Sep 10, 2011 at 3:22 PM, Jonathan Hsieh <[email protected]> wrote:
> Jeff,
> Thanks for digging into this.
> I'm at the point where this syntax "feels natural" to me, so I'd love to hear your opinion on how to improve the documentation to make it easier to learn and understand.
> I could see how we could improve the manual by being more explicit than the terse syntax info. What do you think would make sense for the usage warnings?
> Thanks,
> Jon.
>
> On Thu, Sep 8, 2011 at 2:14 PM, Jeff Hansen <[email protected]> wrote:
>>
>> I think I finally get the syntax for the most part -- but it took finding and skimming through a book on ANTLR to figure it out. The pertinent information is in FlumeDeploy.g.
>>
>> It seems to me that the syntax is relatively straightforward once you get that sinks, sources and decorators are all just functions, and they all follow the same lexical pattern as functions. Unfortunately, the "function" syntax is the one all-important pattern that's missing from the documentation (or if it's there, it's hidden behind a Jedi master saying "these are not the specs you are looking for").
>>
>> The key is explaining that function calls follow a syntax slightly more like that of Ruby or Python than that of Java, in that the parentheses for arguments are optional -- except that they aren't exactly optional, because they're required if you actually want to pass any arguments to the function.
>>
>> Then it's just a matter of explaining that arguments are themselves either functions or literals (string, numeric, boolean). Further, required arguments always come first, and they may be followed by optional arguments which (much like in Ruby and Python) can be passed in as named arguments in the form argName=argValue -- this allows you to skip over arguments you don't want to override if they happen to come before arguments you do want to override.
>>
>> Personally, I'd avoid explaining any of this optionality with square brackets, because square brackets are significant characters that show up elsewhere (fan-out sinks). In some cases it's relatively clear to me that brackets indicate an argument is optional -- for instance, functionName(arg1[,arg2]*) is clear to me -- but "Usage: functionName[(arg1,arg2)]" in an error message telling me I've done something wrong just makes me think, crap, did I need to put brackets in there?
>>
>> Does anybody else think that kind of explanation would have been helpful when you were starting out?
>>
>> Thanks,
>> Jeff
>
> --
> // Jonathan Hsieh (shay)
> // Software Engineer, Cloudera
> // [email protected]
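The FunctionBuilder idea from the top of this thread could be sketched roughly as follows, assuming a properties-style config file serves as the single place for usage docs. `FunctionBuilder`, `tallySink`, and the `.usage`/`.minArgs`/`.maxArgs` keys are hypothetical, and factories are registered in code rather than loaded by class name so the sketch stays self-contained. The `usage` method is what a shell command like the proposed `usage collectorSink` could print, and `build` puts the specific violation first and the standard usage text after it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.function.Function;

// Hypothetical builder: one table holds each function's usage text and
// argument limits, so every error can quote the same usage line.
class FunctionBuilder {
  // Stand-in for the proposed config file (properties format).
  static final String CONFIG = String.join("\n",
      "tallySink.usage=tallySink(name, every=100)",
      "tallySink.minArgs=1",
      "tallySink.maxArgs=2");

  private final Properties table = new Properties();
  private final Map<String, Function<List<String>, Object>> factories = new HashMap<>();

  FunctionBuilder(String config) {
    try {
      table.load(new StringReader(config));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  void register(String name, Function<List<String>, Object> factory) {
    factories.put(name, factory);
  }

  // What a shell "usage <name>" command could print.
  String usage(String name) {
    String u = table.getProperty(name + ".usage");
    return u == null ? "Unknown function: " + name : "Usage: " + u;
  }

  // Checks the argument count before building; on a violation the message
  // names the specific problem first, then repeats the standard usage.
  Object build(String name, List<String> args) {
    Function<List<String>, Object> factory = factories.get(name);
    if (factory == null) {
      throw new IllegalArgumentException("Unknown function: " + name);
    }
    // Missing limits default to zero, i.e. a function that takes no arguments.
    int min = Integer.parseInt(table.getProperty(name + ".minArgs", "0"));
    int max = Integer.parseInt(table.getProperty(name + ".maxArgs", "0"));
    if (args.size() < min || args.size() > max) {
      throw new IllegalArgumentException(name + " takes " + min + " to " + max
          + " arguments but got " + args.size() + "\n" + usage(name));
    }
    return factory.apply(args);
  }

  public static void main(String[] argv) {
    FunctionBuilder fb = new FunctionBuilder(CONFIG);
    fb.register("tallySink", args -> "tallySink" + args); // placeholder factory
    System.out.println(fb.usage("tallySink"));
    System.out.println(fb.build("tallySink", List.of("hits")));
    try {
      fb.build("tallySink", List.of());
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Keeping the usage text in the table rather than in each class is what lets the shell lookup and the exception message share one source, which is the consolidation the proposal is after.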
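The calling convention Jeff describes in the quoted message (required arguments first, then optional ones that can be passed positionally or skipped over with argName=argValue) can be illustrated with a small binder. This illustrates those rules as stated in the thread, not Flume's actual parser: `ArgBinder`, the parameter names, and the default values are made up, and argument values stay plain strings rather than being parsed into functions or literals.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical binder: positional arguments fill parameters in order,
// named arguments fill by name, and unfilled optional parameters default.
class ArgBinder {
  private final List<String> params;        // parameter names, required ones first
  private final Map<String, String> defaults; // defaults for the optional parameters only

  ArgBinder(List<String> params, Map<String, String> defaults) {
    this.params = params;
    this.defaults = defaults;
  }

  Map<String, String> bind(List<String> positional, Map<String, String> named) {
    if (positional.size() > params.size()) {
      throw new IllegalArgumentException("too many arguments: " + positional.size());
    }
    Map<String, String> bound = new LinkedHashMap<>();
    for (int i = 0; i < positional.size(); i++) {
      bound.put(params.get(i), positional.get(i));
    }
    for (Map.Entry<String, String> e : named.entrySet()) {
      if (!params.contains(e.getKey())) {
        throw new IllegalArgumentException("unknown argument: " + e.getKey());
      }
      if (bound.containsKey(e.getKey())) {
        throw new IllegalArgumentException("argument given twice: " + e.getKey());
      }
      bound.put(e.getKey(), e.getValue());
    }
    // Walk parameters in declaration order: take the bound value, else the
    // default; a required parameter with neither is an error.
    Map<String, String> result = new LinkedHashMap<>();
    for (String p : params) {
      if (bound.containsKey(p)) {
        result.put(p, bound.get(p));
      } else if (defaults.containsKey(p)) {
        result.put(p, defaults.get(p));
      } else {
        throw new IllegalArgumentException("missing required argument: " + p);
      }
    }
    return result;
  }

  public static void main(String[] argv) {
    // Mirrors functionName[(arg1,arg2)], where both arguments are optional.
    ArgBinder b = new ArgBinder(List.of("arg1", "arg2"), Map.of("arg1", "3", "arg2", "text"));
    System.out.println(b.bind(List.of(), Map.of()));              // prints {arg1=3, arg2=text}
    System.out.println(b.bind(List.of("1"), Map.of()));           // prints {arg1=1, arg2=text}
    System.out.println(b.bind(List.of(), Map.of("arg2", "bla"))); // prints {arg1=3, arg2=bla}
  }
}
```

The three calls in `main` correspond to `functionName`, `functionName(1)`, and `functionName(arg2="bla")` from the proposed usage text. A missing required argument, an unknown name, or an argument given both positionally and by name each yields a specific message that could be prefixed to the standard usage line.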
