Alan Gates
Tue, 16 Mar 2010 15:10:01 -0700
Alan.

On Mar 15, 2010, at 5:28 PM, Dmitriy Ryaboy wrote:
Alan -- yeah, right now we use the rather brittle approach of naming conventions to do this. Something more template/macro-like would be better.

Of course something like Piglet, or equivalents in other languages, can obviate the need for these constructs, and I am not entirely sure functions, loops, etc. are something we want to get into reinventing. I guess the question becomes whether we want Pig Latin to be a first-class language that programmers write code in directly, or if we shift focus to building out the tooling for generating Pig scripts, and Pig Latin becomes something you drop into for one-offs.

-D

On Mon, Mar 15, 2010 at 4:02 PM, Alan Gates <ga...@yahoo-inc.com> wrote:

In your example below how would the results of these load functions be accessed in your main script?

I certainly see the value of #include plus functions (or #define if you prefer). Without functions though you'll have namespace clashes (any relation names used in the imported files will be visible to other imported files and to the main script) and the user will have to know the names of the input and output relations for the imported files so he can use them subsequently in his script. For example if you had a pig script that implemented a certain type of join:

    RETURN = join INPUT1 by $0, INPUT2 by $0

Now the user has to know that INPUT1 and INPUT2 must be the names of his input relations and that the output relation will be named RETURN. This is also limited because we can't define which key(s) to do the join on. To make this useful we're going to want a macro or function ability so we can pass in names of inputs and other parameters (like which keys to join on), control the names of results, and have variable scoping.

That said, I'm all for it. I think it would make Pig much more usable.

Alan.

On Mar 15, 2010, at 2:58 PM, Dmitriy Ryaboy wrote:

Alan, this would be quite useful, as essentially this would allow developers to create functions by writing them into separate pig scripts and combining them as necessary.

For example we have code that auto-generates load statements with fairly complex schemas based on protocol buffers (see http://www.slideshare.net/hadoopusergroup/twitter-protobufs-and-hadoop-hug-021709 ). It would be very handy to be able to say something like

    #include common_jars.pig
    #include load_tweets.pig
    #include load_users.pig
    #include filter_nonenglish_tweets.pig
    #include geomap_users.pig

.. etc ..

-D

On Mon, Mar 15, 2010 at 2:23 PM, Alan Gates <ga...@yahoo-inc.com> wrote:

On Mar 12, 2010, at 10:36 AM, hc busy wrote:

Is there any work towards something like the C language's '#include' in Pig? My large pig script is actually developed separately in several smaller pig files. Individually the pig files do not run because they depend on previous scripts, but logically they are separate because each step does something different.

Currently the only thing existing along these lines is the exec command in grunt. I don't think we're opposed to a #include functionality, we just haven't done it. However, given that Pig doesn't have function calls, and presumably each Pig Latin script is self contained, it isn't clear to me how useful it will be.

Alan.
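
The difference between a plain textual include and the parameterized macro Alan asks for can be sketched with a small script generator, in the spirit of the "tooling for generating Pig scripts" that Dmitriy mentions. This is only an illustrative sketch in Python, not anything Pig provides: the helper `join_macro` and the relation names `tweets`, `users`, `geo`, `tweets_with_users` and `located_tweets` are hypothetical, and the generated text simply follows the shape of the join statement quoted in the thread.

```python
# Illustrative only: join_macro is a hypothetical helper, not a Pig feature.

# A textual include fixes the relation names and the join key in the snippet,
# so every caller must use INPUT1, INPUT2 and RETURN.
FIXED_SNIPPET = "RETURN = JOIN INPUT1 BY $0, INPUT2 BY $0;"


def join_macro(left, right, key, out):
    """Render a Pig Latin join with caller-chosen relation names and key."""
    return f"{out} = JOIN {left} BY {key}, {right} BY {key};"


# With parameters the same "function" can be used twice in one script
# without the two uses clashing over INPUT1 / INPUT2 / RETURN.
# All relation names below are made up for the example.
script = "\n".join([
    join_macro("tweets", "users", "user_id", "tweets_with_users"),
    join_macro("tweets_with_users", "geo", "user_id", "located_tweets"),
])

if __name__ == "__main__":
    print(script)
```

Calling `join_macro("INPUT1", "INPUT2", "$0", "RETURN")` reproduces the fixed snippet, which shows that the include-only version is just the special case where the caller has no say over names or keys.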