[ 
https://issues.apache.org/jira/browse/PIG-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711632#comment-13711632
 ] 

Jonathan Packer commented on PIG-3359:
--------------------------------------

Cheolsoo, I updated patch to address your comments in RB. Johnny, my replies to 
your questions are at the bottom of this comment.

Cheolsoo,

There was a comment where you noted that ParamLoader was being called 
repeatedly. To address this, and a lot of other messiness with param 
substitution, I did a refactoring so all methods that substitute parameters 
call PigContext's doParamSubstitution(), which instantiates and manages 
ParamSubstitutionPreprocessor. This ensures consistent behavior between the 
param substitution called in Main, GruntParser, and several other places, and 
solves the ParamLoader problem. Managed by PigContext, the loading of 
parameters is separated from substitution itself, is not repeated when 
substitution is done many times (i.e. for macros).

As for registerJars and registerCode, I got stuck. I understand your concern 
about separating parser and server. But QPD returns a LogicalPlan, and I'm 
pretty sure that the LP generation needs the UDF output schemas to be defined, 
which requires the UDFs to be registered. I think this could be fixed by 
splitting the AST generation from the LP generation and allowing PigServer to 
register resources requested by macros in between the two. But that seems like 
a big change and I don't want the scope of this patch to expand too much.

What do you think?

Johnny,

To answer your questions:
1) The query parser seems to be reparsing everything every time it reads a new 
line of Pig code. So the same macros are parsed over and over. This patch 
caches the AST's from the first parsing, so when a new line of the Pig script 
is parsed, the macro is not re-parsed. Also in terms of actually fetching the 
files, the import sequence goes "test.pig" imports "macros_1.pig" imports 
"macros_2.pig"; when "test.pig" imports "macros_2.pig" in the next line, it has 
already been acquired, so it is not loaded. Same idea for udfs, jars. I'm not 
sure how best to test this within Pig without a mocking framework to intercept 
when QPD parses a macro and ensure it is not doing it repeatedly. Any ideas?

2) It should not, since param substitution in the main pigscript happens before 
macro expansion. This is just to propagate the parameters to imported macro 
files.

3) duplicatedImportTest is not supposed to fail; the duplicated import should 
just be ignored. This is silly within one pigscript, but the it makes sense in 
the test.pig/macros_1.pig/macros_2.pig example I described above, since both 
test and macros_1 should be able to import macros_2.
                
> Register Statements and Param Substitution in Macros
> ----------------------------------------------------
>
>                 Key: PIG-3359
>                 URL: https://issues.apache.org/jira/browse/PIG-3359
>             Project: Pig
>          Issue Type: Bug
>          Components: parser
>            Reporter: Jonathan Packer
>            Assignee: Jonathan Packer
>         Attachments: PIG-3359_test.tar.gz, PIG-3359-v1.diff, 
> PIG-3359-v2.diff, PIG-3359-v3.diff
>
>
> There are some gaps in the functionality of macros that I've made a patch to 
> address. The goal is to provide everything you'd need to make reusable 
> algorithms libraries.
> 1. You can't register udfs inside a macro
> 2. Paramater substitutions aren't done inside macros
> 3. Resources (including macros) should not be redundantly acquired if they 
> are already present.
> Rohini's patch https://issues.apache.org/jira/browse/PIG-3204 should address 
> problem 3 where Pig reparses everything every time it reads a line, but there 
> still would be a problem if two separate files import the same macro / udf 
> file.
> To get this working, I moved methods for registering jars/udfs and param 
> substitution from PigServer to PigContext so they can be accessed in 
> QueryParserDriver which processes macros (QPD was already passed a PigContext 
> reference). Is that ok?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to