[
https://issues.apache.org/jira/browse/PIG-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13711632#comment-13711632
]
Jonathan Packer commented on PIG-3359:
--------------------------------------
Cheolsoo, I updated patch to address your comments in RB. Johnny, my replies to
your questions are at the bottom of this comment.
Cheolsoo,
There was a comment where you noted that ParamLoader was being called
repeatedly. To address this, and a lot of other messiness with param
substitution, I did a refactoring so all methods that substitute parameters
call PigContext's doParamSubstitution(), which instantiates and manages
ParamSubstitutionPreprocessor. This ensures consistent behavior between the
param substitution called in Main, GruntParser, and several other places, and
solves the ParamLoader problem. Managed by PigContext, the loading of
parameters is separated from substitution itself, is not repeated when
substitution is done many times (i.e. for macros).
As for registerJars and registerCode, I got stuck. I understand your concern
about separating parser and server. But QPD returns a LogicalPlan, and I'm
pretty sure that the LP generation needs the UDF output schemas to be defined,
which requires the UDFs to be registered. I think this could be fixed by
splitting the AST generation from the LP generation and allowing PigServer to
register resources requested by macros in between the two. But that seems like
a big change and I don't want the scope of this patch to expand too much.
What do you think?
Johnny,
To answer your questions:
1) The query parser seems to be reparsing everything every time it reads a new
line of Pig code. So the same macros are parsed over and over. This patch
caches the AST's from the first parsing, so when a new line of the Pig script
is parsed, the macro is not re-parsed. Also in terms of actually fetching the
files, the import sequence goes "test.pig" imports "macros_1.pig" imports
"macros_2.pig"; when "test.pig" imports "macros_2.pig" in the next line, it has
already been acquired, so it is not loaded. Same idea for udfs, jars. I'm not
sure how best to test this within Pig without a mocking framework to intercept
when QPD parses a macro and ensure it is not doing it repeatedly. Any ideas?
2) It should not, since param substitution in the main pigscript happens before
macro expansion. This is just to propagate the parameters to imported macro
files.
3) duplicatedImportTest is not supposed to fail; the duplicated import should
just be ignored. This is silly within one pigscript, but the it makes sense in
the test.pig/macros_1.pig/macros_2.pig example I described above, since both
test and macros_1 should be able to import macros_2.
> Register Statements and Param Substitution in Macros
> ----------------------------------------------------
>
> Key: PIG-3359
> URL: https://issues.apache.org/jira/browse/PIG-3359
> Project: Pig
> Issue Type: Bug
> Components: parser
> Reporter: Jonathan Packer
> Assignee: Jonathan Packer
> Attachments: PIG-3359_test.tar.gz, PIG-3359-v1.diff,
> PIG-3359-v2.diff, PIG-3359-v3.diff
>
>
> There are some gaps in the functionality of macros that I've made a patch to
> address. The goal is to provide everything you'd need to make reusable
> algorithms libraries.
> 1. You can't register udfs inside a macro
> 2. Paramater substitutions aren't done inside macros
> 3. Resources (including macros) should not be redundantly acquired if they
> are already present.
> Rohini's patch https://issues.apache.org/jira/browse/PIG-3204 should address
> problem 3 where Pig reparses everything every time it reads a line, but there
> still would be a problem if two separate files import the same macro / udf
> file.
> To get this working, I moved methods for registering jars/udfs and param
> substitution from PigServer to PigContext so they can be accessed in
> QueryParserDriver which processes macros (QPD was already passed a PigContext
> reference). Is that ok?
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira