[
https://issues.apache.org/jira/browse/AIRFLOW-595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16639433#comment-16639433
]
Iuliia Volkova commented on AIRFLOW-595:
----------------------------------------
[~hwbj], is this still an issue in version 1.10?
> PigOperator
> -----------
>
> Key: AIRFLOW-595
> URL: https://issues.apache.org/jira/browse/AIRFLOW-595
> Project: Apache Airflow
> Issue Type: Improvement
> Components: operators
> Affects Versions: 1.7.1.3
> Reporter: wei.he
> Priority: Major
> Attachments: KKKKKKKKKKKKKKKK.jpg
>
>
> When I use the PigOperator, I run into two issues.
> h3. 1. How can I pass "-param" arguments so the pig script runs dynamically?
> For example,
> I would like it to run like *pig -Dmapreduce.job.name=test
> -Dmapreduce.job.queuename=mapreduce -param input=/tmp/input/* -param
> output=/tmp/output -f test.pig*
> {code:title=test.pig|borderStyle=solid}
> run -param log_name=raw_log_bid -param src_folder='${input}' load.pig;
> A = FOREACH raw_log_bid GENERATE ActionId AS ActionId;
> A = DISTINCT A;
> C = GROUP A ALL;
> D = foreach C GENERATE COUNT(A);
> STORE D INTO '$output';
> {code}
> {code:title=task.py|borderStyle=solid}
> ...
> task_pigjob = PigOperator(
>     task_id='task_pigjob',
>     pigparams_jinja_translate=True,
>     pig='test.pig',
>     # pig=templated_command,
>     dag=dag)
> ....
> {code}
> How can I set the "-param" values?
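A minimal sketch of what the reporter is asking for: assembling the desired pig invocation as an argument list, with `-D` properties and `-param` substitution pairs ahead of `-f`. The function name `build_pig_command` is hypothetical, not part of Airflow's PigOperator (which in 1.7.x only templates the script body); this only illustrates the target command shape.

```python
# Illustration only: assemble the desired pig CLI invocation.
# build_pig_command is a hypothetical helper, not an Airflow API.
import shlex


def build_pig_command(script, properties=None, params=None):
    """Build a pig argument list: -D properties, -param pairs, then -f script."""
    cmd = ["pig"]
    for key, value in (properties or {}).items():
        cmd.append("-D%s=%s" % (key, value))          # JVM/Hadoop properties
    for key, value in (params or {}).items():
        cmd.extend(["-param", "%s=%s" % (key, value)])  # Pig parameter substitution
    cmd.extend(["-f", script])                        # script file comes last
    return cmd


cmd = build_pig_command(
    "test.pig",
    properties={"mapreduce.job.name": "test",
                "mapreduce.job.queuename": "mapreduce"},
    params={"input": "/tmp/input/*", "output": "/tmp/output"},
)
print(" ".join(shlex.quote(c) for c in cmd))
```

The key point is ordering: both `-D` properties and `-param` pairs must appear before `-f <script>` on the pig command line.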
> h3. 2. After I add the conn_id "pig_cli_default" for PigHook and set its
> Extra to {"pig_properties": "-Dpig.tmpfilecompression=true"}, I run a test
> dag.
> I get the following log:
> {quote}
> INFO - pig -f /tmp/airflow_pigop_c83K9T/tmpqw5on_
> -Dpig.tmpfilecompression=true
> [2016-10-25 17:32:44,619] {ipy_pig_hook.py:105} INFO - any environment
> variables that are set by the pig command.
> [2016-10-25 17:32:44,901] {models.py:1286} ERROR -
> Apache Pig version 0.12.1.2.1.4.0-632 (rexported)
> compiled Jul 29 2014, 18:24:35
> USAGE: Pig [options] [-] : Run interactively in grunt shell.
> Pig [options] -e[xecute] cmd [cmd ...] : Run cmd(s).
> Pig [options] [-f[ile]] file : Run cmds found in file.
> options include:
> -4, -log4jconf - Log4j configuration file, overrides log conf
> -b, -brief - Brief logging (no timestamps)
> -c, -check - Syntax check
> -d, -debug - Debug level, INFO is default
> -e, -execute - Commands to execute (within quotes)
> -f, -file - Path to the script to execute
> -g, -embedded - ScriptEngine classname or keyword for the ScriptEngine
> -h, -help - Display this message. You can specify topic to get help for
> that topic.
> properties is the only topic currently supported: -h properties.
> -i, -version - Display version information
> -l, -logfile - Path to client side log file; default is current working
> directory.
> -m, -param_file - Path to the parameter file
> -p, -param - Key value pair of the form param=val
> -r, -dryrun - Produces script with substituted parameters. Script is not
> executed.
> -t, -optimizer_off - Turn optimizations off. The following values are
> supported:
> SplitFilter - Split filter conditions
> PushUpFilter - Filter as early as possible
> MergeFilter - Merge filter conditions
> PushDownForeachFlatten - Join or explode as late as possible
> LimitOptimizer - Limit as early as possible
> ColumnMapKeyPrune - Remove unused data
> AddForEach - Add ForEach to remove unneeded columns
> MergeForEach - Merge adjacent ForEach
> GroupByConstParallelSetter - Force parallel 1 for "group all"
> statement
> All - Disable all optimizations
> All optimizations listed here are enabled by default. Optimization
> values are case insensitive.
> -v, -verbose - Print all error messages to screen
> -w, -warning - Turn warning logging on; also turns warning aggregation off
> -x, -exectype - Set execution mode: local|mapreduce, default is mapreduce.
> -F, -stop_on_failure - Aborts execution on the first failed job; default
> is off
> -M, -no_multiquery - Turn multiquery optimization off; default is on
> -P, -propertyFile - Path to property file
> -printCmdDebug - Overrides anything else and prints the actual command
> used to run Pig, including
> any environment variables that are set by the pig
> command.
> Traceback (most recent call last):
> File "/usr/lib/python2.7/site-packages/airflow/models.py", line 1245, in run
> result = task_copy.execute(context=context)
> File
> "/usr/lib/python2.7/site-packages/airflow/operators/ipy_pig_operator.py",
> line 56, in execute
> self.hook.run_cli(pig=self.pig)
> File "/usr/lib/python2.7/site-packages/airflow/hooks/ipy_pig_hook.py", line
> 109, in run_cli
> raise AirflowException(stdout)
> AirflowException:
> Apache Pig version 0.12.1.2.1.4.0-632 (rexported)
> compiled Jul 29 2014, 18:24:35
> {quote}
> The reason is that the hook runs the command as *pig -f test.pig
> -Dpig.tmpfilecompression=true*, i.e. the properties come after the script
> file, which Pig rejects.
> *Is this fixed in the latest version?*
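A minimal sketch of the ordering fix being described: Pig only accepts `-D` JVM properties before the `-f <script>` argument, so the `pig_properties` from the connection Extra should be inserted right after `pig` rather than appended at the end (appending is what triggers the usage dump in the log above). The helper name `with_pig_properties` is hypothetical, not Airflow code.

```python
# Sketch of the fix (with_pig_properties is a hypothetical helper):
# place -D properties directly after "pig", never after -f <script>.
def with_pig_properties(cmd, pig_properties):
    """Return a copy of cmd with the properties inserted after 'pig'."""
    assert cmd and cmd[0] == "pig"
    return cmd[:1] + list(pig_properties) + cmd[1:]


# The hook effectively built: pig -f test.pig -Dpig.tmpfilecompression=true
# which Pig rejects; the corrected ordering is:
fixed = with_pig_properties(["pig", "-f", "test.pig"],
                            ["-Dpig.tmpfilecompression=true"])
print(" ".join(fixed))  # pig -Dpig.tmpfilecompression=true -f test.pig
```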
> !KKKKKKKKKKKKKKKK.jpg!
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)