I'm trying to generate the JSON files that define our Amazon Data Pipeline
jobs. We have 40 or so different jobs that all need processing on EMR. Most
of the config is shared between them; only the input paths and the EMR step
definition change per job. So I need to be able to reuse the same EMR config
across multiple tasks, while still being able to override it per task.

In the example I first gave, two jobs use the same default EMR config and
one uses a custom config. It's basically a factory pattern: each job knows
which EMR config it needs, and the rest is templated generically.
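
To make that concrete, here's a rough Python sketch of the lookup I have in mind. It is outside of Ansible and purely illustrative: the names STEPS, resolve_step and step_name, and the com.example.CustomEMR class, are made up for this example, and the real step lists are much longer.

```python
# Hypothetical named EMR step configs, keyed by name (illustrative values only).
STEPS = {
    "default_step": [
        "s3://my-bucket/artifacts/emr-jar-2.1.1.jar",
        "com.example.SampleEMR",
    ],
    "custom_step": [
        "s3://my-bucket/artifacts/emr-jar-2.1.1.jar",
        "com.example.CustomEMR",  # made-up class name for the custom job
    ],
}


def resolve_step(definition, steps=STEPS, default="default_step"):
    """Return the step config a job asks for, or the default if it names none."""
    name = definition.get("step_name", default)
    return steps[name]


# Two jobs rely on the default config; the third overrides it by name.
definitions = [
    {"product": "web_v2", "suite": "websuite"},
    {"product": "db2", "suite": "dbsuite"},
    {"product": "custom", "suite": "customsuite", "step_name": "custom_step"},
]

resolved = [resolve_step(d) for d in definitions]
```

The point is that each job only carries the name of its step config, so the large default definition exists once rather than 40 times.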

Here's a sample of the JSON template I'm trying to populate:

{% for definition in definitions %}
{
  "id": "EmrActivityId_{{ loop.index }}",
  "name": "EmrActivity_{{ definition.suite }}",
  "precondition": {
    "ref": "PreconditionId_{{ loop.index }}"
  },
  "runsOn": {
    "ref": "EmrClusterId"
  },
  "type": "EmrActivity",
  "myDate": "{{ date | default('#{format(minusDays(@scheduledStartTime, 1), \'YYYY-MM-dd\')}') }}",
  "step": "{{ {{ definition.step_name }}.step }}"
},
{
  "id": "PreconditionId_{{ loop.index }}",
  "name": "InputExistsPrecondition",
  "s3Prefix": "s3://example-{{ env }}-data{{ definition.s3_precondition }}",
  "type": "S3PrefixNotEmpty"
},
{% endfor %}

The rest of the template is identical for all jobs. What I'm really looking
for is a way to populate the step definition per job while reusing everything
else. There are around 40 of these tasks that I need to migrate.

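
For comparison, this is roughly the output I want per job, sketched in plain Python rather than as an Ansible template. Treat it as an illustration under assumptions: build_objects, the "dev" environment and the shortened step list are hypothetical, I've assumed the step list ends up joined into a single comma-separated string, and the myDate field is left out for brevity.

```python
import json


def build_objects(definitions, steps, env):
    """Build one EmrActivity and one precondition object per job definition."""
    objects = []
    # start=1 mirrors Jinja's loop.index, which counts from 1.
    for index, definition in enumerate(definitions, start=1):
        step = steps[definition["step_name"]]  # look the step config up by name
        objects.append({
            "id": f"EmrActivityId_{index}",
            "name": f"EmrActivity_{definition['suite']}",
            "precondition": {"ref": f"PreconditionId_{index}"},
            "runsOn": {"ref": "EmrClusterId"},
            "type": "EmrActivity",
            # Assumption: the step list is joined into one comma-separated string.
            "step": ",".join(step),
        })
        objects.append({
            "id": f"PreconditionId_{index}",
            "name": "InputExistsPrecondition",
            "s3Prefix": f"s3://example-{env}-data{definition['s3_precondition']}",
            "type": "S3PrefixNotEmpty",
        })
    return objects


# Shortened, illustrative inputs; "dev" is a placeholder environment name.
steps = {
    "default_step": [
        "s3://my-bucket/artifacts/emr-jar-2.1.1.jar",
        "com.example.SampleEMR",
        "-DoutputFormat=json",
    ],
}
definitions = [
    {"suite": "websuite", "step_name": "default_step",
     "s3_precondition": "/raw/data/#{node.myDate}/web_v2/"},
    {"suite": "dbsuite", "step_name": "default_step",
     "s3_precondition": "/raw/data/#{node.myDate}/db2/"},
]

objects = build_objects(definitions, steps, env="dev")
print(json.dumps(objects, indent=2))
```

What I can't work out is how to express that by-name lookup of the step inside the Jinja template itself.
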
On Monday, January 5, 2015 4:13:59 PM UTC, Brian Coca wrote:
>
> sounds like a very complicated process, what are you trying to do in
> the end? it is normally simpler with ansible, it is rare to need
> nested variable includes.
>
> On Mon, Jan 5, 2015 at 10:09 AM, Mark <[email protected]> wrote:
> > Thanks for your reply Michael. Can you think of any other way I can
> > achieve what I'm trying to? I have tried:
> >
> > - templating the vars file before loading it with include_vars, but
> > jinja complained about undefined variables
> > - doing a string replace into the vars file, but then the variables in
> > the default_step.yml file weren't interpolated when it was later loaded
> > by ansible
> > - trying to load the step by name in the json template I'm trying to
> > create. I.e. for each product/suite combination I added 'step:
> > default_step', then had another vars file containing all of the steps
> > (e.g. steps.default_step, steps.custom_step) and then in my json
> > template trying "{{ {{ definition.step }}.step }}", but jinja wouldn't
> > parse that.
> >
> > I'm racking my brains but can't think how I can do this.
> >
> > Thanks
> >
> > On Monday, January 5, 2015 2:14:45 PM UTC, Michael DeHaan wrote:
> >>
> >> There is no facility for a variable file including another.
> >>
> >> Jinja2 is also not invoked when reading variable files at that time, so
> >> include won't help.
> >>
> >>
> >>
> >> On Mon, Jan 5, 2015 at 6:17 AM, Mark <[email protected]> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I'm trying to create some sort of factory pattern I guess in my vars
> >>> files for creating Amazon data pipeline jobs. What I want is to have a
> >>> list of dicts, and each one will contain a parameter ("step") that I
> >>> want to be different per item. However, since this value is quite large
> >>> & complicated, and will be used for 90% of the items, I don't want to
> >>> have to duplicate this value 40 or so times.
> >>>
> >>> So, is it possible to include a vars file in another one? I tried
> >>> using jinja2's "include" function, but it complained that certain
> >>> variables weren't defined because it was trying to resolve the
> >>> variables in the included file. In fact, those variables should be
> >>> parsed later in the main play, not when including one vars file into
> >>> the main one. I can't use roles because this is part of a larger
> >>> pattern in which the main vars file is loaded dynamically depending on
> >>> another variable.
> >>>
> >>> Some examples might make clear what I mean.
> >>>
> >>> Here's my playbook, "create-job.yml":
> >>>
> >>> - name: "Create a data pipeline and definition for {{ product }} {{ job }}"
> >>>   hosts: localhost
> >>>   gather_facts: True
> >>>   vars_files:
> >>>     - "vars/pipelines/{{ group }}/env/{{ env }}.yml"
> >>>     - "vars/pipelines/{{ group }}/{{ job }}.yml"
> >>>
> >>>
> >>> Here's my vars file, "job1.yml", for the "job1" job:
> >>>
> >>> template: multiple-emr
> >>> startTime: 03:00:00
> >>> definitions:
> >>>   - product: web_v2
> >>>     suite: websuite
> >>>     {% include default_step.yml %} # how to include "default_step.yml"?
> >>>   - product: db2
> >>>     suite: dbsuite
> >>>     {% include default_step.yml %}
> >>>   - product: custom
> >>>     suite: customsuite
> >>>     {% include custom_step.yml %}
> >>>   ... x 40
> >>>
> >>>
> >>> And here's the contents of "default_step.yml":
> >>>
> >>> s3_precondition: "/raw/data/#{node.myDate}/{{ product }}/"
> >>> step:
> >>>   - "s3://my-bucket/artifacts/emr-jar-2.1.1.jar"
> >>>   - "com.example.SampleEMR"
> >>>   - "-Dinput=s3n://example-{{ env }}-data{{ s3_precondition | replace('node.', '') }}*{{ suite }}_#{myDate}.*.gz"
> >>>   - "-Doutput=s3n://example-{{ env }}-data/intermediate/data/#{myDate}/{{ product }}/{{ suite }}/"
> >>>   - "-DoutputFormat=json"
> >>>   ...
> >>>
> >>>
> >>> How can I achieve this with ansible?
> >>>
> >>> Mark
> >>>
> >>> --
> >>> You received this message because you are subscribed to the Google
> Groups
> >>> "Ansible Project" group.
> >>> To unsubscribe from this group and stop receiving emails from it, send
> an
> >>> email to [email protected].
> >>> To post to this group, send email to [email protected].
> >>> To view this discussion on the web visit
> >>>
> https://groups.google.com/d/msgid/ansible-project/cbe7f116-aadf-4d57-94ea-712834cb498a%40googlegroups.com.
>
>
> >>> For more options, visit https://groups.google.com/d/optout.
> >>
> >>
>
>
>
> --
> Brian Coca
>