shannawaz commented on a change in pull request #304: Add optional support for Azure ADLS Gen2
URL: https://github.com/apache/fluo-muchos/pull/304#discussion_r363528680
##########
File path: ansible/roles/azure/tasks/create_adlsgen2.yml
##########
@@ -0,0 +1,235 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+#
+# These Ansible tasks only run on the client machine where Muchos runs
+# At a high level, the various sections in this file do the following:
+# 1. Create an Azure ADLS Gen2 storage account.
+# 2. Create User Assigned Identity.
+# 3. Assign roles to storage accounts.
+# 4. Create filesysystem/container in storage accounts.
+# 5. Update tenant_id, client_id and instance_volumes_preferred in muchos.props.
+# 6. Assign User Assigned Identity to VMSS.
+
+- name: Generate MD5 checksum based on resource_group name, vmss_name and cluster name
+  shell: echo -n {{ resource_group + vmss_name + location }}|md5sum|tr -cd "[:alnum:]"|cut -c 1-16|tr '[:upper:]' '[:lower:]'
+  register: StorageAccountMD5
+
+- name: Generate random names for storage account names
+  set_fact:
+    StorageAccountName: "{{ StorageAccountMD5.stdout + 99|random(seed=resource_group)|string + 99|random(seed=vmss_name)|string + 9|random(seed=location)|string }}"
+
+- name: Initialize instance variables
+  set_fact:
+    InstanceVolumesAuto: []
+    InstanceVolumesManual: []
+
+- name: Validate instance_volumes_input
+  fail: msg="Variable instance_volumes_input incorrectly specified, Both Manual and Auto cannot be specified at same time"
+  when: instance_volumes_input.split('|')[0].split(',') != [''] and instance_volumes_input.split('|')[1].split(',') != ['']
+
+- name: Assign manual or autogenerated volumes
+  set_fact:
+    InstanceVolumesTemp: "{{ instance_volumes_input.split('|')[0].split(',')|list if instance_volumes_input.split('|')[0].split(',') != [''] else instance_volumes_input.split('|')[1].split(',')|list }}"
+
+- name: Retrieve sequence end number to get the number of storage accounts
+  set_fact:
+    InstanceVolumesEndSequence: "{{ '1' if instance_volumes_input.split('|')[0].split(',') == [''] else InstanceVolumesTemp[0]|int }}"
+
+- name: Generate names for Storage Accounts
+  set_fact:
+    InstanceVolumesAuto: "{{ InstanceVolumesAuto + ['abfss://'+'accumulodata'+'@'+StorageAccountName+item+'.'+InstanceVolumesTemp[1]+'/accumulo'] }}"
+  with_sequence: start=1 end={{ InstanceVolumesEndSequence|int }}
+  when: InstanceVolumesTemp[0]|int != 0
+
+- name: Retrieve ABFSS values when specified manually
+  set_fact:
+    InstanceVolumesManual: "{{ InstanceVolumesManual + [ item ] }}"
+  loop:
+    "{{ InstanceVolumesTemp }}"
+  when: item.split('://')[0] == 'abfss' and instance_volumes_input.split('|')[0].split(',') == ['']
+
+# This is final list of instance volumes
+- name: Assign variables for autogeneration or manual for storage account creation
+  set_fact:
+    InstanceVolumes: "{{ InstanceVolumesManual if instance_volumes_input.split('|')[0].split(',') == [''] else InstanceVolumesAuto }}"
+
+- name: Update instance_volumes_preferred in muchos.props
+  lineinfile:
+    path: "{{ deploy_path }}/conf/muchos.props"
+    regexp: '^instance_volumes_preferred\s*=\s*|^[#]instance_volumes_preferred\s*=\s*'
+    line: "instance_volumes_preferred = {{ InstanceVolumes|join(',') }}"
+
+# Not registering variable because storage values are not visible immediately
+- name: Create ADLS Gen2 storage acount using REST API
+  azure_rm_resource:
+    resource_group: "{{ resource_group }}"
+    provider: Storage
+    resource_type: storageAccounts
+    resource_name: "{{ item.split('@')[1].split('.')[0] }}"
+    api_version: '2019-04-01'
+    idempotency: yes
+    state: present
+    body:
+      sku:
+        name: "{{ adls_storage_type }}"
+      kind: StorageV2
+      properties:
+        isHnsEnabled: yes
+      location: "{{ location }}"
+  loop:
+    "{{ InstanceVolumes }}"
+
+# Creating User Assigned identity with vmss_name suffixed by ua-msi if not specified in muchos.props
+# Not registering variable because user identity values are not visible immediately
+- name: Create User Assigned Identity
+  azure_rm_resource:
+    resource_group: "{{ resource_group }}"
+    provider: ManagedIdentity
+    resource_type: userAssignedIdentities
+    resource_name: "{{ user_assigned_identity if user_assigned_identity !='' else vmss_name + '-ua-msi' }}"
+    api_version: '2018-11-30'
+    idempotency: yes
+    state: present
+    body:
+      location: "{{ location }}"
+
+# Retrieving facts about User Assigned Identity
+- name: Get facts for User Assigned Identity
+  azure_rm_resource_facts:
+    resource_group: "{{ resource_group }}"
+    provider: ManagedIdentity
+    resource_type: userAssignedIdentities
+    resource_name: "{{ user_assigned_identity if user_assigned_identity !='' else vmss_name + '-ua-msi' }}"
+    api_version: '2018-11-30'

Review comment:
Data Lake
Storage Gen2 uses an access control model that supports both role-based access control (RBAC) and POSIX-like access control lists (ACLs). Access to data in Data Lake Storage Gen2 is controlled through managed identities. A managed identity is an identity registered in Azure Active Directory (Azure AD) whose credentials are managed by Azure. With managed identities, you don't need to register service principals in Azure AD or maintain credentials such as certificates. Azure offers two types of managed identities: system-assigned and user-assigned. For further information, see the [Azure managed identities overview](https://docs.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/overview) and the [Hadoop ABFS authentication documentation](https://hadoop.apache.org/docs/current/hadoop-azure/abfs.html#Authentication).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
