keith-turner commented on a change in pull request #270: Support Accumulo installs on Microsoft Azure
URL: https://github.com/apache/fluo-muchos/pull/270#discussion_r314096718
##########
File path: conf/muchos.props.example.azure
##########
@@ -0,0 +1,180 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+[general]
+# Cluster type (Azure, ec2, or existing)
+cluster_type = azure
+# Cluster user name (install command will SSH to cluster using this user)
+# Leave default below if launching cluster in AWS
+cluster_user = azureuser
+# Cluster user group
+cluster_group = %(cluster_user)s
+# Cluster user home directory
+user_home = /home/%(cluster_user)s
+# Install directory where Hadoop, Accumulo, etc will be installed
+install_dir = %(user_home)s/install
+# Hostname of proxy node that Muchos will use to direct installation of cluster. Will be given
+# public IP if launching in EC2. If not launching in EC2, node must have public IP that can be reached
+# from your machine. Hostname can be chosen from "nodes" section below.
+proxy_hostname = leader1
+# If set, a SOCKS proxy will be created on the specified port when connecting to proxy using 'muchos ssh <cluster>'
+#proxy_socks_port = 38585
+# Accumulo Instance name
+accumulo_instance = muchos
+# Accumluo Password
+accumulo_password = secret
+# Software versions (set sha-256 in conf/checksums)
+hadoop_version = 2.8.5
+zookeeper_version = 3.4.14
+spark_version = 2.2.2
+fluo_version = 1.2.0
+fluo_yarn_version = 1.0.0
+accumulo_version = 1.9.3
+# Specifies if software should be downloaded. If 'False', tarballs of the software above should be in conf/upload/
+download_software = True
+# Install Hub (for GitHub)
+install_hub = True
+# Nameservice ID for NN HA - can be modified by user
+nameservice_id = accucluster
+
+[azure]
+resource_group = accumulo-rg
+vnet = vnet1
+vnet_cidr = "10.0.0.0/8"
+subnet = subnet1
+subnet_cidr = "10.1.0.0/16"
+numnodes = 8
+vm_sku = Standard_D8s_v3
+managed_disk_type = Standard_LRS
+numdisks = 3
+disk_size_gb = 128
+mount_root = /var/data
+metrics_drive_root = var-data
+# Optional proxy VM. If not set, the first node of the cluster will be selected as the proxy.
+azure_proxy_host =
+location = westus2
+# Optional Azure fileshare to mount on all nodes.
+# Path and credentials must be updated to enable this.
+#azure_fileshare_mount = /mnt/azure-fileshare
+#azure_fileshare = //fileshare-to-mount.file.core.windows.net/path
+#azure_fileshare_username = fs_username
+#azure_fileshare_password = fs_password
+# Optional integration with Azure Log Analytics
+# Workspace ID and key must be updated to enable this.
+az_omsIntegrationNeeded = False
+#az_logs_id = workspace_id
+#az_logs_key = workspace_key
+
+[existing]
+# Root of data dirs
+mount_root = /var/data
+# Data directories on all nodes
+data_dirs = /var/data1,/var/data2,/var/data3
+# Identifies drives for metrics
+metrics_drive_ids = var-data1,var-data2,var-data3
+
+[performance]
+# Automatically tune Accumulo, Yarn, and Fluo performance setting by selecting or
+# creating a performance profile. Try not to use more memory than each node has
+# and leave some space for the OS.
+profile=azd8s
+
+# Below are different performance profiles that can be selected. Each profile
+# has the same properties with different values.
+
+[perf-small]
+# Amount of JVM heap for each tserver
+accumulo_tserv_mem=2G
+# Amount of data cache for each tserver. Only applies when using Accumulo 1.x
+accumulo_dcache_size=768M
+# Amount of index cache for each tserver. Only applies when using Accumulo 1.x
+accumulo_icache_size=256M
+# In memory map size for each tserver. Only applies when using Accumulo 1.x
+accumulo_imap_size=512M
+# Amount of JVM heap for each Fluo worker
+fluo_worker_mem_mb=2048
+# Determines the gap between the Yarn memory limit and the java -Xmx setting.
+# For example if fluo_worker_mem_mb is set to 2048 and twill_reserve_mem_mb is
+# set to 256, then for workers the java -Xmx setting will be set to 2048-256.
+# If yarn is killing worker processes because they are using too much memory,
+# then consider increasing this setting.
+twill_reserve_mem_mb=256
+# Number of threads for each Flup worker
+fluo_worker_threads=20
+# Number of worker to run per node
+fluo_worker_instances_multiplier=1
+# Max amount of memory for YARN per node
+yarn_nm_mem_mb=4096
+
+[perf-medium]
+accumulo_tserv_mem=3G
+# Accumulo configs below only apply when using Accumulo 1.x
+accumulo_dcache_size=1536M
+accumulo_icache_size=512M
+accumulo_imap_size=512M
+fluo_worker_mem_mb=4096
+twill_reserve_mem_mb=512
+fluo_worker_threads=64
+fluo_worker_instances_multiplier=1
+yarn_nm_mem_mb=8192
+
+[perf-large]
+accumulo_tserv_mem=4G
+# Accumulo configs below only apply when using Accumulo 1.x
+accumulo_dcache_size=2G
+accumulo_icache_size=1G
+accumulo_imap_size=512M
+fluo_worker_mem_mb=4096
+twill_reserve_mem_mb=512
+fluo_worker_threads=64
+fluo_worker_instances_multiplier=2
+yarn_nm_mem_mb=16384
+
+[azd16s]
+accumulo_tserv_mem=4G
+accumulo_dcache_size=512M
+accumulo_icache_size=512M
+accumulo_imap_size=44G

Review comment:
I don't generally give the in-memory map much memory, because compactions are logarithmic, so there are diminishing returns from increasing the memory size. However, I have not run any recent numbers on throughput with different memory sizes, so I can't give any specific recommendations. I was just curious about why these numbers were chosen.
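
To make the "compactions are logarithmic" point in the review comment concrete, here is a rough sketch under a simplified model. The model is an assumption for illustration, not Accumulo's actual compaction logic: each flush writes one file the size of the in-memory map, and files are merged in groups governed by a fixed ratio, so each entry is rewritten about log_ratio(total / imap) times. The function name `estimated_rewrites`, the ratio of 3, and the 1 TiB of data per tablet server are all hypothetical values chosen for the example.

```python
import math


def estimated_rewrites(total_bytes, imap_bytes, ratio=3.0):
    """Rough number of times each entry is rewritten by compactions.

    Simplified model (an assumption, not Accumulo's real algorithm):
    rewrites ~= log_ratio(total data / in-memory map size).
    """
    if imap_bytes >= total_bytes:
        # Everything fits in memory, so nothing is rewritten in this model.
        return 0.0
    return math.log(total_bytes / imap_bytes, ratio)


GIB = 1024 ** 3
TOTAL = 1024 * GIB  # hypothetical: 1 TiB of data per tablet server

if __name__ == "__main__":
    for size_gib in (0.5, 1, 2, 4, 8, 16, 32, 44):
        rewrites = estimated_rewrites(TOTAL, size_gib * GIB)
        print(f"imap={size_gib:>4} GiB -> ~{rewrites:.2f} rewrites per entry")
```

In this model every doubling of the in-memory map removes the same fixed number of rewrite passes (log base 3 of 2, roughly 0.63), while the memory cost doubles each time. That is the sense in which the returns diminish; actual throughput would still need to be measured.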