[GitHub] [fluo-muchos] keith-turner commented on a change in pull request #270: Support Accumulo installs on Microsoft Azure

GitBox Wed, 14 Aug 2019 14:44:08 -0700

keith-turner commented on a change in pull request #270: Support Accumulo installs on Microsoft Azure
URL: https://github.com/apache/fluo-muchos/pull/270#discussion_r314096718


 ##########
 File path: conf/muchos.props.example.azure
 ##########
 @@ -0,0 +1,180 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+[general]
+# Cluster type (Azure, ec2, or existing)
+cluster_type = azure
+# Cluster user name (install command will SSH to cluster using this user)
+# Leave default below if launching cluster in AWS
+cluster_user = azureuser
+# Cluster user group
+cluster_group = %(cluster_user)s
+# Cluster user home directory
+user_home = /home/%(cluster_user)s
+# Install directory where Hadoop, Accumulo, etc will be installed
+install_dir = %(user_home)s/install
+# Hostname of proxy node that Muchos will use to direct installation of cluster.  Will be given
+# public IP if launching in EC2.  If not launching in EC2, node must have public IP that can be reached
+# from your machine. Hostname can be chosen from "nodes" section below.
+proxy_hostname = leader1
+# If set, a SOCKS proxy will be created on the specified port when connecting to proxy using 'muchos ssh <cluster>'
+#proxy_socks_port = 38585
+# Accumulo Instance name
+accumulo_instance = muchos
+# Accumulo Password
+accumulo_password = secret
+# Software versions (set sha-256 in conf/checksums)
+hadoop_version = 2.8.5
+zookeeper_version = 3.4.14
+spark_version = 2.2.2
+fluo_version = 1.2.0
+fluo_yarn_version = 1.0.0
+accumulo_version = 1.9.3
+# Specifies if software should be downloaded. If 'False', tarballs of the software above should be in conf/upload/
+download_software = True
+# Install Hub (for GitHub)
+install_hub = True
+# Nameservice ID for NN HA - can be modified by user
+nameservice_id = accucluster
+
+[azure]
+resource_group = accumulo-rg
+vnet = vnet1
+vnet_cidr = "10.0.0.0/8"
+subnet = subnet1
+subnet_cidr = "10.1.0.0/16"
+numnodes = 8
+vm_sku = Standard_D8s_v3
+managed_disk_type = Standard_LRS
+numdisks = 3
+disk_size_gb = 128
+mount_root = /var/data
+metrics_drive_root = var-data
+# Optional proxy VM. If not set, the first node of the cluster will be selected as the proxy.
+azure_proxy_host =
+location = westus2
+# Optional Azure fileshare to mount on all nodes.
+# Path and credentials must be updated to enable this.
+#azure_fileshare_mount = /mnt/azure-fileshare
+#azure_fileshare = //fileshare-to-mount.file.core.windows.net/path
+#azure_fileshare_username = fs_username
+#azure_fileshare_password = fs_password
+# Optional integration with Azure Log Analytics
+# Workspace ID and key must be updated to enable this.
+az_omsIntegrationNeeded = False
+#az_logs_id = workspace_id
+#az_logs_key = workspace_key
+
+[existing]
+# Root of data dirs
+mount_root = /var/data
+# Data directories on all nodes
+data_dirs = /var/data1,/var/data2,/var/data3
+# Identifies drives for metrics
+metrics_drive_ids = var-data1,var-data2,var-data3
+
+[performance]
+# Automatically tune Accumulo, Yarn, and Fluo performance setting by selecting or
+# creating a performance profile.  Try not to use more memory than each node has
+# and leave some space for the OS.
+profile=azd8s
+
+# Below are different performance profiles that can be selected.  Each profile
+# has the same properties with different values.
+
+[perf-small]
+# Amount of JVM heap for each tserver
+accumulo_tserv_mem=2G
+# Amount of data cache for each tserver. Only applies when using Accumulo 1.x
+accumulo_dcache_size=768M
+# Amount of index cache for each tserver. Only applies when using Accumulo 1.x
+accumulo_icache_size=256M
+# In memory map size for each tserver. Only applies when using Accumulo 1.x
+accumulo_imap_size=512M
+# Amount of JVM heap for each Fluo worker
+fluo_worker_mem_mb=2048
+# Determines the gap between the Yarn memory limit and the java -Xmx setting.
+# For example if fluo_worker_mem_mb is set to 2048 and twill_reserve_mem_mb is
+# set to 256, then for workers the java -Xmx setting will be set to 2048-256.
+# If yarn is killing worker processes because they are using too much memory,
+# then consider increasing this setting.
+twill_reserve_mem_mb=256
+# Number of threads for each Fluo worker
+fluo_worker_threads=20
+# Number of workers to run per node
+fluo_worker_instances_multiplier=1
+# Max amount of memory for YARN per node
+yarn_nm_mem_mb=4096
+
+[perf-medium]
+accumulo_tserv_mem=3G
+# Accumulo configs below only apply when using Accumulo 1.x
+accumulo_dcache_size=1536M
+accumulo_icache_size=512M
+accumulo_imap_size=512M
+fluo_worker_mem_mb=4096
+twill_reserve_mem_mb=512
+fluo_worker_threads=64
+fluo_worker_instances_multiplier=1
+yarn_nm_mem_mb=8192
+
+[perf-large]
+accumulo_tserv_mem=4G
+# Accumulo configs below only apply when using Accumulo 1.x
+accumulo_dcache_size=2G
+accumulo_icache_size=1G
+accumulo_imap_size=512M
+fluo_worker_mem_mb=4096
+twill_reserve_mem_mb=512
+fluo_worker_threads=64
+fluo_worker_instances_multiplier=2
+yarn_nm_mem_mb=16384
+
+[azd16s]
+accumulo_tserv_mem=4G
+accumulo_dcache_size=512M
+accumulo_icache_size=512M
+accumulo_imap_size=44G
 
 Review comment:
   I generally don't allocate much memory here, because compactions are
   logarithmic, so there are diminishing returns from increasing the memory
   size. However, I have not run any recent throughput numbers with different
   memory sizes, so I can't give any specific recommendations. I was just
   curious why these numbers were chosen.
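
   The diminishing returns described above can be sketched with a rough
   back-of-the-envelope model in Python. This is only an illustration under
   stated assumptions, not Accumulo's actual compaction algorithm and not a
   measurement: it assumes every flush writes one file the size of the
   in-memory map, and that files are merged in groups of a fixed fan-in. The
   function name `estimated_rewrites`, the fan-in of 3, and the 1024 GB ingest
   figure are all hypothetical.

```python
import math


def estimated_rewrites(total_ingest_gb, imap_gb, fan_in=3):
    """Rough estimate of how many times each byte is rewritten by merging.

    Assumed model (hypothetical, for illustration only): each flush of the
    in-memory map produces one file of imap_gb, and files are merged fan_in
    at a time, so the number of merge levels is log_fan_in(flushed files).
    """
    flushed_files = max(total_ingest_gb / imap_gb, 1.0)
    return math.log(flushed_files, fan_in)


# Compare a range of in-memory map sizes for the same assumed ingest volume.
for imap_gb in (0.5, 1, 2, 4, 8, 16, 44):
    rewrites = estimated_rewrites(1024, imap_gb)
    print(f"imap={imap_gb:>5}G  estimated rewrites per byte ~ {rewrites:.2f}")
```

   Under this model, each doubling of the in-memory map removes the same
   fixed amount of work, log(2)/log(fan_in) merge levels, while costing twice
   as much memory. Whether real throughput follows this curve would need to
   be measured on the cluster.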

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
