ctubbsii commented on a change in pull request #2238:
URL: https://github.com/apache/accumulo/pull/2238#discussion_r693302409



##########
File path: pom.xml
##########
@@ -566,6 +566,11 @@
         <artifactId>slf4j-api</artifactId>
         <version>${slf4j.version}</version>
       </dependency>
+      <dependency>
+        <groupId>org.yaml</groupId>
+        <artifactId>snakeyaml</artifactId>
+        <version>1.29</version>
+      </dependency>

Review comment:
       This looks too good to be true... please correct me if I'm wrong, but at 
first glance, this appears to be a pure Java yaml implementation licensed as 
Apache 2.0 without any new transitive dependencies to conflict with.

##########
File path: 
core/src/test/java/org/apache/accumulo/core/conf/cluster/ClusterConfigParserTest.java
##########
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.conf.cluster;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertNotNull;
+import static org.junit.Assert.assertTrue;
+
+import java.io.File;
+import java.net.URL;
+import java.util.Map;
+
+import org.junit.Test;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+
+@SuppressFBWarnings(value = "PATH_TRAVERSAL_IN", justification = "paths 
provided by test")
+public class ClusterConfigParserTest {
+
+  @Test
+  public void testParse() throws Exception {
+    URL configFile = ClusterConfigParserTest.class.getResource("/cluster.yml");

Review comment:
       See other comment about top-level resources that are likely to collide.

##########
File path: assemble/bin/accumulo-cluster
##########
@@ -147,86 +172,141 @@ function start_all() {
     start_tservers
   fi
 
-  grep -Ev '(^#|^\s*$)' "${conf}/$manager_file" | while read -r host; do
-    start_service "$host" manager
+  IFS_ORIG=$IFS
+  IFS=","
+
+  for manager in ${HOSTS["MANAGERS"]}; do
+    start_service "$manager" manager
+  done
+
+  for gc in "${HOSTS["GC"]}"; do
+    start_service "$gc" gc
+  done
+
+  for monitor in ${HOSTS["MONITORS"]}; do
+    start_service "$monitor" monitor
   done
 
-  grep -Ev '(^#|^\s*$)' "${conf}/gc" | while read -r host; do
-    start_service "$host" gc
+  for tracer in ${HOSTS["TRACER"]}; do
+    start_service "$tracer" tracer
   done
 
-  grep -Ev '(^#|^\s*$)' "${conf}/tracers" | while read -r host; do
-    start_service "$host" tracer
+  for coordinator in ${HOSTS["COMPACTION_COORDINATORS"]}; do
+    start_service "$coordinator" compaction-coordinator
   done
 
-  start_service "$monitor" monitor
+  for queue in ${HOSTS["COMPACTION_QUEUES"]}; do
+    for compactor in ${HOSTS["COMPACTORS_$queue"]}; do
+      start_service "$compactor" compactor "-q $queue"

Review comment:
       These are two arguments, not one, so should be quoted as two:
   
   ```suggestion
         start_service "$compactor" compactor -q "$queue"
   ```

##########
File path: server/base/src/main/java/org/apache/accumulo/server/util/ZooZap.java
##########
@@ -61,6 +61,10 @@ private static void message(String msg, Opts opts) {
     boolean zapTservers = false;
     @Parameter(names = "-tracers", description = "remove tracer locks")
     boolean zapTracers = false;
+    @Parameter(names = "-coordinators", description = "remove compaction 
coordinator locks")

Review comment:
       I wish `compaction-coordinators` wasn't so lengthy. `-coordinators` is 
just so generic, when we're referring to a specific coordinator.

##########
File path: core/src/test/resources/cluster.yml
##########
@@ -0,0 +1,53 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+managers:
+  - localhost1
+  - localhost2
+
+monitors:
+  - localhost1
+  - localhost2
+
+tracer:
+  - localhost
+
+gc:

Review comment:
       This can also be plural, since we can have standby accumulo-gc services, 
just like the manager or monitor. If you make `gc` plural, it looks like `gcs`, 
which reads as "jee - see - ess" rather than the intended "jee - sees". To 
normalize, could just make them all singular. In English, instead of reading 
the `managers:` section as "list of `managers`", we read the `manager:` section 
as "list of hosts with `manager` role".

##########
File path: assemble/bin/accumulo-service
##########
@@ -77,7 +79,7 @@ function start_service() {
   rotate_log "$outfile"
   rotate_log "$errfile"
 
-  nohup "${bin}/accumulo" "$service" >"$outfile" 2>"$errfile" < /dev/null &
+  nohup "${bin}/accumulo" "$service" "${@:2}" >"$outfile" 2>"$errfile" < 
/dev/null &

Review comment:
       Hmm, was this related to something needed in this PR, or just a nice add 
you found? I'm okay either way, just curious what motivated this.

##########
File path: 
core/src/main/java/org/apache/accumulo/core/conf/cluster/ClusterConfigParser.java
##########
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.conf.cluster;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.nio.file.StandardOpenOption;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.yaml.snakeyaml.Yaml;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+
+public class ClusterConfigParser {

Review comment:
       This class looks about right. I'm not a big fan of when libraries force 
you to work with `Object` types as generic params, but it looks like you don't 
have a lot of choice there.

##########
File path: assemble/bin/accumulo-cluster
##########
@@ -43,52 +43,74 @@ function invalid_args {
   exit 1
 }
 
-function verify_config {
+function parse_config {
   if [[ -f ${conf}/slaves ]]; then
     echo "ERROR: A 'slaves' file was found in ${conf}/"
-    echo "Accumulo now reads tablet server hosts from 'tservers' and requires 
that the 'slaves' file not be present to reduce confusion."
+    echo "Accumulo now uses cluster host configuration information from 
'cluster.yml' and requires that the 'slaves' file not be present to reduce 
confusion."
     echo "Please rename the 'slaves' file to 'tservers' or remove it if both 
exist."
     exit 1
   fi
 
-  if [[ ! -f ${conf}/tservers ]]; then
-    echo "ERROR: A 'tservers' file was not found at ${conf}/tservers"
-    echo "Please make sure it exists and is configured with tablet server 
hosts."
+  if [[ ! -f ${conf}/cluster.yml ]]; then
+    echo "ERROR: A 'cluster.yml' file was not found at ${conf}/cluster.yml"
+    echo "Please make sure it exists and is configured with the host 
information. Run 'create-config' to create an example configuration."
     exit 1
   fi
 
-  unset manager1
-  if [[ -f "${conf}/$manager_file" ]]; then
-    manager1=$(grep -Ev '(^#|^\s*$)' "${conf}/$manager_file" | head -1)
+  config=$(${accumulo_cmd} 
org.apache.accumulo.core.conf.cluster.ClusterConfigParser ${conf}/cluster.yml)
+
+  declare -A HOSTS
+
+  HOSTS["MANAGERS"]=$(grep managers $config | awk -F':' '{ print $2 }')
+  HOSTS["MONITORS"]=$(grep monitors $config | awk -F':' '{ print $2 }')
+  HOSTS["GC"]=$(grep gc $config | awk -F':' '{ print $2 }')
+  HOSTS["TRACER"]=$(grep tracer $config | awk -F':' '{ print $2 }')
+  HOSTS["TSERVERS"]=$(grep tservers $config | awk -F':' '{ print $2 }')
+
+  grep "compaction" $config $2>1 > /dev/null
+  compaction_in_config=$?
+
+  if [[ $compaction_in_config -eq 0 ]]; then
+    HOSTS["COMPACTION_COORDINATORS"]=$(grep "compaction.coordinators" $file | 
awk -F':' '{ print $2 }')
+    HOSTS["COMPACTION_QUEUES"]=$(grep "compaction.compactors.queues" $file | 
awk -F':' '{ print $2 }')

Review comment:
       Probably better to use `cut -f2- -d:` rather than the heavyweight `awk 
-F':' '{ print $2 }'`

##########
File path: assemble/bin/accumulo-cluster
##########
@@ -340,57 +458,74 @@ function main() {
   accumulo_cmd="${bin}/accumulo"
   SSH='ssh -qnf -o ConnectTimeout=2'
 
-  manager_file="managers"
-  if [[ ! -f "$conf/$manager_file" && -f "$conf/masters" ]]; then
-    echo "WARN : Use of 'masters' file is deprecated; use 'managers' file 
instead."
-    manager_file="masters"
-  fi
-
   case "$1" in
     create-config)
-      echo "localhost" > "$conf/gc"
-      echo "localhost" > "$conf/managers"
-      echo "localhost" > "$conf/monitor"
-      echo "localhost" > "$conf/tracers"
-      echo "localhost" > "$conf/tservers"
+      cat <<EOF > $conf/cluster.yml
+managers:
+  - localhost
+
+monitors:
+  - localhost
+
+tracer:
+  - localhost
+
+gc:
+  - localhost
+
+tservers:
+  - localhost
+
+#compaction:
+#  coordinators:
+#    - localhost
+#  compactors:
+#    - queues:
+#      - q1
+#      - q2
+#    - q1:
+#        - localhost
+#    - q2:
+#        - localhost
+      EOF
       ;;
     restart)
-      verify_config

Review comment:
       Nice how we already had a method that could be swapped out to do 
something more useful.

##########
File path: assemble/bin/accumulo-cluster
##########
@@ -43,52 +43,74 @@ function invalid_args {
   exit 1
 }
 
-function verify_config {
+function parse_config {
   if [[ -f ${conf}/slaves ]]; then
     echo "ERROR: A 'slaves' file was found in ${conf}/"
-    echo "Accumulo now reads tablet server hosts from 'tservers' and requires 
that the 'slaves' file not be present to reduce confusion."
+    echo "Accumulo now uses cluster host configuration information from 
'cluster.yml' and requires that the 'slaves' file not be present to reduce 
confusion."
     echo "Please rename the 'slaves' file to 'tservers' or remove it if both 
exist."
     exit 1
   fi
 
-  if [[ ! -f ${conf}/tservers ]]; then
-    echo "ERROR: A 'tservers' file was not found at ${conf}/tservers"
-    echo "Please make sure it exists and is configured with tablet server 
hosts."
+  if [[ ! -f ${conf}/cluster.yml ]]; then
+    echo "ERROR: A 'cluster.yml' file was not found at ${conf}/cluster.yml"
+    echo "Please make sure it exists and is configured with the host 
information. Run 'create-config' to create an example configuration."
     exit 1
   fi
 
-  unset manager1
-  if [[ -f "${conf}/$manager_file" ]]; then
-    manager1=$(grep -Ev '(^#|^\s*$)' "${conf}/$manager_file" | head -1)
+  config=$(${accumulo_cmd} 
org.apache.accumulo.core.conf.cluster.ClusterConfigParser ${conf}/cluster.yml)
+
+  declare -A HOSTS
+
+  HOSTS["MANAGERS"]=$(grep managers $config | awk -F':' '{ print $2 }')
+  HOSTS["MONITORS"]=$(grep monitors $config | awk -F':' '{ print $2 }')
+  HOSTS["GC"]=$(grep gc $config | awk -F':' '{ print $2 }')
+  HOSTS["TRACER"]=$(grep tracer $config | awk -F':' '{ print $2 }')
+  HOSTS["TSERVERS"]=$(grep tservers $config | awk -F':' '{ print $2 }')
+
+  grep "compaction" $config $2>1 > /dev/null
+  compaction_in_config=$?
+
+  if [[ $compaction_in_config -eq 0 ]]; then
+    HOSTS["COMPACTION_COORDINATORS"]=$(grep "compaction.coordinators" $file | 
awk -F':' '{ print $2 }')
+    HOSTS["COMPACTION_QUEUES"]=$(grep "compaction.compactors.queues" $file | 
awk -F':' '{ print $2 }')
+
+    IFS_ORIG=$IFS
+    IFS=","
+    for queue in ${HOSTS["COMPACTION_QUEUES"]}; do
+      HOSTS["COMPACTORS_$queue"]=$(grep "compaction.compactors.$queue" $file | 
awk -F':' '{ print $2 }')
+    done
+    IFS=$IFS_ORIG
   fi
 
-  if [[ -z "${monitor}" ]] ; then
-    monitor=$manager1
-    if [[ -f "${conf}/monitor" ]]; then
-      monitor=$(grep -Ev '(^#|^\s*$)' "${conf}/monitor" | head -1)
-    fi
-    if [[ -z "${monitor}" ]] ; then
-      echo "Could not infer a Monitor role. You need to either define 
\"${conf}/monitor\","
-      echo "or make sure \"${conf}/$manager_file\" is non-empty."
-      exit 1
-    fi
+  if [[ ${#HOSTS["MANAGERS"]} -eq 0 ]]; then
+    echo "ERROR: managers not found in ${conf}/cluster.yml"
+    exit 1
   fi
-  if [[ ! -f "${conf}/tracers" ]]; then
-    if [[ -z "${manager1}" ]] ; then
-      echo "Could not find a manager node to use as a default for the tracer 
role."
-      echo "Either set up \"${conf}/tracers\" or make sure 
\"${conf}/$manager_file\" is non-empty."
-      exit 1
-    else
-      echo "$manager1" > "${conf}/tracers"
-    fi
+
+  if [[ ${#HOSTS["TSERVERS"]} -eq 0 ]]; then
+    echo "ERROR: tservers not found in ${conf}/cluster.yml"
+    exit 1
   fi
-  if [[ ! -f "${conf}/gc" ]]; then
-    if [[ -z "${manager1}" ]] ; then
-      echo "Could not infer a GC role. You need to either set up 
\"${conf}/gc\" or make sure \"${conf}/$manager_file\" is non-empty."
-      exit 1
-    else
-      echo "$manager1" > "${conf}/gc"
-    fi
+
+  unset manager1
+  manager1=$(echo "${HOSTS["MANAGERS"]}" | cut -d"," -f1)
+
+  if [[ ${#HOSTS["MONITORS"]} -eq 0 ]]; then
+    echo "WARN: monitors not found in ${conf}/cluster.yml, using first manager 
host $manager1"
+    HOSTS["MONITORS"]=$manager1
+    exit 1
+  fi

Review comment:
       You probably don't want to exit with an error if it's just a warning and 
you're mitigating it by assigning the first manager. Also, the Java code could 
probably do that logic, so that its output is always valid, and there's less 
checking for whether a server type is set or falls back to the default.

##########
File path: assemble/bin/accumulo-cluster
##########
@@ -43,52 +43,74 @@ function invalid_args {
   exit 1
 }
 
-function verify_config {
+function parse_config {
   if [[ -f ${conf}/slaves ]]; then
     echo "ERROR: A 'slaves' file was found in ${conf}/"
-    echo "Accumulo now reads tablet server hosts from 'tservers' and requires 
that the 'slaves' file not be present to reduce confusion."
+    echo "Accumulo now uses cluster host configuration information from 
'cluster.yml' and requires that the 'slaves' file not be present to reduce 
confusion."
     echo "Please rename the 'slaves' file to 'tservers' or remove it if both 
exist."
     exit 1
   fi
 
-  if [[ ! -f ${conf}/tservers ]]; then
-    echo "ERROR: A 'tservers' file was not found at ${conf}/tservers"
-    echo "Please make sure it exists and is configured with tablet server 
hosts."
+  if [[ ! -f ${conf}/cluster.yml ]]; then
+    echo "ERROR: A 'cluster.yml' file was not found at ${conf}/cluster.yml"
+    echo "Please make sure it exists and is configured with the host 
information. Run 'create-config' to create an example configuration."
     exit 1
   fi
 
-  unset manager1
-  if [[ -f "${conf}/$manager_file" ]]; then
-    manager1=$(grep -Ev '(^#|^\s*$)' "${conf}/$manager_file" | head -1)
+  config=$(${accumulo_cmd} 
org.apache.accumulo.core.conf.cluster.ClusterConfigParser ${conf}/cluster.yml)

Review comment:
       Might want to catch a bad return code from Java here, as in:
   
   ```suggestion
     config=$(${accumulo_cmd} 
org.apache.accumulo.core.conf.cluster.ClusterConfigParser ${conf}/cluster.yml) 
|| call_some_die_method_here
   ```

##########
File path: 
core/src/test/java/org/apache/accumulo/core/conf/cluster/ClusterConfigParserTest.java
##########
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.conf.cluster;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertNotNull;
+import static org.junit.Assert.assertTrue;
+
+import java.io.File;
+import java.net.URL;
+import java.util.Map;
+
+import org.junit.Test;
+
+import edu.umd.cs.findbugs.annotations.SuppressFBWarnings;
+
+@SuppressFBWarnings(value = "PATH_TRAVERSAL_IN", justification = "paths 
provided by test")
+public class ClusterConfigParserTest {
+
+  @Test
+  public void testParse() throws Exception {
+    URL configFile = ClusterConfigParserTest.class.getResource("/cluster.yml");
+    assertNotNull(configFile);
+
+    Map<String,String> contents =
+        ClusterConfigParser.parseConfiguration(new 
File(configFile.toURI()).getAbsolutePath());
+    assertEquals(5, contents.size());
+    assertTrue(contents.containsKey("managers"));
+    assertEquals("localhost1,localhost2", contents.get("managers"));
+    assertTrue(contents.containsKey("monitors"));
+    assertEquals("localhost1,localhost2", contents.get("monitors"));
+    assertTrue(contents.containsKey("tracer"));
+    assertEquals("localhost", contents.get("tracer"));
+    assertTrue(contents.containsKey("gc"));
+    assertEquals("localhost", contents.get("gc"));
+    assertTrue(contents.containsKey("tservers"));
+    assertEquals("localhost1,localhost2,localhost3,localhost4", 
contents.get("tservers"));
+    assertFalse(contents.containsKey("compaction"));
+    assertFalse(contents.containsKey("compaction.coordinators"));
+    assertFalse(contents.containsKey("compaction.compactors"));
+    assertFalse(contents.containsKey("compaction.compactors.queues"));
+    assertFalse(contents.containsKey("compaction.compactors.q1"));
+    assertFalse(contents.containsKey("compaction.compactors.q2"));
+  }
+
+  @Test
+  public void testParseWithExternalCompactions() throws Exception {
+    URL configFile =
+        
ClusterConfigParserTest.class.getResource("/cluster-with-external-compactions.yml");

Review comment:
       We have a lot of top-level resources. Resources can go in package 
folders like classes. This makes their location on the class path less likely 
to collide. Granted, this file name isn't likely to collide, but I still think 
it's best practice to put resources in a package. Could put this file in 
`src/test/resources/org/apache/accumulo/core/conf/cluster`, and then I think 
you could drop the leading `/` when loading and just load it by file name (but 
I'm not 100% sure I remember rules for locating resources).

##########
File path: assemble/bin/accumulo-cluster
##########
@@ -43,52 +43,74 @@ function invalid_args {
   exit 1
 }
 
-function verify_config {
+function parse_config {
   if [[ -f ${conf}/slaves ]]; then
     echo "ERROR: A 'slaves' file was found in ${conf}/"
-    echo "Accumulo now reads tablet server hosts from 'tservers' and requires 
that the 'slaves' file not be present to reduce confusion."
+    echo "Accumulo now uses cluster host configuration information from 
'cluster.yml' and requires that the 'slaves' file not be present to reduce 
confusion."
     echo "Please rename the 'slaves' file to 'tservers' or remove it if both 
exist."
     exit 1
   fi
 
-  if [[ ! -f ${conf}/tservers ]]; then
-    echo "ERROR: A 'tservers' file was not found at ${conf}/tservers"
-    echo "Please make sure it exists and is configured with tablet server 
hosts."
+  if [[ ! -f ${conf}/cluster.yml ]]; then
+    echo "ERROR: A 'cluster.yml' file was not found at ${conf}/cluster.yml"
+    echo "Please make sure it exists and is configured with the host 
information. Run 'create-config' to create an example configuration."
     exit 1
   fi
 
-  unset manager1
-  if [[ -f "${conf}/$manager_file" ]]; then
-    manager1=$(grep -Ev '(^#|^\s*$)' "${conf}/$manager_file" | head -1)
+  config=$(${accumulo_cmd} 
org.apache.accumulo.core.conf.cluster.ClusterConfigParser ${conf}/cluster.yml)
+
+  declare -A HOSTS
+
+  HOSTS["MANAGERS"]=$(grep managers $config | awk -F':' '{ print $2 }')
+  HOSTS["MONITORS"]=$(grep monitors $config | awk -F':' '{ print $2 }')
+  HOSTS["GC"]=$(grep gc $config | awk -F':' '{ print $2 }')
+  HOSTS["TRACER"]=$(grep tracer $config | awk -F':' '{ print $2 }')
+  HOSTS["TSERVERS"]=$(grep tservers $config | awk -F':' '{ print $2 }')
+
+  grep "compaction" $config $2>1 > /dev/null
+  compaction_in_config=$?
+
+  if [[ $compaction_in_config -eq 0 ]]; then

Review comment:
       You don't need to check `$?` when you can put the command directly into 
the if statement. Here, `grep -q` is useful for matching without displaying the 
output:
   
   ```suggestion
     if echo "$config" | grep -q compaction; then
   ```
   

##########
File path: assemble/bin/accumulo-cluster
##########
@@ -304,16 +417,21 @@ function stop_here() {
   if grep -Eq 'localhost|127[.]0[.]0[.]1' "${conf}/tservers"; then
     ${accumulo_cmd} admin stop localhost
   else
+    IFS_ORIG=$IFS
+    IFS=","
     for host in "${hosts_to_check[@]}"; do
-      if grep -q "$host" "${conf}/tservers"; then
-        ${accumulo_cmd} admin stop "$host"
-      fi
+      for tserver in ${HOSTS["TSERVERS"]}; do
+        if grep -q "$host" "$tserver"; then
+          ${accumulo_cmd} admin stop "$host"
+        fi
+      done
     done
+    IFS=$IFS_ORIG

Review comment:
       This looks like the correct use of IFS... but messing with IFS can have 
unexpected consequences, especially when a future dev comes in later and 
changes something without understanding the implications.
   
   Instead of having these comma-separated, just make them space separated or 
use an array to begin with, as in the outer loop. It'd be much more reliable.

##########
File path: assemble/bin/accumulo-cluster
##########
@@ -340,57 +458,74 @@ function main() {
   accumulo_cmd="${bin}/accumulo"
   SSH='ssh -qnf -o ConnectTimeout=2'
 
-  manager_file="managers"
-  if [[ ! -f "$conf/$manager_file" && -f "$conf/masters" ]]; then
-    echo "WARN : Use of 'masters' file is deprecated; use 'managers' file 
instead."
-    manager_file="masters"
-  fi
-
   case "$1" in
     create-config)
-      echo "localhost" > "$conf/gc"
-      echo "localhost" > "$conf/managers"
-      echo "localhost" > "$conf/monitor"
-      echo "localhost" > "$conf/tracers"
-      echo "localhost" > "$conf/tservers"
+      cat <<EOF > $conf/cluster.yml
+managers:
+  - localhost
+
+monitors:
+  - localhost
+
+tracer:
+  - localhost
+
+gc:
+  - localhost
+
+tservers:
+  - localhost
+
+#compaction:
+#  coordinators:
+#    - localhost
+#  compactors:
+#    - queues:
+#      - q1
+#      - q2
+#    - q1:
+#        - localhost
+#    - q2:
+#        - localhost
+      EOF

Review comment:
       Nice. But, the EOF needs to be all the way to the left for this to work 
properly. It needs to be the only thing on the line.
   
   ```suggestion
         cat <<EOF > $conf/cluster.yml
   managers:
     - localhost
   
   monitors:
     - localhost
   
   tracer:
     - localhost
   
   gc:
     - localhost
   
   tservers:
     - localhost
   
   #compaction:
   #  coordinators:
   #    - localhost
   #  compactors:
   #    - queues:
   #      - q1
   #      - q2
   #    - q1:
   #        - localhost
   #    - q2:
   #        - localhost
   EOF
   ```

##########
File path: assemble/bin/accumulo-cluster
##########
@@ -43,52 +43,74 @@ function invalid_args {
   exit 1
 }
 
-function verify_config {
+function parse_config {
   if [[ -f ${conf}/slaves ]]; then
     echo "ERROR: A 'slaves' file was found in ${conf}/"
-    echo "Accumulo now reads tablet server hosts from 'tservers' and requires 
that the 'slaves' file not be present to reduce confusion."
+    echo "Accumulo now uses cluster host configuration information from 
'cluster.yml' and requires that the 'slaves' file not be present to reduce 
confusion."
     echo "Please rename the 'slaves' file to 'tservers' or remove it if both 
exist."
     exit 1
   fi
 
-  if [[ ! -f ${conf}/tservers ]]; then
-    echo "ERROR: A 'tservers' file was not found at ${conf}/tservers"
-    echo "Please make sure it exists and is configured with tablet server 
hosts."
+  if [[ ! -f ${conf}/cluster.yml ]]; then
+    echo "ERROR: A 'cluster.yml' file was not found at ${conf}/cluster.yml"
+    echo "Please make sure it exists and is configured with the host 
information. Run 'create-config' to create an example configuration."
     exit 1
   fi
 
-  unset manager1
-  if [[ -f "${conf}/$manager_file" ]]; then
-    manager1=$(grep -Ev '(^#|^\s*$)' "${conf}/$manager_file" | head -1)
+  config=$(${accumulo_cmd} 
org.apache.accumulo.core.conf.cluster.ClusterConfigParser ${conf}/cluster.yml)
+
+  declare -A HOSTS
+
+  HOSTS["MANAGERS"]=$(grep managers $config | awk -F':' '{ print $2 }')
+  HOSTS["MONITORS"]=$(grep monitors $config | awk -F':' '{ print $2 }')
+  HOSTS["GC"]=$(grep gc $config | awk -F':' '{ print $2 }')
+  HOSTS["TRACER"]=$(grep tracer $config | awk -F':' '{ print $2 }')
+  HOSTS["TSERVERS"]=$(grep tservers $config | awk -F':' '{ print $2 }')

Review comment:
       Don't rely on bash 4.0 associative arrays. They won't work on a Mac. 
It's also unnecessary, because you can just as easily do `MANAGER_HOSTS=...` or 
similar. You don't actually need all hosts to be stored in the same variable. 
I'm also a bit confused by the grepping over a variable instead of a file 
handle. I don't think that will work.
   
   If the output was something like:
   
   ```
   manager:host1,host2,host3
   gc:host4
   ```
   
   Then `$config` would look like (a space separated array):
   ```
   manager:host1,host2,host3 gc:host4
   ```
   
   then you could either loop through the array until you found an item 
starting with `manager:`, or just grep for the manager chunk at word boundaries:
   
   ```bash
   MANAGER_HOSTS=$(echo "$config" | grep '\<managers:[^ ]*' | cut -f2- -d: | tr 
',' ' ')
   ```
   
   Also be careful about using colon as a delimiter for hosts, because hosts 
could be specified as IPv6, or we may wish to add port numbers at some point. 
In my example above, it's okay, because it only removes the first colon with 
the trailing dash in the field specifier, `-f2-` (interpret as fields 2 and 
later; the delimiter was specified as a colon with `-d:`).

##########
File path: assemble/bin/accumulo-cluster
##########
@@ -43,52 +43,74 @@ function invalid_args {
   exit 1
 }
 
-function verify_config {
+function parse_config {
   if [[ -f ${conf}/slaves ]]; then
     echo "ERROR: A 'slaves' file was found in ${conf}/"
-    echo "Accumulo now reads tablet server hosts from 'tservers' and requires 
that the 'slaves' file not be present to reduce confusion."
+    echo "Accumulo now uses cluster host configuration information from 
'cluster.yml' and requires that the 'slaves' file not be present to reduce 
confusion."
     echo "Please rename the 'slaves' file to 'tservers' or remove it if both 
exist."
     exit 1
   fi
 
-  if [[ ! -f ${conf}/tservers ]]; then
-    echo "ERROR: A 'tservers' file was not found at ${conf}/tservers"
-    echo "Please make sure it exists and is configured with tablet server 
hosts."
+  if [[ ! -f ${conf}/cluster.yml ]]; then
+    echo "ERROR: A 'cluster.yml' file was not found at ${conf}/cluster.yml"
+    echo "Please make sure it exists and is configured with the host 
information. Run 'create-config' to create an example configuration."
     exit 1
   fi
 
-  unset manager1
-  if [[ -f "${conf}/$manager_file" ]]; then
-    manager1=$(grep -Ev '(^#|^\s*$)' "${conf}/$manager_file" | head -1)
+  config=$(${accumulo_cmd} 
org.apache.accumulo.core.conf.cluster.ClusterConfigParser ${conf}/cluster.yml)
+
+  declare -A HOSTS
+
+  HOSTS["MANAGERS"]=$(grep managers $config | awk -F':' '{ print $2 }')
+  HOSTS["MONITORS"]=$(grep monitors $config | awk -F':' '{ print $2 }')
+  HOSTS["GC"]=$(grep gc $config | awk -F':' '{ print $2 }')
+  HOSTS["TRACER"]=$(grep tracer $config | awk -F':' '{ print $2 }')
+  HOSTS["TSERVERS"]=$(grep tservers $config | awk -F':' '{ print $2 }')
+
+  grep "compaction" $config $2>1 > /dev/null
+  compaction_in_config=$?
+
+  if [[ $compaction_in_config -eq 0 ]]; then
+    HOSTS["COMPACTION_COORDINATORS"]=$(grep "compaction.coordinators" $file | 
awk -F':' '{ print $2 }')
+    HOSTS["COMPACTION_QUEUES"]=$(grep "compaction.compactors.queues" $file | 
awk -F':' '{ print $2 }')
+
+    IFS_ORIG=$IFS
+    IFS=","
+    for queue in ${HOSTS["COMPACTION_QUEUES"]}; do
+      HOSTS["COMPACTORS_$queue"]=$(grep "compaction.compactors.$queue" $file | 
awk -F':' '{ print $2 }')

Review comment:
       could do `eval COMPACTORS_$queue=...` instead of an associative array 
for these. Or, put the queue name at the front, in a regular bash array, as in:
   
   ```bash
   COMPACTOR_QUEUES=($queue:'compactors,here')
   ```
   You can use += on bash 3.2 on Macs, I've read. But, another way to append to 
an array is:
   ```bash
   COMPACTOR_QUEUES=("${COMPACTOR_QUEUES[@]}" $queue:'compactors,here')
   ```

##########
File path: assemble/bin/accumulo-cluster
##########
@@ -116,19 +138,21 @@ function control_service() {
     if [[ $host == localhost || $host == "$(hostname -s)" || $host == 
"$(hostname -f)" || $host == $(get_ip) ]] ; then
       ACCUMULO_SERVICE_INSTANCE="${ACCUMULO_SERVICE_INSTANCE}" 
"${bin}/accumulo-service" "$service" "$control_cmd"
     else
-      $SSH "$host" "bash -c 
'ACCUMULO_SERVICE_INSTANCE=${ACCUMULO_SERVICE_INSTANCE} ${bin}/accumulo-service 
\"$service\" \"$control_cmd\"'"
+      $SSH "$host" "bash -c 
'ACCUMULO_SERVICE_INSTANCE=${ACCUMULO_SERVICE_INSTANCE} ${bin}/accumulo-service 
\"$service\" \"$control_cmd\" \"{$@:4}'"

Review comment:
       This looks incorrect.  There's no trailing escaped double-quote. Also, 
the quoting is extremely hard to reason about here... not sure it's a great 
idea to pass along additional arbitrary arguments to bash within ssh within 
bash when they could have their own quoting issues.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to