Github user nickwallen commented on the issue:

    https://github.com/apache/incubator-metron/pull/436
  
    I have been able to launch "Quick Dev" with the deployment report. Thanks for the fix, @dlyle65535!
    
    I have been fighting a bit with the AWS deployment.  I ran into two issues.
    
    (1) On one pass, the Ambari setup appeared to fail, but the deployment continued, which caused it to fail later on. To fix this, I manually logged into the host, ran the Ambari setup, and then re-ran the deployment, which addressed the problem.
    
    I am almost certain that I have seen this before, prior to the work in this PR.
    ```
    $ ./run.sh
    ...
    
    TASK [ambari_master : Setup ambari server] *************************************
    ...
    
    "Successfully downloaded JDK distribution to /var/lib/ambari-server/resources/jdk-8u77-linux-x64.tar.gz",
    "Installing JDK to /usr/jdk64/",
    "Successfully installed JDK to /usr/jdk64/",
    "Downloading JCE Policy archive from http://public-repo-1.hortonworks.com/ARTIFACTS/jce_policy-8.zip to /var/lib/ambari-server/resources/jce_policy-8.zip",
    "",
    "Successfully downloaded JCE Policy archive to /var/lib/ambari-server/resources/jce_policy-8.zip",
    "Installing JCE policy...",
    "Completing setup...",
    "Configuring database...",
    "Enter advanced database configuration [y/n] (n)? ",
    "Configuring database...",
    "Default properties detected. Using built-in database.",
    "Configuring ambari database...",
    "Checking PostgreSQL...",
    "Running initdb: This may take up to a minute.",
    "Initializing database: [  OK  ]",
    "",
    "About to start PostgreSQL",
    "Configuring local database...",
    "Connecting to local database...connection timed out...retrying (1)",
    "Connecting to local database...connection timed out...retrying (2)",
    "Connecting to local database...unable to connect to database",
    "ERROR: could not change directory to \"/home/centos\"",
    "psql: FATAL:  the database system is starting up",
    "",
    "ERROR: Exiting with exit code 2. ",
    "REASON: Running database init script failed. Exiting."], "warnings": []}
    
    $ ./run.sh
    ...
    
    TASK [ambari_config : check if ambari-server is up on ec2-52-37-229-181.us-west-2.compute.amazonaws.com:8080] ***
    fatal: [ec2-52-37-229-181.us-west-2.compute.amazonaws.com]: FAILED! => {"changed": false, "elapsed": 300, "failed": true, "msg": "Timeout when waiting for ec2-52-37-229-181.us-west-2.compute.amazonaws.com:8080"}
    ```
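
    One way to keep a failed Ambari setup from going unnoticed is to fail fast on its exit status. The Python sketch below is hypothetical: `run_ambari_setup`, `check_setup_result`, and `AmbariSetupError` are illustrative names, not part of the Metron deployment. It assumes the setup is run as a subprocess whose non-zero exit code (2 in the log above) should stop the deployment immediately.

```python
import subprocess
from typing import List


class AmbariSetupError(RuntimeError):
    """Raised when the Ambari server setup does not complete."""


def check_setup_result(returncode: int, output: str) -> None:
    """Fail fast on a non-zero exit code, surfacing the REASON line if any."""
    if returncode == 0:
        return
    # A failed setup ends with a line such as
    # "REASON: Running database init script failed. Exiting."
    reasons = [line.strip() for line in output.splitlines()
               if line.strip().startswith("REASON:")]
    detail = reasons[-1] if reasons else "no REASON line in output"
    raise AmbariSetupError(
        f"ambari-server setup exited with code {returncode}: {detail}")


def run_ambari_setup(cmd: List[str]) -> None:
    # Capture stdout and stderr so the failure reason can be reported.
    proc = subprocess.run(cmd, capture_output=True, text=True)
    check_setup_result(proc.returncode, proc.stdout + proc.stderr)
```

    For example, `run_ambari_setup(["ambari-server", "setup", "-s"])` (the silent `-s` flag is an assumption about how the setup is invoked) would raise at the setup step instead of letting the deployment carry on to the port check that later timed out.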
    
    (2) The second issue was more unexpected. On all but one of the 10 AWS nodes, the deployment went smoothly. At some point during the deployment, Ansible could not reach one node, but it continued anyway. After the other 9 nodes were finished, Ambari listed all 10 nodes but showed that one in yellow, indicating that it was not receiving a heartbeat from it.
    
    After Ansible was done with the 9 nodes, it seemed to almost start over on the last node. It rebuilt the source code, pushed out the RPMs, reinstalled the MPack, and so on. That really confused the cluster, and it has not processed any data.
    
    I'm sure a little manual effort could fix up the cluster, but Ansible's behavior was odd. Previously, when I worked with the AWS deployment, it would fail if any one node failed. Now it seems to retry failed nodes at a later point, which has negative implications when we expect actions like the build and the MPack install to occur only once.
    
    Not sure what to make of this issue.
    


