[
https://issues.apache.org/jira/browse/HAWQ-307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148317#comment-15148317
]
Clay B. commented on HAWQ-307:
------------------------------
h1. Good News:
I finally got HAWQ running all the way to taking inserts on my Ubuntu test VM.
A lot of non-obvious issues were hit along the way, particularly in configuring the
machine.
{code}
hawq@hawq-ubuntu-1404:/tmp/kitchen/cache/hawq$ hawq state
20160216:08:35:12:009157 hawq_state:hawq-ubuntu-1404:hawq-[INFO]:--HAWQ instance status summary
20160216:08:35:12:009157 hawq_state:hawq-ubuntu-1404:hawq-[INFO]:------------------------------------------------------
20160216:08:35:12:009157 hawq_state:hawq-ubuntu-1404:hawq-[INFO]:-- Master instance = Active
20160216:08:35:12:009157 hawq_state:hawq-ubuntu-1404:hawq-[INFO]:-- No Standby master defined
20160216:08:35:12:009157 hawq_state:hawq-ubuntu-1404:hawq-[INFO]:-- Total segment instance count from config file = 1
20160216:08:35:12:009157 hawq_state:hawq-ubuntu-1404:hawq-[INFO]:------------------------------------------------------
20160216:08:35:12:009157 hawq_state:hawq-ubuntu-1404:hawq-[INFO]:-- Segment Status
20160216:08:35:12:009157 hawq_state:hawq-ubuntu-1404:hawq-[INFO]:------------------------------------------------------
20160216:08:35:12:009157 hawq_state:hawq-ubuntu-1404:hawq-[INFO]:-- Total segments count from catalog = 1
20160216:08:35:12:009157 hawq_state:hawq-ubuntu-1404:hawq-[INFO]:-- Total segment valid (at master) = 1
20160216:08:35:12:009157 hawq_state:hawq-ubuntu-1404:hawq-[INFO]:-- Total segment failures (at master) = 0
20160216:08:35:12:009157 hawq_state:hawq-ubuntu-1404:hawq-[INFO]:-- Total number of postmaster.pid files missing = 0
20160216:08:35:12:009157 hawq_state:hawq-ubuntu-1404:hawq-[INFO]:-- Total number of postmaster.pid files found = 1
hawq@hawq-ubuntu-1404:/tmp/kitchen/cache/hawq$ psql -d postgres
psql (8.2.15)
Type "help" for help.
postgres=# create table t ( i int );
CREATE TABLE
postgres=# insert into t values(1);
INSERT 0 1
postgres=# select * from t;
 i
---
 1
(1 row)
postgres=# \q
could not save history to file "/var/lib/hawq/.psql_history": No such file or directory
{code}
h1. More Issues
These have all been fixed in my pull request or my Chef code.
h2. User Owning HAWQ and Running {{hawq init cluster}}
I was trying to run {{hawq init cluster}} as a largely unprivileged user but
hit:
{code}
20160209:19:27:50:024272 hawq_init:hawq-ubuntu-1404:vagrant-[INFO]:-No standby host configured, skip it
Traceback (most recent call last):
  File "/usr/local/hawq/bin/hawq_ctl", line 1098, in <module>
    hawq_init(opts, hawq_dict)
  File "/usr/local/hawq/bin/hawq_ctl", line 869, in hawq_init
    instance = HawqInit(opts, hawq_dict)
  File "/usr/local/hawq/bin/hawq_ctl", line 72, in __init__
    self._write_config()
  File "/usr/local/hawq/bin/hawq_ctl", line 127, in _write_config
    with open(configFile, 'w') as f:
IOError: [Errno 13] Permission denied: '/usr/local/hawq/etc/_mgmt_config'
{code}
As such I now make a {{hawq}} user who owns all of {{/usr/local/hawq}}.
If you throw caution to the wind instead, you hit this as
{{root}}:{code}root@hawq-ubuntu-1404:/usr/local/hawq# hawq init cluster
20160209:19:22:31:024209 hawq_init:hawq-ubuntu-1404:root-[INFO]:-Prepare to do 'hawq init'
20160209:19:22:31:024209 hawq_init:hawq-ubuntu-1404:root-[INFO]:-You can check log in /home/vagrant/hawqAdminLogs/hawq_init_20160209.log
20160209:19:22:31:024209 hawq_init:hawq-ubuntu-1404:root-[ERROR]:-'root' user is not allowed{code}
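A minimal pre-flight check illustrating the ownership fix might look like the following; {{can_write_config()}} is a hypothetical helper of my own for illustration, not part of the HAWQ tooling:

```python
import os
import tempfile

def can_write_config(prefix):
    """Pre-flight check mirroring what `hawq init` needs: permission to
    create files (such as _mgmt_config) under <prefix>/etc."""
    etc = os.path.join(prefix, "etc")
    return os.path.isdir(etc) and os.access(etc, os.W_OK)

# Simulate an install prefix owned by the invoking user, i.e. the state
# after something like `chown -R hawq:hawq /usr/local/hawq`:
prefix = tempfile.mkdtemp()
os.mkdir(os.path.join(prefix, "etc"))
print(can_write_config(prefix))  # True once the invoking user owns etc/
```

Running such a check before {{hawq init cluster}} would turn the buried {{IOError}} into an up-front, actionable message.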
h2. Need to Run An SSH Server on localhost
This was a very strange thing to see, and not something we necessarily run on my
production Hadoop clusters, so I need to understand what all {{hawq init
cluster}} is doing here -- but it ssh'es a lot! (You need a passwordless SSH key,
or to like typing passwords...)
{code}
20160209:11:14:05:023033 hawq_init:hawq-ubuntu-1404:vagrant-[INFO]:-Prepare to do 'hawq init'
20160209:11:14:05:023033 hawq_init:hawq-ubuntu-1404:vagrant-[INFO]:-You can check log in /home/vagrant/hawqAdminLogs/hawq_init_20160209.log
ssh: connect to host localhost port 22: Connection refused
ssh: connect to host localhost port 22: Connection refused
20160209:11:14:05:023033 hawq_init:hawq-ubuntu-1404:vagrant-[INFO]:-Init hawq with args: ['init', 'cluster']
{code}
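For the record, a sketch of the passwordless-key setup this implies (stock OpenSSH commands; a temp dir stands in here for the {{hawq}} user's real {{~/.ssh}}):

```shell
# Generate a passwordless keypair; for real use this would be
# ~hawq/.ssh/id_rsa, with ~hawq/.ssh itself mode 0700.
KEYDIR=$(mktemp -d)
ssh-keygen -q -t rsa -N '' -f "$KEYDIR/id_rsa"

# Authorize the key for the same user so `ssh localhost` needs no password.
cat "$KEYDIR/id_rsa.pub" >> "$KEYDIR/authorized_keys"
chmod 600 "$KEYDIR/authorized_keys"
```

With sshd listening on localhost and this key in place, the connection-refused and password-prompt problems both go away.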
h2. More Shell-isms
I hit issues {{source}} ing files and a few more {{#\!/bin/sh}} isms. I set all
to either use the {{.}} operator instead of {{source}} and to use
{{#!/bin/bash}} for the interpreter. All are fixed in the pull request but a
quick listing I have handy in case I missed something:
Under {{/usr/local/hawq}}:
{code}
./bin/gpload:#!/bin/sh
./bin/ipcclean:#!/bin/sh
./lib/postgresql/pgxs/config/install-sh:#!/bin/sh
./bin/lib/pysync_remote.py:    os.system("/bin/sh -c 'type python>&2'");
./bin/lib/gp_bash_functions.sh:    if [ $SHELL != /bin/bash ] && [ `ls -al /bin/sh|grep -c bash` -ne 1 ];then
./bin/lib/pysync.py:    os.system("/bin/sh -c 'type python>&2'");
./bin/lib/hawq_bash_functions.sh:#if [ $SHELL != /bin/bash ] && [ `ls -al /bin/sh|grep -c bash` -ne 1 ];then
./bin/lib/gpsys.py:    f = os.popen("/bin/showrev -p", "r")
./lib/postgresql/pgxs/config/mkinstalldirs:#! /bin/sh
{code}
One note: it is odd to see {{showrev}} unless there is Solaris support lingering.
{code}
/usr/local/bin/hawq:    source_hawq_env = "source %s/greenplum_path.sh" % hawq_home
/usr/local/hawq/bin/hawq_ctl:    source_hawq_env = "source %s/greenplum_path.sh" % opts.GPHOME
/usr/local/hawq/bin/gpload.py:    cmd = 'source %s ; exec ' % srcfile
/usr/local/hawq/sbin/gpcheck_hostdump:    p = subprocess.Popen("source %s; echo $JAVA_HEAP_MAX" % hadoop_config_file, shell = True,
/usr/local/hawq/sbin/gpcheck_hostdump:    p = subprocess.Popen("source %s && echo $HADOOP_NAMENODE_OPTS | tr ' ' '\\n' | grep Xmx | tail -n 1" % hadoop_env_file, shell = True,
/usr/local/hawq/sbin/gpcheck_hostdump:    p = subprocess.Popen("source %s && echo $HADOOP_DATANODE_OPTS | tr ' ' '\\n' | grep Xmx | tail -n 1" % hadoop_env_file, shell = True,
{code}
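To illustrate the {{source}} bash-ism itself (a standalone sketch, not HAWQ code): on Debian and Ubuntu, {{/bin/sh}} is dash, which implements only POSIX's {{.}} operator, so a {{#!/bin/sh}} script that uses {{source}} breaks there.

```shell
# A file of environment settings, standing in for greenplum_path.sh:
env_file=$(mktemp)
echo 'GREETING=hello' > "$env_file"

# Portable POSIX form -- works under dash, bash, ksh, and zsh alike:
sh -c ". '$env_file' && echo \$GREETING"

# bash-only form -- fine, but then the shebang must say #!/bin/bash:
bash -c "source '$env_file' && echo \$GREETING"
```

Both commands print {{hello}} when the interpreter supports them; the first works everywhere, which is why the pull request standardizes on {{.}}.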
h2. Python Runtime Dependencies
Not mentioned anywhere but the following Python modules are needed:
* Paramiko
* Pygresql
* [PSI (Python System Information)|https://pypi.python.org/pypi/PSI/0.3b2] --
however, as this hasn't been touched since 2009?! I had to swap it out with
[PSUtil|https://pypi.python.org/pypi/psutil], as PSI wouldn't build for me
anymore
* Figleaf (I could not get this to build, but it seems to be used only by {{hawq
checkperf}}?)
Perhaps a
[{{requirements.txt}}|https://pip.pypa.io/en/stable/user_guide/#requirements-files]
could be added to the build to ensure all necessary Python dependencies are
included. (For [~cos], can anything easily be done for BIGTOP-2321 to direct
the packages to depend on the right system packages from the HAWQ side?)
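Such a file might look like the following; the package names are the PyPI ones as I understand them, the pins are deliberately omitted, and the Figleaf line is speculative given the build trouble above:

```
# requirements.txt (illustrative sketch only)
paramiko
PyGreSQL
psutil     # swapped in for the unmaintained PSI
figleaf    # apparently only needed for hawq checkperf
```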
h2. Operational Issues
A few testing issues were hit, but fixed in my Chef code for running HAWQ in a
new [operate_hawq
recipe|https://github.com/cbaenziger/incubator-hawq/blob/test_kitchen/chef/hawq_build/recipes/operate_hawq.rb].
The biggest issues were:
* The HDFS property {{dfs.default.replica}} needs to be set if running fewer
than three datanodes. It is specified as three in
{{/usr/local/hawq/etc/hdfs-client.xml}}, and even when using my own
{{/etc/hadoop/conf/hdfs-site.xml}} to configure libhdfs3, with
{{dfs.replication}} set to one, I was hitting:{code}20160216:07:55:12:005998 hawq_init:hawq-ubuntu-1404:hawq-[WARNING]:-2016-02-16 07:55:11.691794, p6048, th139685724080256, WARNING the number of nodes in pipeline is 1 [hawq-ubuntu-1404(127.0.0.1)], is less than the expected number of replica 3 for block [block pool ID: BP-2147225235-127.0.1.1-1455495404497 block ID 1073741825_1001] file /hawq_default/testFile{code}
* The {{gpcheckhdfs}} command seems to block indefinitely if the user running
{{hawq init cluster}} does not have an HDFS home directory. Once I created
{{hdfs:///user/hawq}}, things ran along swimmingly.
* If one has a misconfigured {{LD_LIBRARY_PATH}}, one will get one of the two
following errors from {{hawq init cluster}}:
** If manually setting {{LD_LIBRARY_PATH}} on the command line:{code}20160209:21:10:22:026772 hawq_start:hawq-ubuntu-1404:hawq-[INFO]:-fgets failure: No such file or directory
The program "postgres" is needed by pg_ctl but was not found in the
same directory as "/usr/local/hawq/bin/pg_ctl".
Check your installation.{code} This is compounded by the frustration that
{{/usr/local/hawq/bin/postgres}} exists and runs fine; the error is thanks to
some craziness using
[find_other_exec()|https://github.com/apache/incubator-hawq/blob/96779dd2ecd6a215da8789079116915caa99d408/src/bin/initdb/initdb.c#L2999-L3000],
which throws away [standard
error|https://github.com/apache/incubator-hawq/blob/96779dd2ecd6a215da8789079116915caa99d408/src/port/exec.c#L395]
and uses
[popen(3)|https://github.com/apache/incubator-hawq/blob/96779dd2ecd6a215da8789079116915caa99d408/src/port/exec.c#L425-L432]
to check that the {{postgres}} binary version matches that of the cluster.
** If setting an {{LD_LIBRARY_PATH}} that is missing (in this case)
{{libyarn.so.1}}:{code}20160216:07:55:11:005998 hawq_init:hawq-ubuntu-1404:hawq-[INFO]:-No standby host configured, skip it
20160216:07:55:11:005998 hawq_init:hawq-ubuntu-1404:hawq-[INFO]:-Check if hdfs path is available
20160216:07:55:12:005998 hawq_init:hawq-ubuntu-1404:hawq-[WARNING]:-2016-02-16 07:55:11.691794, p6048, th139685724080256, WARNING the number of nodes in pipeline is 1 [hawq-ubuntu-1404(127.0.0.1)], is less than the expected number of replica 3 for block [block pool ID: BP-2147225235-127.0.1.1-1455495404497 block ID 1073741825_1001] file /hawq_default/testFile
20160216:07:55:12:005998 hawq_init:hawq-ubuntu-1404:hawq-[INFO]:-1 segment hosts defined
20160216:07:55:12:005998 hawq_init:hawq-ubuntu-1404:hawq-[INFO]:-Set default_segment_num as: 8
20160216:07:55:13:005998 hawq_init:hawq-ubuntu-1404:hawq-[INFO]:-Start to init master node: 'localhost'
20160216:07:55:13:005998 hawq_init:hawq-ubuntu-1404:hawq-[INFO]:-Master postgres initdb failed
20160216:07:55:13:005998 hawq_init:hawq-ubuntu-1404:hawq-[ERROR]:-Master init failed, exit{code} This can be checked by running the {{postgres}} command directly and
seeing the error.
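A rough Python analogue of the {{find_other_exec()}} behavior (my sketch, not HAWQ's actual C code) shows why the linker's complaint never surfaces: the sibling binary is run with standard error discarded, so a shared-library load failure looks identical to a missing or version-mismatched binary.

```python
import subprocess
import sys

def sibling_version_matches(cmd, expected_prefix):
    # stderr=DEVNULL mirrors exec.c redirecting standard error away; any
    # "error while loading shared libraries: ..." message from the dynamic
    # linker simply vanishes, and the caller only sees missing output.
    try:
        out = subprocess.check_output(cmd, stderr=subprocess.DEVNULL)
    except (OSError, subprocess.CalledProcessError):
        return False
    return out.decode().startswith(expected_prefix)

# The Python interpreter stands in for running `postgres --version`:
print(sibling_version_matches([sys.executable, "--version"], "Python"))

# A binary that dies with a loader-style message on stderr is
# indistinguishable from one that is absent:
failing = [sys.executable, "-c",
           "import sys; sys.stderr.write('error while loading "
           "shared libraries\\n'); sys.exit(127)"]
print(sibling_version_matches(failing, "Python"))
```

This is why running {{postgres}} by hand, where stderr is visible, is the quickest way to see the real {{LD_LIBRARY_PATH}} problem.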
h2. Next Steps
I need to kick off my Chef code to stand up a HAWQ cluster on CentOS 7 (I'll
need to stand up HDFS manually, as my Chef Hadoop code doesn't work on CentOS
today) and run a fresh Ubuntu build. Then I'll squash the commits on my HAWQ-307
pull request down to a single commit message; ideally, someone could also
verify my work?
> Ubuntu Support
> --------------
>
> Key: HAWQ-307
> URL: https://issues.apache.org/jira/browse/HAWQ-307
> Project: Apache HAWQ
> Issue Type: New Feature
> Components: Build
> Reporter: Lei Chang
> Assignee: Clay B.
> Fix For: 2.1.0
>
>
> To support HAWQ running on Ubuntu OS 14.04.3
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)