[
https://issues.apache.org/jira/browse/HAWQ-307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148317#comment-15148317
]
Clay B. commented on HAWQ-307:
------------------------------
h1. Good News:
I finally got HAWQ running all the way to taking inserts on my Ubuntu test VM.
A lot of non-obvious issues were hit along the way, particularly in configuring the
machine.
{code}
hawq@hawq-ubuntu-1404:/tmp/kitchen/cache/hawq$ hawq state
20160216:08:35:12:009157 hawq_state:hawq-ubuntu-1404:hawq-[INFO]:--HAWQ instance status summary
20160216:08:35:12:009157 hawq_state:hawq-ubuntu-1404:hawq-[INFO]:------------------------------------------------------
20160216:08:35:12:009157 hawq_state:hawq-ubuntu-1404:hawq-[INFO]:-- Master instance = Active
20160216:08:35:12:009157 hawq_state:hawq-ubuntu-1404:hawq-[INFO]:-- No Standby master defined
20160216:08:35:12:009157 hawq_state:hawq-ubuntu-1404:hawq-[INFO]:-- Total segment instance count from config file = 1
20160216:08:35:12:009157 hawq_state:hawq-ubuntu-1404:hawq-[INFO]:------------------------------------------------------
20160216:08:35:12:009157 hawq_state:hawq-ubuntu-1404:hawq-[INFO]:-- Segment Status
20160216:08:35:12:009157 hawq_state:hawq-ubuntu-1404:hawq-[INFO]:------------------------------------------------------
20160216:08:35:12:009157 hawq_state:hawq-ubuntu-1404:hawq-[INFO]:-- Total segments count from catalog = 1
20160216:08:35:12:009157 hawq_state:hawq-ubuntu-1404:hawq-[INFO]:-- Total segment valid (at master) = 1
20160216:08:35:12:009157 hawq_state:hawq-ubuntu-1404:hawq-[INFO]:-- Total segment failures (at master) = 0
20160216:08:35:12:009157 hawq_state:hawq-ubuntu-1404:hawq-[INFO]:-- Total number of postmaster.pid files missing = 0
20160216:08:35:12:009157 hawq_state:hawq-ubuntu-1404:hawq-[INFO]:-- Total number of postmaster.pid files found = 1
hawq@hawq-ubuntu-1404:/tmp/kitchen/cache/hawq$ psql -d postgres
psql (8.2.15)
Type "help" for help.
postgres=# create table t ( i int );
CREATE TABLE
postgres=# insert into t values(1);
INSERT 0 1
postgres=# select * from t;
 i
---
 1
(1 row)
postgres=# \q
could not save history to file "/var/lib/hawq/.psql_history": No such file or directory
{code}
h1. More Issues
These have all been fixed in my pull request or my Chef code.
h2. User Owning HAWQ and Running {{hawq init cluster}}
I was trying to run {{hawq init cluster}} as a largely unprivileged user but
hit:
{code}
20160209:19:27:50:024272 hawq_init:hawq-ubuntu-1404:vagrant-[INFO]:-No standby host configured, skip it
Traceback (most recent call last):
  File "/usr/local/hawq/bin/hawq_ctl", line 1098, in <module>
    hawq_init(opts, hawq_dict)
  File "/usr/local/hawq/bin/hawq_ctl", line 869, in hawq_init
    instance = HawqInit(opts, hawq_dict)
  File "/usr/local/hawq/bin/hawq_ctl", line 72, in __init__
    self._write_config()
  File "/usr/local/hawq/bin/hawq_ctl", line 127, in _write_config
    with open(configFile, 'w') as f:
IOError: [Errno 13] Permission denied: '/usr/local/hawq/etc/_mgmt_config'
{code}
As such I now make a {{hawq}} user who owns all of {{/usr/local/hawq}}.
If you throw caution to the wind instead, you hit this as
{{root}}:{code}root@hawq-ubuntu-1404:/usr/local/hawq# hawq init cluster
20160209:19:22:31:024209 hawq_init:hawq-ubuntu-1404:root-[INFO]:-Prepare to do 'hawq init'
20160209:19:22:31:024209 hawq_init:hawq-ubuntu-1404:root-[INFO]:-You can check log in /home/vagrant/hawqAdminLogs/hawq_init_20160209.log
20160209:19:22:31:024209 hawq_init:hawq-ubuntu-1404:root-[ERROR]:-'root' user is not allowed{code}
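A minimal pre-flight check illustrating the ownership fix might look like the following; {{can_write_config()}} is a hypothetical helper of my own for illustration, not part of the HAWQ tooling:

```python
import os
import tempfile

def can_write_config(prefix):
    """Pre-flight check mirroring what `hawq init` needs: permission to
    create files (such as _mgmt_config) under <prefix>/etc."""
    etc = os.path.join(prefix, "etc")
    return os.path.isdir(etc) and os.access(etc, os.W_OK)

# Simulate an install prefix owned by the invoking user, i.e. the state
# after something like `chown -R hawq:hawq /usr/local/hawq`:
prefix = tempfile.mkdtemp()
os.mkdir(os.path.join(prefix, "etc"))
print(can_write_config(prefix))  # True once the invoking user owns etc/
```

Running such a check before {{hawq init cluster}} would turn the buried {{IOError}} into an up-front, actionable message.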
h2. Need to Run An SSH Server on localhost
This was a very strange thing to see, and not something we necessarily run on my
production Hadoop clusters, so I need to understand what all {{hawq init
cluster}} is doing here -- but it ssh'es a lot! (You need a passwordless SSH key,
or to like typing passwords...)
{code}
20160209:11:14:05:023033 hawq_init:hawq-ubuntu-1404:vagrant-[INFO]:-Prepare to do 'hawq init'
20160209:11:14:05:023033 hawq_init:hawq-ubuntu-1404:vagrant-[INFO]:-You can check log in /home/vagrant/hawqAdminLogs/hawq_init_20160209.log
ssh: connect to host localhost port 22: Connection refused
ssh: connect to host localhost port 22: Connection refused
20160209:11:14:05:023033 hawq_init:hawq-ubuntu-1404:vagrant-[INFO]:-Init hawq with args: ['init', 'cluster']
{code}
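For the record, a sketch of the passwordless-key setup this implies (stock OpenSSH commands; a temp dir stands in here for the {{hawq}} user's real {{~/.ssh}}):

```shell
# Generate a passwordless keypair; for real use this would be
# ~hawq/.ssh/id_rsa, with ~hawq/.ssh itself mode 0700.
KEYDIR=$(mktemp -d)
ssh-keygen -q -t rsa -N '' -f "$KEYDIR/id_rsa"

# Authorize the key for the same user so `ssh localhost` needs no password.
cat "$KEYDIR/id_rsa.pub" >> "$KEYDIR/authorized_keys"
chmod 600 "$KEYDIR/authorized_keys"
```

With sshd listening on localhost and this key in place, the connection-refused and password-prompt problems both go away.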
h2. More Shell-isms
I hit issues {{source}} ing files and a few more {{#\!/bin/sh}} isms. I set all
to either use the {{.}} operator instead of {{source}} and to use
{{#!/bin/bash}} for the interpreter. All are fixed in the pull request but a
quick listing I have handy in case I missed something:
Under {{/usr/local/hawq}}:
{code}
./bin/gpload:#!/bin/sh
./bin/ipcclean:#!/bin/sh
./lib/postgresql/pgxs/config/install-sh:#!/bin/sh
./bin/lib/pysync_remote.py:    os.system("/bin/sh -c 'type python>&2'");
./bin/lib/gp_bash_functions.sh:    if [ $SHELL != /bin/bash ] && [ `ls -al /bin/sh|grep -c bash` -ne 1 ];then
./bin/lib/pysync.py:    os.system("/bin/sh -c 'type python>&2'");
./bin/lib/hawq_bash_functions.sh:#if [ $SHELL != /bin/bash ] && [ `ls -al /bin/sh|grep -c bash` -ne 1 ];then
./bin/lib/gpsys.py:    f = os.popen("/bin/showrev -p", "r")
./lib/postgresql/pgxs/config/mkinstalldirs:#! /bin/sh
{code}
One note: it is odd to see {{showrev}} unless there is Solaris support lingering.
{code}
/usr/local/bin/hawq:    source_hawq_env = "source %s/greenplum_path.sh" % hawq_home
/usr/local/hawq/bin/hawq_ctl:    source_hawq_env = "source %s/greenplum_path.sh" % opts.GPHOME
/usr/local/hawq/bin/gpload.py:    cmd = 'source %s ; exec ' % srcfile
/usr/local/hawq/sbin/gpcheck_hostdump:    p = subprocess.Popen("source %s; echo $JAVA_HEAP_MAX" % hadoop_config_file, shell = True,
/usr/local/hawq/sbin/gpcheck_hostdump:    p = subprocess.Popen("source %s && echo $HADOOP_NAMENODE_OPTS | tr ' ' '\\n' | grep Xmx | tail -n 1" % hadoop_env_file, shell = True,
/usr/local/hawq/sbin/gpcheck_hostdump:    p = subprocess.Popen("source %s && echo $HADOOP_DATANODE_OPTS | tr ' ' '\\n' | grep Xmx | tail -n 1" % hadoop_env_file, shell = True,
{code}
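To illustrate the {{source}} bash-ism itself (a standalone sketch, not HAWQ code): on Debian and Ubuntu, {{/bin/sh}} is dash, which implements only POSIX's {{.}} operator, so a {{#!/bin/sh}} script that uses {{source}} breaks there.

```shell
# A file of environment settings, standing in for greenplum_path.sh:
env_file=$(mktemp)
echo 'GREETING=hello' > "$env_file"

# Portable POSIX form -- works under dash, bash, ksh, and zsh alike:
sh -c ". '$env_file' && echo \$GREETING"

# bash-only form -- fine, but then the shebang must say #!/bin/bash:
bash -c "source '$env_file' && echo \$GREETING"
```

Both commands print {{hello}} when the interpreter supports them; the first works everywhere, which is why the pull request standardizes on {{.}}.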
h2. Python Runtime Dependencies
Not mentioned anywhere but the following Python modules are needed:
* Paramiko
* Pygresql
* [PSI (Python System Information)|https://pypi.python.org/pypi/PSI/0.3b2] --
however, as this hasn't been touched since 2009?! I had to swap it out with
[PSUtil|https://pypi.python.org/pypi/psutil], as PSI wouldn't build for me
anymore
* Figleaf (I could not get this to build, but it seems to be used only by {{hawq
checkperf}}?)
Perhaps a
[{{requirements.txt}}|https://pip.pypa.io/en/stable/user_guide/#requirements-files]
could be added to the build to ensure all necessary Python dependencies are
included. (For [~cos], can anything easily be done for BIGTOP-2321 to direct
the packages to depend on the right system packages from the HAWQ side?)
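Such a file might look like the following; the package names are the PyPI ones as I understand them, the pins are deliberately omitted, and the Figleaf line is speculative given the build trouble above:

```
# requirements.txt (illustrative sketch only)
paramiko
PyGreSQL
psutil     # swapped in for the unmaintained PSI
figleaf    # apparently only needed for hawq checkperf
```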
h2. Operational Issues
A few testing issues were hit, but fixed in my Chef code for running HAWQ in a
new [operate_hawq
recipe|https://github.com/cbaenziger/incubator-hawq/blob/test_kitchen/chef/hawq_build/recipes/operate_hawq.rb].
The biggest issues were:
* The HDFS property {{dfs.default.replica}} needs to be set if running fewer
than three datanodes. It is specified as three in
{{/usr/local/hawq/etc/hdfs-client.xml}}, and even when using my own
{{/etc/hadoop/conf/hdfs-site.xml}} to configure libhdfs3, with
{{dfs.replication}} set to one, I was hitting:{code}20160216:07:55:12:005998 hawq_init:hawq-ubuntu-1404:hawq-[WARNING]:-2016-02-16 07:55:11.691794, p6048, th139685724080256, WARNING the number of nodes in pipeline is 1 [hawq-ubuntu-1404(127.0.0.1)], is less than the expected number of replica 3 for block [block pool ID: BP-2147225235-127.0.1.1-1455495404497 block ID 1073741825_1001] file /hawq_default/testFile{code}
* The {{gpcheckhdfs}} command seems to block indefinitely if the user running
{{hawq init cluster}} does not have an HDFS home directory. Once I created
{{hdfs:///user/hawq}}, things ran along swimmingly.
* If one has a misconfigured {{LD_LIBRARY_PATH}}, one will get one of the two
following errors from {{hawq init cluster}}:
** If manually setting {{LD_LIBRARY_PATH}} on the command line:{code}20160209:21:10:22:026772 hawq_start:hawq-ubuntu-1404:hawq-[INFO]:-fgets failure: No such file or directory
The program "postgres" is needed by pg_ctl but was not found in the
same directory as "/usr/local/hawq/bin/pg_ctl".
Check your installation.{code} This is compounded by the frustration that
{{/usr/local/hawq/bin/postgres}} exists and runs fine; the error is thanks to
some craziness using
[find_other_exec()|https://github.com/apache/incubator-hawq/blob/96779dd2ecd6a215da8789079116915caa99d408/src/bin/initdb/initdb.c#L2999-L3000],
which throws away [standard
error|https://github.com/apache/incubator-hawq/blob/96779dd2ecd6a215da8789079116915caa99d408/src/port/exec.c#L395]
and uses
[popen(3)|https://github.com/apache/incubator-hawq/blob/96779dd2ecd6a215da8789079116915caa99d408/src/port/exec.c#L425-L432]
to check that the {{postgres}} binary version matches that of the cluster.
** If setting an {{LD_LIBRARY_PATH}} that is missing (in this case)
{{libyarn.so.1}}:{code}20160216:07:55:11:005998 hawq_init:hawq-ubuntu-1404:hawq-[INFO]:-No standby host configured, skip it
20160216:07:55:11:005998 hawq_init:hawq-ubuntu-1404:hawq-[INFO]:-Check if hdfs path is available
20160216:07:55:12:005998 hawq_init:hawq-ubuntu-1404:hawq-[WARNING]:-2016-02-16 07:55:11.691794, p6048, th139685724080256, WARNING the number of nodes in pipeline is 1 [hawq-ubuntu-1404(127.0.0.1)], is less than the expected number of replica 3 for block [block pool ID: BP-2147225235-127.0.1.1-1455495404497 block ID 1073741825_1001] file /hawq_default/testFile
20160216:07:55:12:005998 hawq_init:hawq-ubuntu-1404:hawq-[INFO]:-1 segment hosts defined
20160216:07:55:12:005998 hawq_init:hawq-ubuntu-1404:hawq-[INFO]:-Set default_segment_num as: 8
20160216:07:55:13:005998 hawq_init:hawq-ubuntu-1404:hawq-[INFO]:-Start to init master node: 'localhost'
20160216:07:55:13:005998 hawq_init:hawq-ubuntu-1404:hawq-[INFO]:-Master postgres initdb failed
20160216:07:55:13:005998 hawq_init:hawq-ubuntu-1404:hawq-[ERROR]:-Master init failed, exit{code} This can be checked by running the {{postgres}} command directly and
seeing the error.
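A rough Python analogue of the {{find_other_exec()}} behavior (my sketch, not HAWQ's actual C code) shows why the linker's complaint never surfaces: the sibling binary is run with standard error discarded, so a shared-library load failure looks identical to a missing or version-mismatched binary.

```python
import subprocess
import sys

def sibling_version_matches(cmd, expected_prefix):
    # stderr=DEVNULL mirrors exec.c redirecting standard error away; any
    # "error while loading shared libraries: ..." message from the dynamic
    # linker simply vanishes, and the caller only sees missing output.
    try:
        out = subprocess.check_output(cmd, stderr=subprocess.DEVNULL)
    except (OSError, subprocess.CalledProcessError):
        return False
    return out.decode().startswith(expected_prefix)

# The Python interpreter stands in for running `postgres --version`:
print(sibling_version_matches([sys.executable, "--version"], "Python"))

# A binary that dies with a loader-style message on stderr is
# indistinguishable from one that is absent:
failing = [sys.executable, "-c",
           "import sys; sys.stderr.write('error while loading "
           "shared libraries\\n'); sys.exit(127)"]
print(sibling_version_matches(failing, "Python"))
```

This is why running {{postgres}} by hand, where stderr is visible, is the quickest way to see the real {{LD_LIBRARY_PATH}} problem.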
h2. Next Steps
I need to kick off my Chef code to stand up a HAWQ cluster on CentOS 7 (I'll
need to stand up HDFS manually, as my Chef Hadoop code doesn't work on CentOS
today) and run a fresh Ubuntu build. Then I'll squash the commits on my HAWQ-307
pull request down to a single commit message; ideally, someone could also
verify my work?
> Ubuntu Support
> --------------
>
> Key: HAWQ-307
> URL: https://issues.apache.org/jira/browse/HAWQ-307
> Project: Apache HAWQ
> Issue Type: New Feature
> Components: Build
> Reporter: Lei Chang
> Assignee: Clay B.
> Fix For: 2.1.0
>
>
> To support HAWQ running on Ubuntu OS 14.04.3
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)