The segment actually has this error:

2016-12-14 13:47:34.760839 UTC,"gpadmin","gpadmin",p737499,th542214432,"172.21.13.196","40327",2016-12-14 13:47:34 UTC,0,con23798,,seg-10000,,,,,"FATAL","42704","unrecognized configuration parameter ""ONETARY""",,,,,,,0,,"guc.c",10006,
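The segment reports only the fragment "ONETARY", which is the tail of hawq_lc_monetary. A case-insensitive, fixed-string search over the config and log directories (the approach Paul suggests in the thread) will find the full name. The following is a minimal sketch. The function name `find_guc_fragment` is hypothetical, and the log directory in the usage comment is a placeholder that you should adjust for your installation.

```shell
# Hypothetical helper: search config/log directories for a GUC-name fragment.
# Usage: find_guc_fragment FRAGMENT DIR [DIR...]
find_guc_fragment() {
  fragment="$1"
  shift
  # -r recurse into directories, -n print line numbers, -i ignore case,
  # -I skip binary files, -F treat the fragment as a fixed string (no regex).
  # grep exits 0 if any line matched and 1 if nothing matched.
  grep -rniIF -- "$fragment" "$@"
}

# Example (the log path is a placeholder, not a real HAWQ default):
# find_guc_fragment onetary /usr/local/hawq/etc /path/to/hawq/logs
```

Because the search ignores case, the uppercase fragment from the error message still matches the lowercase `hawq_lc_monetary=en_US.utf8` line in `_mgmt_config`.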
This made me check the configs, because the error was raised from guc.c. I
found that /usr/local/hawq/etc/_mgmt_config contains the text "onetary" in:

hawq_lc_monetary=en_US.utf8

I'm using nodes with 24 drives, so I have 24 temp directories set for both
the master and the segments. Suspecting that the number of vSegs times the
size of the config file might be exceeding the size of some variable, I
reduced the temp directories to just two for both the master and the
segments. After restarting HAWQ, the query ran with 16 and even 24 vSegs
without a problem.

So the GUC logic may need revisiting, to make sure GUCs are parsed correctly
when there are many vSegs and many temp directories. I would expect this to
be a problem even with the default number of vSegs per host on very large
clusters.

Jon Roberts
Principal Engineer | [email protected] | 615-426-8661

On Mon, Dec 19, 2016 at 4:44 AM, Yi Jin <[email protected]> wrote:

> Hi Jon,
>
> I think there is no ONETARY configuration item, so I may need the full log
> containing that error to check the error routine.
>
> Best,
> Yi
>
> On Mon, Dec 19, 2016 at 5:39 PM, Paul Guo <[email protected]> wrote:
>
> > You could grep the log to see whether there is an "ONETARY" setting before
> > this error occurs, and also grep the configuration files and related test
> > files to find what tried to set it.
> >
> > 2016-12-14 22:36 GMT+08:00 Jon Roberts <[email protected]>:
> >
> > > I'm getting the error message: unrecognized configuration parameter
> > > "ONETARY" with a few of the TPC-DS queries where I'm increasing the
> > > number of vsegs to get better performance. The error message alone is
> > > confusing, so even if I am doing something wrong, the message should be
> > > improved.
> > >
> > > My environment:
> > >
> > > AWS d2.8xlarge nodes
> > > - 24 2 TB disks
> > > - 252 GB RAM
> > > - 36 cores
> > > - CentOS 6
> > > - 10 nodes
> > > - 1 admin node
> > > - 10GB network
> > > - 7 TB of data
> > > - Standard TPC-DS queries
> > > - hawq_rm_memory_limit_perseg = 200gb
> > > - hawq_rm_stmt_vseg_memory = 16gb
> > > - Random distribution on all tables
> > >
> > > I've tried reducing the statement memory, but that doesn't change anything.
> > >
> > > Query 88 is a good example of this because it fails quickly.
> > >
> > > set hawq_rm_nvseg_perquery_perseg_limit=12;
> > >
> > > time psql -f 188.tpcds.88.sql
> > > SET
> > > Timing is on.
> > > SET
> > > Time: 0.157 ms
> > >  h8_30_to_9 | h9_to_9_30 | h9_30_to_10 | h10_to_10_30 | h10_30_to_11 | h11_to_11_30 | h11_30_to_12 | h12_to_12_30
> > > ------------+------------+-------------+--------------+--------------+--------------+--------------+--------------
> > >    16279055 |   32496701 |    32493080 |     48732586 |     48782652 |     28460584 |     28453299 |     32518016
> > > (1 row)
> > >
> > > Time: 259695.969 ms
> > >
> > > real 4m19.706s
> > > user 0m0.001s
> > > sys 0m0.003s
> > >
> > > Next:
> > > set hawq_rm_nvseg_perquery_perseg_limit=14;
> > >
> > > time psql -f 188.tpcds.88.sql
> > > SET
> > > Timing is on.
> > > SET
> > > Time: 0.171 ms
> > > psql:188.tpcds.88.sql:95: ERROR: Error dispatching to seg25 ip-172-21-13-189.ec2.internal:40000: connection pointer is NULL
> > > DETAIL: Master unable to connect to seg25 ip-172-21-13-189.ec2.internal:40000: FATAL: unrecognized configuration parameter "ONETARY"
> > >
> > > real 0m8.787s
> > > user 0m0.003s
> > > sys 0m0.002s
> > >
> > >
> > > Jon Roberts
> > >
> > >
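Jon's Dec 19 hypothesis is that the number of vSegs multiplied by the size of the settings exceeds some limit. Some rough arithmetic makes that hypothesis concrete. The following is a sketch only. The `/dataN/hawq/tmp` directory naming is a hypothetical layout, the helper names are made up for illustration, and nothing in the thread confirms that the temp-directory list is actually carried on every vSeg connection.

```shell
# Hypothetical helper: comma-separated list of N temp directories, assuming
# a /dataN/hawq/tmp naming scheme (not taken from the thread).
temp_dir_list() {
  local n="$1" i=1 list=""
  while [ "$i" -le "$n" ]; do
    # ${list:+,} expands to "," only when list is already non-empty
    list="${list}${list:+,}/data${i}/hawq/tmp"
    i=$((i + 1))
  done
  printf '%s' "$list"
}

# Jon's product: length of the temp-directory list times the number of vSegs.
# ASSUMPTION: each vSeg connection carries the full list.
settings_bytes() {
  local list
  list=$(temp_dir_list "$1")
  echo $(( ${#list} * $2 ))
}

# Compare the original 24-directory setup with the 2-directory workaround.
for dirs in 24 2; do
  for vsegs in 12 14 24; do
    echo "dirs=$dirs vsegs=$vsegs bytes=$(settings_bytes "$dirs" "$vsegs")"
  done
done
```

Under these assumptions, the 24-directory list is 398 bytes and the 2-directory list is 31 bytes, roughly a thirteenfold reduction. That is consistent with the workaround letting 24 vSegs run, but it does not show where, or whether, anything is actually truncated.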
