You might also try tuning this parameter when using HAWQ with YARN: hawq_rm_min_resource_perseg - sets the minimum number of containers YARN will give out for a query (per segment host).
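For reference, changing that parameter cluster-wide can be sketched as below. This assumes the standard `hawq config` management utility and a restart; check the exact syntax against your HAWQ version:

```shell
# Show the current value of the GUC (assumes the stock `hawq config` tool)
hawq config -s hawq_rm_min_resource_perseg

# Raise the per-segment-host container minimum from the default 2 to 4
hawq config -c hawq_rm_min_resource_perseg -v 4

# Restart so the resource manager picks up the new value
hawq restart cluster -a
```

With 8 segment hosts, this raises the floor to 8 x 4 = 32 YARN containers per query.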
For example, with an 8-node configuration, setting this to 4 (up from the default of 2) results in 32 containers (vs. 16 by default). YARN queues and their associated parameters will also be relevant here.

-Kyle

On Tue, Sep 27, 2016 at 1:16 PM Kavinder Dhaliwal <[email protected]> wrote:
> Hi,
>
> Could you please describe a bit more about the nature of your tables in
> Hive. What file formats are the underlying files? What is the total size of
> the files?
>
> Additionally, what is the Java heap size you have specified for PXF? And
> can you share your DDL for creating the table?
>
> For more debugging information I would suggest you tail both
> /var/log/pxf/catalina.out and /var/log/pxf/pxf-service.log and then
> run select count(1) from hcatalog.default.t1; It will be obvious from
> catalina.out if there is insufficient memory allocated to PXF.
>
> This will give more insight into where your problem is. Currently there is
> a known issue with PXF, large Hive files (mainly compressed formats), and
> an insufficient Java heap size. After trying the above, perhaps you can try
> increasing your heap size via the "-Xmx<SIZE>" JVM property and running
> again.
>
> Thanks,
> Kavinder
>
> On Mon, Sep 26, 2016 at 10:32 PM, 来熊 <[email protected]> wrote:
>
> > Hi, all:
> > I am using PXF and HCatalog to query Hive. Tables t1 and t2 are in
> > Hive, and t1 is a large table. HAWQ is in YARN mode.
> >
> > [hive@master ~]$ hive
> > hive> select count(1) from t1;
> > OK
> > 680852926
> > Time taken: 0.721 seconds, Fetched: 1 row(s)
> > hive> exit;
> >
> > When I query t1 in HAWQ, it is very, very slow:
> >
> > [gpadmin@master ~]$ psql -U gpadmin -d gpadmin
> > gpadmin=# set pxf_service_address to 'master:51200';
> > SET
> > Time: 0.410 ms
> > gpadmin=# select count(*) from hcatalog.default.t2;
> >  count
> > -------
> >   1000
> > (1 row)
> >
> > Time: 910.853 ms
> >
> > gpadmin=# explain select count(*) from hcatalog.default.t1;
> >                                      QUERY PLAN
> > ----------------------------------------------------------------------------------------
> >  Aggregate  (cost=0.00..431.00 rows=1 width=8)
> >    ->  Gather Motion 1:1  (slice1; segments: 1)  (cost=0.00..431.00 rows=1 width=8)
> >          ->  Aggregate  (cost=0.00..431.00 rows=1 width=8)
> >                ->  External Scan on t1  (cost=0.00..431.00 rows=1 width=1)
> >  Optimizer status: PQO version 1.627
> > (5 rows)
> >
> > Time: 1388.073 ms
> > gpadmin=#
> > gpadmin=# select count(1) from hcatalog.default.t1;
> >
> > I wait a long time and cannot get a result.
> > Log messages:
> >
> > 2016-09-27 09:46:25.816366 CST,"gpadmin","gpadmin",p764498,th-1935386496,
> > "10.0.230.20","16234",2016-09-27 09:31:13 CST,90355,con51,cmd20,seg-1,,,
> > x90355,sx1,"LOG","00000","ConnID 5. Registered in HAWQ resource manager
> > (By OID)",,,,,,"select count(*) from hcatalog.default.t1;",0,,
> > "rmcomm_QD2RM.c",609,
> > 2016-09-27 09:46:25.816508 CST,,,p760393,th-1935386496,,,,0,con4,,
> > seg-10000,,,,,"LOG","00000","ConnID 5. Expect query resource
> > (256 MB, 0.022727 CORE) x 1 ( MIN 1 ) resource after adjusting based on
> > queue NVSEG limits.",,,,,,,0,,"resqueuemanager.c",1913,
> > 2016-09-27 09:46:25.816603 CST,,,p760393,th-1935386496,,,,0,con4,,
> > seg-10000,,,,,"LOG","00000","Latency of getting resource allocated is
> > 138us",,,,,,,0,,"resqueuemanager.c",4375,
> > 2016-09-27 09:46:25.816743 CST,"gpadmin","gpadmin",p764498,th-1935386496,
> > "10.0.230.20","16234",2016-09-27 09:31:13 CST,90355,con51,cmd20,seg-1,,,
> > x90355,sx1,"LOG","00000","ConnID 5. Acquired resource from resource
> > manager, (256 MB, 0.022727 CORE) x 1.",,,,,,"select count(*) from
> > hcatalog.default.t1;",0,,"rmcomm_QD2RM.c",868,
> > 2016-09-27 09:46:25.816868 CST,"gpadmin","gpadmin",p764498,th-1935386496,
> > "10.0.230.20","16234",2016-09-27 09:31:13 CST,90355,con51,cmd20,seg-1,,,
> > x90355,sx1,"LOG","00000","data locality ratio: 0.000; virtual segment
> > number: 1; different host number: 1; virtual segment number per
> > host(avg/min/max): (1/1/1); segment size(avg/min/max): (0.000 B/0 B/0 B);
> > segment size with penalty(avg/min/max): (0.000 B/0 B/0 B);
> > continuity(avg/min/max): (0.000/0.000/0.000).",,,,,,"select count(*)
> > from hcatalog.default.t1;",0,,"cdbdatalocality.c",3396,
> >
> >
> > I don't know why HAWQ gets only so few resources.
> > Is there any parameter I can set to make it (querying Hive using PXF and
> > HCatalog) as fast as querying Hive directly?
> >

--
*Kyle Dunn | Data Engineering | Pivotal*
Direct: 303.905.3171 <3039053171> | Email: [email protected]
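The log-tailing and heap-size steps suggested earlier in the thread can be sketched roughly as follows. The setenv.sh location, the restart command, and the -Xmx2g size are assumptions that vary across PXF versions and installs; verify them against your own deployment:

```shell
# In one terminal, watch the PXF logs mentioned above while re-running
#   select count(1) from hcatalog.default.t1;
tail -f /var/log/pxf/catalina.out /var/log/pxf/pxf-service.log

# If catalina.out shows OutOfMemoryError or heavy GC activity, raise the
# heap. The file holding the JVM options differs by PXF version (e.g. a
# setenv.sh under the PXF service's Tomcat directory); edit it to include
# something like:
#   JVM_OPTS="... -Xmx2g ..."        # -Xmx2g is only an illustrative size
# then restart the PXF service (command name is version-dependent):
service pxf-service restart
```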
