[
https://issues.apache.org/jira/browse/HAWQ-218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15059441#comment-15059441
]
Ruilong Huo commented on HAWQ-218:
----------------------------------
Root cause analysis: we have two phases to be done in HAWQ before query is
executed on segment: 1) uses planner to generate query plan, calculate
datalocality accordingly, and allocates virtual segment and resource for query;
2) uses optimizer to generate query plan for execution. In some cases, the
plans generated from 1) and 2) disagree each other. See below for examples. To
be specific, in phase 1), planner thinks the query can be done on entry
database, and it needs no resource and plan dispatch; while in phase 2),
optimizer thinks it needs resource and tries to dispatch resource along with
plan to segments.
1. Optimizer ON
{noformat}
gptest=# show optimizer;
optimizer
-----------
on
(1 row)
gptest=# explain select * from foo where b = 150;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..431.00 rows=1 width=12)
-> Sequence (cost=0.00..431.00 rows=1 width=12)
-> Partition Selector for foo (dynamic scan id: 1)
(cost=10.00..100.00 rows=50 width=4)
Filter: b = 150
Partitions selected: 0 (out of 10)
-> Dynamic Table Scan on foo (dynamic scan id: 1) (cost=0.00..431.00
rows=1 width=12)
Filter: b = 150
Settings: optimizer=on
Optimizer status: PQO version 1.591
(9 rows)
gptest=# explain analyze select * from foo where b = 150;
QUERY PLAN
---------------------------------------------------------------------------------------------------
Gather Motion 2:1 (slice1; segments: 2) (cost=0.00..431.00 rows=1 width=12)
Rows out: 0 rows at destination with 2.610 ms to end, start offset by 12 ms.
-> Sequence (cost=0.00..431.00 rows=1 width=12)
Rows out: 0 rows (seg0) with 0.043 ms to end, start offset by 14 ms.
-> Partition Selector for foo (dynamic scan id: 1)
(cost=10.00..100.00 rows=50 width=4)
Filter: b = 150
Partitions selected: 0 (out of 10)
Rows out: 0 rows (seg0) with 0 ms to end.
-> Dynamic Table Scan on foo (dynamic scan id: 1) (cost=0.00..431.00
rows=1 width=12)
Filter: b = 150
Rows out: 0 rows (seg0) with 0.016 ms to end, start offset by
14 ms.
Partitions scanned: 0 (out of 10).
Slice statistics:
(slice0) Executor memory: 211K bytes.
(slice1) Executor memory: 179K bytes avg x 2 workers, 179K bytes max
(seg0).
Statement statistics:
Memory used: 128000K bytes
Settings: optimizer=on
Optimizer status: PQO version 1.591
Total runtime: 14.680 ms
(20 rows)
{noformat}
2. Optimizer OFF
{noformat}
gptest=# show optimizer;
optimizer
-----------
off
(1 row)
gptest=# explain select * from foo where b = 150;
QUERY PLAN
------------------------------------------
Result (cost=0.00..0.01 rows=1 width=0)
One-Time Filter: false
Settings: optimizer=off
Optimizer status: legacy query optimizer
(4 rows)
gptest=# explain analyze select * from foo where b = 150;
QUERY PLAN
---------------------------------------------------------------------
Result (cost=0.00..0.01 rows=1 width=0)
One-Time Filter: false
Rows out: 0 rows with 0.001 ms to end, start offset by 0.048 ms.
Slice statistics:
(slice0) Executor memory: 13K bytes.
Statement statistics:
Memory used: 128000K bytes
Settings: optimizer=off
Optimizer status: legacy query optimizer
Total runtime: 0.049 ms
(10 rows)
{noformat}
> Core dump while selecting data from partitioned table with where condition
> that belongs to no partition
> -------------------------------------------------------------------------------------------------------
>
> Key: HAWQ-218
> URL: https://issues.apache.org/jira/browse/HAWQ-218
> Project: Apache HAWQ
> Issue Type: Bug
> Components: Core
> Affects Versions: 2.0.0-beta-incubating
> Reporter: Ruilong Huo
> Assignee: Ruilong Huo
> Priority: Critical
>
> HAWQ core dumps while calling UDF with EXPLAIN statement in it.
> Reproduction steps:
> Step 1: prepare schema and data
> {noformat}
> create table foo(a int, b int, c int) partition by range (b) (start (1) end
> (101) every (10));
> insert into foo select generate_series(1,5), generate_series(1,100),
> generate_series(1,10);
> analyze foo;
> {noformat}
> Step 2: run query
> {noformat}
> gptest=# select * from foo where b = 150;
> FATAL: Unexpected internal error (dispatcher.c:456)
> DETAIL: FailedAssertion("!(resource)", File: "dispatcher.c", Line: 456)
> HINT: Process 29088 will wait for gp_debug_linger=120 seconds before
> termination.
> Note that its locks and other resources will not be released until then.
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Succeeded.
> {noformat}
> Step 3: call stack in core dump
> {noformat}
> (gdb) bt
> #0 0x00000038cbe32625 in raise () from /lib64/libc.so.6
> #1 0x00000038cbe33e05 in abort () from /lib64/libc.so.6
> #2 0x00000000009dc10c in ExceptionalCondition (conditionName=0xe06c0e
> "!(resource)", errorType=0xe06be3 "FailedAssertion", fileName=0xe06bc0
> "dispatcher.c", lineNumber=456) at assert.c:60
> #3 0x0000000000b82260 in initialize_dispatch_data (resource=0x0,
> dispatch_to_all_cached_executors=0 '\000') at dispatcher.c:456
> #4 0x000000000071ac23 in ExecutorStart (queryDesc=0x28b5460, eflags=0) at
> execMain.c:908
> #5 0x00000000008ff7df in PortalStart (portal=0x28d3498, params=0x0,
> snapshot=0x0, seqServerHost=0x0, seqServerPort=-1) at pquery.c:1222
> #6 0x00000000008f5e76 in exec_simple_query (query_string=0x26b0198 "select *
> from foo where b = 150;", seqServerHost=0x0, seqServerPort=-1) at
> postgres.c:1680
> #7 0x00000000008fab80 in PostgresMain (argc=4, argv=0x25fac28,
> username=0x25faa80 "gpadmin") at postgres.c:4688
> #8 0x00000000008a0ec3 in BackendRun (port=0x25adc80) at postmaster.c:5834
> #9 0x00000000008a034d in BackendStartup (port=0x25adc80) at postmaster.c:5427
> #10 0x000000000089a9d9 in ServerLoop () at postmaster.c:2129
> #11 0x0000000000899a9c in PostmasterMain (argc=9, argv=0x25c4940) at
> postmaster.c:1421
> #12 0x00000000007b3472 in main (argc=9, argv=0x25c4940) at main.c:226
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)