[
https://issues.apache.org/jira/browse/OOZIE-2626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15508644#comment-15508644
]
Fady K Sedrak commented on OOZIE-2626:
--------------------------------------
I tested different indexes on many of slow queries in my repro environment. I
concluded that these 2 queries would improve the performance significantly.
Query1:
SELECT TOP 1 t0.id, t0.app_name, t0.status, t0.run, t0.user_name,
t0.group_name, t0.created_time, t0.start_time, t0.last_modified_time,
t0.end_time, t0.external_id, t0.parent_id FROM WF_JOBS t0 ORDER BY
t0.created_time DESC
Recommended Index:
CREATE NONCLUSTERED INDEX [nci_wi_WF_JOBS_3A82B3F6CFB2A06DB9FD] ON
[dbo].[WF_JOBS]
([created_time] ASC)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON)
Statistics:
Before:
Table 'WF_JOBS'. Scan count 1, logical reads 31803, physical reads 256,
read-ahead reads 31589, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0.
SQL Server Execution Times:
CPU time = 281 ms, elapsed time = 216350 ms.
After:
Table 'WF_JOBS'. Scan count 1, logical reads 8, physical reads 0, read-ahead
reads 24, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.
Query 2:
(@P0 bigint,@P1 varchar(8000),@P2 datetime2(7))SELECT t0.id FROM WF_ACTIONS t0
WHERE (t0.pending = @P0 AND t0.status = @P1 AND t0.last_check_time < @P2)
Recommended Index:
CREATE NONCLUSTERED INDEX [nci_wi_WF_ACTIONS_D23BAEF4FD5EEA2EBE76] ON
[dbo].[WF_ACTIONS]
(
[pending] ASC,
[status] ASC,
[last_check_time] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
Statistics:
Before:
Table 'WF_ACTIONS'. Scan count 1, logical reads 81894, physical reads 1330,
read-ahead reads 79172, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0.
SQL Server Execution Times:
CPU time = 875 ms, elapsed time = 652591 ms.
After:
Table 'WF_ACTIONS'. Scan count 1, logical reads 83, physical reads 0,
read-ahead reads 1, lob logical reads 0, lob physical reads 0, lob read-ahead
reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 167 ms.
Unfortunately openjpa doesn't support composite indexes. So I had to work
around the problem by create 2 simple indexes indexes on last_check_time &
pending.
Query 2:
(@P0 bigint,@P1 varchar(8000),@P2 datetime2(7))SELECT t0.id FROM WF_ACTIONS t0
WHERE (t0.pending = @P0 AND t0.status = @P1 AND t0.last_check_time < @P2)
Recommended Index:
CREATE NONCLUSTERED INDEX [I_WF_JOBS_PENDING] ON [dbo].[WF_ACTIONS]
(
[pending] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
CREATE NONCLUSTERED INDEX [I_WF_JOBS_LAST_CHECK_TIME] ON [dbo].[WF_ACTIONS]
(
[pending] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
Statistics:
Before:
Table 'WF_ACTIONS'. Scan count 1, logical reads 82034, physical reads 89,
read-ahead reads 36446, lob logical reads 0, lob physical reads 0, lob
read-ahead reads 0.
SQL Server Execution Times:
CPU time = 94 ms, elapsed time = 641094 ms.
After:
Table 'WF_ACTIONS'. Scan count 3, logical reads 10831, physical reads 0,
read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead
reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead
reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead
reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 78 ms, elapsed time = 5085 ms.
> Oozie workflow processing becomes slow after the increase of rows in WF_JOBS
> and WF_ACTIONS tables when the metastore is running on SQL Server
> ----------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: OOZIE-2626
> URL: https://issues.apache.org/jira/browse/OOZIE-2626
> Project: Oozie
> Issue Type: Bug
> Components: coordinator, workflow
> Reporter: Fady K Sedrak
>
> Oozie workflow processing becomes slow after the increase of rows in WF_JOBS
> and WF_ACTIONS tables when running against SQL Server. A secondary index need
> to be created against WF_JOBS (created_time), WF_ACTIONS (last_check_time &
> pending) to avoid going for a full table scan when querying against these
> fields.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)