[
https://issues.apache.org/jira/browse/TRAFODION-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15593641#comment-15593641
]
ASF GitHub Bot commented on TRAFODION-2301:
-------------------------------------------
GitHub user robertamarton opened a pull request:
https://github.com/apache/incubator-trafodion/pull/773
[TRAFODION-2301]: Hadoop crash with logs TMUDF
Today the event_log_reader UDF scans all logs, loads every event into memory,
and only then discards the rows that are not needed. Waiting until the end to
discard rows consumes too much memory and causes system issues.
The immediate solution is to use predicate pushdown; that is, specify
predicates on the query that uses the event_log_reader UDF to limit the scope
of the data flow. These predicates are pushed into the UDF so that it returns
only the required rows instead of all rows. Initially, only comparison
predicates are pushed down to the event_log_reader UDF.
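A minimal sketch of the compiler-interface side of such a change, assuming the
tmudr C++ TMUDF API from sqludr.h; the class name EventLogReaderSketch is
illustrative, this is not the actual patch, and the exact method names should
be verified against sqludr.h:

#include "sqludr.h"

using namespace tmudr;

class EventLogReaderSketch : public UDR
{
public:
  // Compile-time callback: tell the optimizer which predicates the UDF
  // is willing to evaluate itself.
  virtual void describeDataflowAndPredicates(UDRInvocationInfo &info)
  {
    // Keep the default dataflow handling.
    UDR::describeDataflowAndPredicates(info);

    // Accept every comparison predicate offered by the compiler; the UDF
    // will evaluate these itself while scanning the log files, instead of
    // returning all rows and filtering afterwards.
    for (int p = 0; p < info.getNumPredicates(); p++)
      if (info.isAComparisonPredicate(p))
        info.setPredicateEvaluationCode(p, PredicateInfo::EVALUATE_IN_UDF);
  }
};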
In addition to predicate pushdown, a new option has been added to the
event_log_reader UDF: the 's' (statistics) option. It reports how many log
files were accessed, how many records were read, and how many records were
returned. By specifying timestamp ranges, severity types, sql_codes, and the
like, the number of returned rows can be reduced.
Example output:
Prior to change:
select count(*) from udf(event_log_reader('s'))
where severity = 'INFO' and
log_ts between '2016-10-18 00:00:00' and '2016-10-18 22:22:22';
(16497) EVENT_LOG_READER results:
number log files opened: 113, number log files read: 113,
number rows read: 2820, number rows returned: 2736
After change:
select count(*) from udf(event_log_reader('s'))
where severity = 'INFO' and
log_ts between '2016-10-18 00:00:00' and '2016-10-18 22:22:22';
(17046) EVENT_LOG_READER results:
number log files opened: 115, number log files read: 115,
number rows read: 2823, number rows returned: 109
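Note that both runs read a similar number of rows from the logs (~2820), but
after the change only 109 rows flow out of the UDF instead of 2736, because
the comparison predicates are now evaluated inside the UDF as each event is
parsed. A hypothetical run-time helper illustrating that evaluation
(rowQualifies and eventFields are illustrative names; string comparison is a
simplification of the typed comparison the real UDF would do, and the tmudr
names should be checked against sqludr.h):

#include <string>
#include <vector>
#include "sqludr.h"

using namespace tmudr;

// Hypothetical helper: decide whether one parsed log event satisfies all
// predicates that were pushed down into the UDF (EVALUATE_IN_UDF).
static bool rowQualifies(const UDRInvocationInfo &info,
                         const std::vector<std::string> &eventFields)
{
  for (int p = 0; p < info.getNumPredicates(); p++)
  {
    if (!info.isAComparisonPredicate(p))
      continue;

    const ComparisonPredicateInfo &cpi = info.getComparisonPredicate(p);
    if (cpi.getEvaluationCode() != PredicateInfo::EVALUATE_IN_UDF)
      continue;

    // Compare the event's column value against the predicate's constant.
    int cmp = eventFields[cpi.getColumnNumber()].compare(cpi.getConstValue());
    bool ok;
    switch (cpi.getOperator())
    {
      case PredicateInfo::EQUAL:         ok = (cmp == 0); break;
      case PredicateInfo::NOT_EQUAL:     ok = (cmp != 0); break;
      case PredicateInfo::LESS:          ok = (cmp <  0); break;
      case PredicateInfo::LESS_EQUAL:    ok = (cmp <= 0); break;
      case PredicateInfo::GREATER:       ok = (cmp >  0); break;
      case PredicateInfo::GREATER_EQUAL: ok = (cmp >= 0); break;
      default:                           ok = true;       break; // not pushed
    }
    if (!ok)
      return false; // discard the event now instead of buffering it
  }
  return true;
}

In processData(), the UDF would call such a helper for each parsed event and
emitRow() only for qualifying events, while counting log files opened, log
files read, rows read, and rows returned for the 's' option.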
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/robertamarton/incubator-trafodion trafodion-1758
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-trafodion/pull/773.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #773
----
commit 913d2337e029a0f904539a1d9d6ea064f90aa6ab
Author: Roberta Marton <[email protected]>
Date: 2016-10-21T01:37:33Z
[TRAFODION-2301]: Hadoop crash with logs TMUDF
----
> Hadoop stack crash with LOGS TMUDF
> ----------------------------------
>
> Key: TRAFODION-2301
> URL: https://issues.apache.org/jira/browse/TRAFODION-2301
> Project: Apache Trafodion
> Issue Type: Bug
> Components: sql-general
> Reporter: Roberta Marton
> Assignee: Roberta Marton
>
> There seems to be a problem that shows up when a tdm_udrsrvr process
> takes most of the available memory. When this happens, too much CPU time is
> spent handling the memory pressure, which causes some Hadoop processes to
> crash due to lack of resources. In addition, Linux is slow to respond
> from the terminal.
>
> It can be reproduced from DB Mgr using the logs page, which calls the UDR
> in the backend. Calling this UDF (event_log_reader) from sqlci can also
> reproduce the problem.
>
> When using sqlci, SQL prints out the udrserver process launch info but then
> just hangs.
> >>select [first 1]* from udf(event_log_reader());