[
https://issues.apache.org/jira/browse/TRAFODION-2301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15593641#comment-15593641
]
ASF GitHub Bot commented on TRAFODION-2301:
-------------------------------------------
GitHub user robertamarton opened a pull request:
https://github.com/apache/incubator-trafodion/pull/773
[TRAFODION-2301]: Hadoop crash with logs TMUDF
Today the event_log_reader UDF scans all logs, loads every event into memory,
and only then discards the rows that are not needed. Waiting until the end to
discard rows consumes too much memory and causes system issues.
The immediate solution is to use predicate pushdown; that is, specify
predicates on the query that uses the event_log_reader UDF to limit the scope
of the data flow. These predicates are pushed into the UDF so that it returns
only the required rows instead of all rows. Initially, only comparison
predicates are pushed down to the event_log_reader UDF.
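A minimal sketch of the compiler-interface side of such a change, assuming the
tmudr C++ TMUDF API from sqludr.h; the class name EventLogReaderSketch is
illustrative, this is not the actual patch, and the exact method names should
be verified against sqludr.h:

#include "sqludr.h"

using namespace tmudr;

class EventLogReaderSketch : public UDR
{
public:
  // Compile-time callback: tell the optimizer which predicates the UDF
  // is willing to evaluate itself.
  virtual void describeDataflowAndPredicates(UDRInvocationInfo &info)
  {
    // Keep the default dataflow handling.
    UDR::describeDataflowAndPredicates(info);

    // Accept every comparison predicate offered by the compiler; the UDF
    // will evaluate these itself while scanning the log files, instead of
    // returning all rows and filtering afterwards.
    for (int p = 0; p < info.getNumPredicates(); p++)
      if (info.isAComparisonPredicate(p))
        info.setPredicateEvaluationCode(p, PredicateInfo::EVALUATE_IN_UDF);
  }
};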
In addition to predicate pushdown, a new option has been added to the
event_log_reader UDF: the 's' (statistics) option. It reports how many log
files were accessed, how many records were read, and how many records were
returned. By specifying timestamp ranges, severity types, sql_codes, and the
like, the number of returned rows can be reduced.
Example output:
Prior to change:
select count(*) from udf(event_log_reader('s'))
where severity = 'INFO' and
log_ts between '2016-10-18 00:00:00' and '2016-10-18 22:22:22';
(16497) EVENT_LOG_READER results:
number log files opened: 113, number log files read: 113,
number rows read: 2820, number rows returned: 2736
After change:
select count(*) from udf(event_log_reader('s'))
where severity = 'INFO' and
log_ts between '2016-10-18 00:00:00' and '2016-10-18 22:22:22';
(17046) EVENT_LOG_READER results:
number log files opened: 115, number log files read: 115,
number rows read: 2823, number rows returned: 109
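Note that both runs read a similar number of rows from the logs (~2820), but
after the change only 109 rows flow out of the UDF instead of 2736, because
the comparison predicates are now evaluated inside the UDF as each event is
parsed. A hypothetical run-time helper illustrating that evaluation
(rowQualifies and eventFields are illustrative names; string comparison is a
simplification of the typed comparison the real UDF would do, and the tmudr
names should be checked against sqludr.h):

#include <string>
#include <vector>
#include "sqludr.h"

using namespace tmudr;

// Hypothetical helper: decide whether one parsed log event satisfies all
// predicates that were pushed down into the UDF (EVALUATE_IN_UDF).
static bool rowQualifies(const UDRInvocationInfo &info,
                         const std::vector<std::string> &eventFields)
{
  for (int p = 0; p < info.getNumPredicates(); p++)
  {
    if (!info.isAComparisonPredicate(p))
      continue;

    const ComparisonPredicateInfo &cpi = info.getComparisonPredicate(p);
    if (cpi.getEvaluationCode() != PredicateInfo::EVALUATE_IN_UDF)
      continue;

    // Compare the event's column value against the predicate's constant.
    int cmp = eventFields[cpi.getColumnNumber()].compare(cpi.getConstValue());
    bool ok;
    switch (cpi.getOperator())
    {
      case PredicateInfo::EQUAL:         ok = (cmp == 0); break;
      case PredicateInfo::NOT_EQUAL:     ok = (cmp != 0); break;
      case PredicateInfo::LESS:          ok = (cmp <  0); break;
      case PredicateInfo::LESS_EQUAL:    ok = (cmp <= 0); break;
      case PredicateInfo::GREATER:       ok = (cmp >  0); break;
      case PredicateInfo::GREATER_EQUAL: ok = (cmp >= 0); break;
      default:                           ok = true;       break; // not pushed
    }
    if (!ok)
      return false; // discard the event now instead of buffering it
  }
  return true;
}

In processData(), the UDF would call such a helper for each parsed event and
emitRow() only for qualifying events, while counting log files opened, log
files read, rows read, and rows returned for the 's' option.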
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/robertamarton/incubator-trafodion trafodion-1758
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-trafodion/pull/773.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #773
----
commit 913d2337e029a0f904539a1d9d6ea064f90aa6ab
Author: Roberta Marton <[email protected]>
Date: 2016-10-21T01:37:33Z
[TRAFODION-2301]: Hadoop crash with logs TMUDF
----
> Hadoop stack crash with LOGS TMUDF
> ----------------------------------
>
> Key: TRAFODION-2301
> URL: https://issues.apache.org/jira/browse/TRAFODION-2301
> Project: Apache Trafodion
> Issue Type: Bug
> Components: sql-general
> Reporter: Roberta Marton
> Assignee: Roberta Marton
>
> There seems to be a problem that shows up when a tdm_udrsrvr process
> takes most of the available memory. When this happens, too much CPU time is
> spent handling the memory pressure, which causes some Hadoop processes to
> crash due to lack of resources. In addition, Linux is slow to respond
> from the terminal.
>
> It can be reproduced from DB Mgr using the logs page, which calls the UDR
> in the backend. Calling this UDF (event_log_reader) from sqlci can also
> reproduce the problem.
>
> When using sqlci, SQL prints out the udrserver process launch info but then
> just hangs.
> >>select [first 1]* from udf(event_log_reader());