GitHub user robertamarton opened a pull request:

    https://github.com/apache/incubator-trafodion/pull/773

    [TRAFODION-2301]: Hadoop crash with logs TMUDF

    Today the event_log_reader UDF scans all the logs, loads every event
    into memory, and only then discards the rows that are not needed.
    Waiting until the end to discard rows takes too much memory and causes
    system issues.
    
    The immediate solution is to use predicate pushdown; that is, specify
    predicates on the query that uses the event_log_reader UDF to limit the
    scope of the data flow.  These predicates are pushed into the UDF so
    that it returns only the required rows instead of all the rows.
    Initially, only comparison predicates are pushed down to the
    event_log_reader UDF (see the sketch below).
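
    For reference, here is a minimal sketch of the compile-time half of the
    technique, written against the Trafodion C++ UDR (tmudr) interface.  It
    is an illustration under assumptions, not an excerpt from this patch;
    the class name EventLogReaderSketch is hypothetical.

        #include "sqludr.h"   // Trafodion C++ UDR interface

        using namespace tmudr;

        class EventLogReaderSketch : public UDR
        {
        public:
          // Compile-time callback.  Marking a comparison predicate as
          // EVALUATE_IN_UDF tells the optimizer that the UDF will apply
          // it itself while scanning, so rejected rows never leave the UDF.
          void describeDataflowAndPredicates(UDRInvocationInfo &info)
          {
            for (int p = 0; p < info.getNumPredicates(); p++)
              if (info.isAComparisonPredicate(p))
                info.setPredicateEvaluationCode(
                    p, PredicateInfo::EVALUATE_IN_UDF);
          }
        };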
    
    In addition to predicate pushdown, a new option has been added to the
    event_log_reader UDF: the 's' (statistics) option.  This option reports
    how many log files were accessed, how many records were read, and how
    many records were returned.  By specifying timestamp ranges, severity
    types, sql_codes, and the like, the number of returned rows can be
    reduced.
    
    Example output:
    
    Prior to change:
    
    select count(*) from udf(event_log_reader('s'))
      where severity = 'INFO' and
            log_ts between '2016-10-18 00:00:00' and '2016-10-18 22:22:22';
    
    (16497) EVENT_LOG_READER results:
              number log files opened: 113, number log files read: 113,
              number rows read: 2820, number rows returned: 2736
    
    After change:
    
    select count(*) from udf(event_log_reader('s'))
      where severity = 'INFO' and
            log_ts between '2016-10-18 00:00:00' and '2016-10-18 22:22:22';
    
    (17046) EVENT_LOG_READER results:
              number log files opened: 115, number log files read: 115,
              number rows read: 2823, number rows returned: 109
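
    At run time the UDF must honor the predicates it accepted at compile
    time.  A rough sketch of that scan loop follows; readNextLogRecord,
    matchesPushedPredicates, copyRecordToOutputRow, and LogRecord are
    hypothetical helpers for illustration, not code from this patch.

        // Run-time side of the sketch above.
        void EventLogReaderSketch::processData(UDRInvocationInfo &info,
                                               UDRPlanInfo &plan)
        {
          long numRowsRead = 0, numRowsReturned = 0;
          LogRecord rec;                       // hypothetical parsed record

          while (readNextLogRecord(rec))       // hypothetical log reader
          {
            numRowsRead++;
            // Apply the pushed-down comparison predicates per record and
            // drop non-matching rows before they consume buffer space.
            if (!matchesPushedPredicates(info, rec))
              continue;
            copyRecordToOutputRow(info, rec);  // hypothetical
            emitRow(info);                     // tmudr: emit one result row
            numRowsReturned++;
          }
          // Counters like these are what the 's' option reports in the
          // statistics line shown in the example output above.
        }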

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/robertamarton/incubator-trafodion trafodion-1758

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-trafodion/pull/773.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #773
    
----
commit 913d2337e029a0f904539a1d9d6ea064f90aa6ab
Author: Roberta Marton <rmarton@edev07.esgyn.local>
Date:   2016-10-21T01:37:33Z

    [TRAFODION-2301]: Hadoop crash with logs TMUDF
    

----

