[ 
https://issues.apache.org/jira/browse/IMPALA-12689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17838398#comment-17838398
 ] 

Joe McDonnell commented on IMPALA-12689:
----------------------------------------

Fixed by:
{noformat}
commit cd9260e5276d0e342b21869c51e71aea9643504c
Author: Joe McDonnell <joemcdonn...@cloudera.com>
Date:   Thu Feb 15 18:22:15 2024 -0800

    IMPALA-12689: Change TPC-H and TPC-DS builds to respect CFLAGS
    
    The TPC-H and TPC-DS builds currently do not respect the
    CFLAGS environment variable, so they don't incorporate the
    values that we set in init-compiler.sh.
    
    This modifies the build scripts for TPC-H and TPC-DS to
    patch their makefiles to add our CFLAGS. This has the
    side effect of turning on -O3 optimization, resulting
    in faster binaries used to generate the TPC-H and
    TPC-DS datasets:
    
    TPC-H's dbgen at scale 42:
    Unoptimized: 4m46.269s
    Optimized: 3m46.379s
    
    TPC-DS's dsdgen at scale 20:
    Unoptimized: 9m41.441s
    Optimized: 7m25.017s
    
    Testing:
     - Ran a build and verified that the flags include our
       CFLAGS value
    
    Change-Id: I3f999b71c56a72c14f1beeea99a3689b82a4d45a
    Reviewed-on: http://gerrit.cloudera.org:8080/21111
    Reviewed-by: Michael Smith <michael.sm...@cloudera.com>
    Tested-by: Joe McDonnell <joemcdonn...@cloudera.com>
{noformat}
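
For readers who want a picture of what "patch their makefiles to add our CFLAGS" can look like, here is a minimal sketch. It is not the actual change from the commit above: the helper name append_cflags, the demo makefile contents, and the fallback to -O3 when CFLAGS is unset are all assumptions made for illustration.

```shell
#!/bin/sh
set -eu

# Hypothetical helper (not the actual toolchain patch): append extra flags
# to every top-level CFLAGS assignment (=, :=, += or ?=) in a makefile.
# The body runs in a subshell so its variables do not leak to the caller.
append_cflags() (
  makefile="$1"
  tmp="$(mktemp)"
  # Pass the flags through the environment rather than -v so that awk
  # does not process backslash escapes in them.
  EXTRA_FLAGS="$2" awk '
    /^CFLAGS[ \t]*[+:?]?=/ { print $0 " " ENVIRON["EXTRA_FLAGS"]; next }
    { print }
  ' "$makefile" > "$tmp"
  # Overwrite in place so the makefile keeps its original permissions.
  cat "$tmp" > "$makefile"
  rm -f "$tmp"
)

# Demo on a throwaway makefile (contents are made up for this example).
demo_dir="$(mktemp -d)"
printf 'CC = gcc\nCFLAGS = -g -DLINUX\nall:\n\t$(CC) $(CFLAGS) -o demo demo.c\n' \
  > "$demo_dir/makefile"

# Use the caller's CFLAGS if set; fall back to -O3 for the demo only.
append_cflags "$demo_dir/makefile" "${CFLAGS:--O3}"
grep '^CFLAGS' "$demo_dir/makefile"
```

Appending to the existing assignment, rather than replacing it, keeps whatever defines and flags the upstream makefile already relies on.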

> Toolchain TPC-H and TPC-DS binaries are not built with optimizations
> --------------------------------------------------------------------
>
>                 Key: IMPALA-12689
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12689
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Infrastructure
>    Affects Versions: Impala 4.4.0
>            Reporter: Joe McDonnell
>            Priority: Major
>
> The TPC-H and TPC-DS components of the toolchain are built without any 
> compiler optimization flags. This does not affect Impala's shipped binary, 
> but it does slow down the data generators for TPC-H and TPC-DS. In the runs 
> below, turning on -O3 reduces wall-clock data generation time by about 21% 
> for TPC-H and about 23% for TPC-DS.
> {noformat}
> ##### TPC-H ########
> # Unoptimized
> $ time ./dbgen -f -s 42
> TPC-H Population Generator (Version 2.17.0)
> Copyright Transaction Processing Performance Council 1994 - 2010
> real    4m46.269s
> user    4m20.982s
> sys     0m19.390s
> # -O3
> $ time ./dbgen -f -s 42
> TPC-H Population Generator (Version 2.17.0)
> Copyright Transaction Processing Performance Council 1994 - 2010
> real    3m46.379s
> user    3m23.721s
> sys     0m18.436s
> ##### TPC-DS #######
> # Unoptimized
> $ time ./dsdgen -force -scale 20
> DBGEN2 Population Generator (Version 2.0.0)
> Copyright Transaction Processing Performance Council (TPC) 2001 - 2015
> Warning: Selected scale factor is NOT valid for result publication
> real    9m41.441s
> user    8m3.447s
> sys     1m37.944s
> # -O3
> $ time ./dsdgen -force -scale 20
> DBGEN2 Population Generator (Version 2.0.0)
> Copyright Transaction Processing Performance Council (TPC) 2001 - 2015
> Warning: Selected scale factor is NOT valid for result publication
> real    7m25.017s
> user    5m48.487s
> sys     1m36.265s
> {noformat}
> We should modify the toolchain to add -O3 to these builds.
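
The size of the improvement can be checked directly from the "real" times quoted above. The following is a small sketch that converts the durations printed by time into seconds and reports the percentage reduction; the function names to_seconds and reduction_pct are made up for this example, and it assumes durations in the NmS.SSSs form shown in the output.

```shell
#!/bin/sh
set -eu

# Convert a `time`-style duration such as 4m46.269s into seconds.
# LC_ALL=C keeps the decimal point independent of the user's locale.
to_seconds() {
  printf '%s\n' "$1" | LC_ALL=C awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Print the percentage reduction in wall-clock time going from $1 to $2.
reduction_pct() {
  LC_ALL=C awk -v before="$(to_seconds "$1")" -v after="$(to_seconds "$2")" \
    'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

reduction_pct 4m46.269s 3m46.379s   # TPC-H dbgen, scale 42: prints 20.9
reduction_pct 9m41.441s 7m25.017s   # TPC-DS dsdgen, scale 20: prints 23.5
```

These figures are reductions in elapsed time; expressed as a speedup ratio (old time divided by new time), the same measurements come out higher.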



