Hi Ken,

Yes, the query you provided should work. I validated that the query below, which is fairly similar to the one listed above, works as expected:

select
    count(*)
from
    store_sales_mp,
    date_dim
where
    ss_sold_date_sk = d_date_sk
        and ss_store_sk = d_dow
        and d_year = 1999
        and d_moy = 1


store_sales_mp is partitioned on two columns:

  | # col_name      | data_type |
  | ss_sold_date_sk | bigint    |
  | ss_store_sk     | bigint    |


The query execution summary shows that only a small number of rows from
store_sales_mp qualify during the scan:

+-----------------+--------+----------+----------+--------+------------+----------+---------------+--------------------------------------------+
| Operator        | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                                     |
+-----------------+--------+----------+----------+--------+------------+----------+---------------+--------------------------------------------+
| 06:AGGREGATE    | 1      | 160.66ms | 160.66ms | 1      | 1          | 88.00 KB | -1 B          | FINALIZE                                   |
| 05:EXCHANGE     | 1      | 137.98us | 137.98us | 19     | 1          | 0 B      | -1 B          | UNPARTITIONED                              |
| 03:AGGREGATE    | 19     | 151.01ms | 183.65ms | 19     | 1          | 8.04 MB  | 10.00 MB      |                                            |
| 02:HASH JOIN    | 19     | 101.27ms | 185.51ms | 4.82M  | 43.20B     | 2.02 MB  | 3.89 KB       | INNER JOIN, BROADCAST                      |
| |--04:EXCHANGE  | 19     | 14.54us  | 25.36us  | 589    | 181        | 0 B      | 0 B           | BROADCAST                                  |
| |  01:SCAN HDFS | 1      | 20.39ms  | 20.39ms  | 31     | 181        | 1.82 MB  | 64.00 MB      | tpcds_15000_decimal_parquet.date_dim       |
| 00:SCAN HDFS    | 19     | 37.54ms  | 81.18ms  | 45.00M | 43.20B     | 9.84 MB  | 0 B           | tpcds_15000_decimal_parquet.store_sales_mp |
+-----------------+--------+----------+----------+--------+------------+----------+---------------+--------------------------------------------+
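As a rough mental model of what is happening here (this is an illustrative Python sketch, not Impala internals, and all names and values are hypothetical): the scan of date_dim produces the set of qualifying join keys, and that set is shipped to the fact-table scan as a runtime filter that can reject entire partitions/files before they are read.

```python
# Illustrative sketch of runtime-filter partition pruning (NOT Impala code).
# The dimension scan yields the qualifying join keys; the fact scan then
# skips any partition whose partition-key value is not in that set.

def build_runtime_filter(dim_rows, key_col):
    """Collect the distinct join keys produced by the dimension-side scan."""
    return {row[key_col] for row in dim_rows}

def prune_partitions(partitions, runtime_filter, key_col):
    """Keep only fact partitions whose partition key survives the filter."""
    return [p for p in partitions if p[key_col] in runtime_filter]

# Hypothetical data: three date_dim rows qualify the predicate,
# while the fact table has thirty date partitions.
dim_rows = [{"d_date_sk": sk} for sk in (2451180, 2451181, 2451182)]
partitions = [{"ss_sold_date_sk": sk} for sk in range(2451170, 2451200)]

rf = build_runtime_filter(dim_rows, "d_date_sk")
kept = prune_partitions(partitions, rf, "ss_sold_date_sk")
print(len(kept))  # only 3 of the 30 partitions need to be scanned
```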



Finally, from the query profile:

      HDFS_SCAN_NODE (id=0):(Total: 37.539ms, non-child: 37.539ms, % non-child: 100.00%)
         - AverageHdfsReadThreadConcurrency: 0.00
         - AverageScannerThreadConcurrency: 0.93
         - BytesRead: 489.47 KB (501221)
         - BytesReadDataNodeCache: 0
         - BytesReadLocal: 468.42 KB (479663)
         - BytesReadRemoteUnexpected: 21.05 KB (21557)
         - BytesReadShortCircuit: 468.42 KB (479663)
         - DecompressionTime: 0.000ns
         - MaxCompressedTextFileLength: 0
         - NumColumns: 0 (0)
         - NumDisksAccessed: 4 (4)
         - NumRowGroups: 0 (0)
         - NumScannerThreadsStarted: 5 (5)
         - PeakMemoryUsage: 9.35 MB (9799572)
         - PerReadThreadRawHdfsThroughput: 901.98 MB/sec
         - RemoteScanRanges: 0 (0)
         - RowsRead: 2.37M (2368521)
         - RowsReturned: 2.37M (2368521)
         - RowsReturnedRate: 62.23 M/sec
         - ScanRangesComplete: 3.21K (3207)
         - ScannerThreadsInvoluntaryContextSwitches: 26 (26)
         - ScannerThreadsTotalWallClockTime: 420.999ms
           - MaterializeTupleTime(*): 471.000ns
           - ScannerThreadsSysTime: 82.142ms
           - ScannerThreadsUserTime: 65.777ms
         - ScannerThreadsVoluntaryContextSwitches: 5.41K (5413)
         - TotalRawHdfsReadTime(*): 1.940ms
         - TotalReadThroughput: 136.84 KB/sec
        Filter 0:
           - Files processed: 3.18K (3184)
           - Files rejected: 3.13K (3130)
           - Files total: 3.18K (3184)
           - RowGroups processed: 0 (0)
           - RowGroups rejected: 0 (0)
           - RowGroups total: 0 (0)
           - Rows processed: 0 (0)
           - Rows rejected: 0 (0)
           - Rows total: 0 (0)
           - Splits processed: 4 (4)
           - Splits rejected: 0 (0)
           - Splits total: 4 (4)
        Filter 1:
           - Files processed: 54 (54)
           - Files rejected: 49 (49)
           - Files total: 54 (54)
           - RowGroups processed: 0 (0)
           - RowGroups rejected: 0 (0)
           - RowGroups total: 0 (0)
           - Rows processed: 0 (0)
           - Rows rejected: 0 (0)
           - Rows total: 0 (0)
           - Splits processed: 4 (4)
           - Splits rejected: 0 (0)
           - Splits total: 4 (4)
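Plugging the file counters above into a quick back-of-the-envelope calculation shows how effective the filters were at file level (a small Python sketch using the numbers from the profile):

```python
# Filter effectiveness at file granularity, taken from the profile counters.
filters = {
    "Filter 0": {"processed": 3184, "rejected": 3130},
    "Filter 1": {"processed": 54, "rejected": 49},
}
for name, c in filters.items():
    pct = 100.0 * c["rejected"] / c["processed"]
    print(f"{name}: rejected {c['rejected']}/{c['processed']} files ({pct:.1f}%)")
```

So roughly 98% of the files hit by Filter 0 (and about 91% for Filter 1) were rejected without reading their data.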



Thanks
Mostafa


On Fri, Apr 8, 2016 at 10:04 AM, Ken Farmer <[email protected]> wrote:

> First off - congrats on this new release.  It looks like a lot of hard
> work went into it.
>
> I'm really excited about dynamic runtime filters, but have a question
> about whether it would apply to the following scenario, which seems oddly
> missing from the documentation.
>
> Assume 2 tables:
> 1.  fact_ids table with 50 billion rows partitioned by year, month, day,
> sensor_id
> 2.  dim_date dimension table with 500 rows, 1 row/day with date_id, and
> date_name, year, month, day columns
>
> Query that works fine today:
>     SELECT f.col1,
>            SUM(somecount)
>     FROM fact_ids f
>     WHERE f.year=2016 AND f.month=4 AND f.day=2
>     GROUP BY 1  ORDER BY 1
>
>
> Query that I hope works with 5.7:
>      SELECT f.col1,
>             SUM(somecount)
>      FROM fact_ids f
>           INNER JOIN dim_date d
>               ON f.year   = d.year
>               AND f.month = d.month
>               AND f.day   = d.day
>      WHERE d.date_name BETWEEN '2016-03-02' AND '2016-04-01'
>      GROUP BY 1  ORDER BY 1
>
> Can anyone confirm that 5.7 will perform dynamic runtime filtering on this
> second example above?
>
> Anything else I should be aware of for cases like this?  Or limitations of
> dynamic runtime filtering?
>
>
> Thanks!
>
>
>
>
>
