Runtime filters work against partitions as well as individual rows, so if
you decide not to partition by sensor you should still see a lot of benefit
from filtering out the sensor_id(s) that won't qualify for the join.
For the query below, four runtime filters will be created:
1. RF001 to filter partitions against f.year
2. RF002 to filter partitions against f.month
3. RF003 to filter partitions against f.day
4. RF004 to filter rows against f.sensor_id
SELECT
    f.col1, SUM(somecount)
FROM
    fact_ids f
    INNER JOIN dim_date d
        ON f.year = d.year AND f.month = d.month AND f.day = d.day
    INNER JOIN sensors s
        ON f.sensor_id = s.sensor_id
WHERE
    d.date_name BETWEEN '2016-03-02' AND '2016-04-01'
    AND s.sensor_state = 'CA'
GROUP BY 1
ORDER BY 1
My recommendation would be to partition by day and rely on row-level
runtime filters to reduce the rows further.
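To make the trade-off concrete, here is a rough partition-count sketch for the two strategies. The day count (~500, per the dim_date description later in this thread) and the sensor count are illustrative assumptions, and the 30,000 figure is the rule-of-thumb ceiling Ken mentions below:

```python
# Rough partition-count arithmetic for the partitioning strategies discussed.
# DAYS and SENSORS are assumed/hypothetical numbers for illustration only.
DAYS = 500                    # ~500 daily partitions, matching dim_date's ~500 rows
SENSORS = 200                 # hypothetical sensor count
PARTITION_GUIDELINE = 30_000  # rule-of-thumb ceiling mentioned in this thread

by_day = DAYS                       # partition by day only
by_day_and_sensor = DAYS * SENSORS  # partition by day AND sensor

print(f"day only:       {by_day} partitions")
print(f"day x sensor:   {by_day_and_sensor} partitions")
print(f"over guideline: {by_day_and_sensor > PARTITION_GUIDELINE}")
```

With these assumed numbers, day-only partitioning stays far below the guideline while day-by-sensor blows past it, which is the motivation for leaning on row-level runtime filters instead of sensor partitions.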
On Fri, Apr 8, 2016 at 3:14 PM, Ken Farmer <[email protected]> wrote:
> John,
>
> We can get anywhere from 1 to 100 million rows per partition, and these
> daily partitions help enormously when analyzing the busier sensors by
> constraining the queries to just a few days when that's feasible. Almost
> all my regularly-scheduled processes are built to work incrementally, one
> day at a time. Besides speeding up the queries enormously, this also
> yields huge cost savings by allowing us to use a smaller cluster.
>
> But the number of sensors is increasing, so I'll be changing the
> partitioning strategy from individual sensors to sensor groups to reduce
> the total number of partitions. I haven't been able to benchmark the
> differences; I'm just heeding warnings to stay below 30,000 total
> partitions per table.
>
> I'm looking at one of two different date_id formats: either a four-digit
> monotonically-increasing number, or an ISO 8601 basic-format yyyymmdd, as
> you point out above. That's a choice between a 16-bit value that requires
> a join against a tiny but useful table vs. a 32-bit value that could be
> used either without a join or with that same tiny table. I'll probably
> benchmark the differences - and I'm glad to accept input on testing and
> to share the results if you're interested.
>
> Ken
>
> On Fri, Apr 8, 2016 at 3:38 PM, John Russell <[email protected]>
> wrote:
>
>>
>> On Apr 8, 2016, at 12:06 PM, Ken Farmer <[email protected]> wrote:
>>
>> > Ken, what is the extra detail you were hoping to see in the docs?
>> > More examples of SQL idioms that do/don't result in runtime filters?
>>
>> Yep, you got it: because my partitioning depends on multiple columns,
>> typically used together, I really need to use a join rather than a
>> subquery.
>>
>> And at this point, assuming the dynamic runtime filters work well and
>> consistently, I'll probably shift our standard partitioning columns away
>> from year, month, day to date_id - a foreign key into dim_date - unless
>> anyone can recommend a reason not to.
>>
>>
>> A couple of thoughts on that last point, about changing the partition
>> scheme and joining on a single string column. (These are from the
>> book-learning perspective: I don't run daily reports myself, so I tend
>> not to partition so granularly.)
>>
>> Sounds overall like a reasonable idea, and I'm happy that the notion of
>> runtime filtering opens this up as a possibility for you.
>>
>> I guess you won't consolidate any more data within each partition, since
>> there will still be the same number of 'days' in each case. I'm always
>> looking for the opportunity to take partition-by-day tables and switch to
>> partition-by-month, to try to get decent-sized files in each partition.
>> Sounds like you have 100M rows / day, but also divided by sensor. How many
>> rows in a typical partition?
>>
>> I've heard many times about the better efficiency of integers vs.
>> strings for joins, GROUP BY, etc. I wonder if it would be faster and
>> more memory-efficient overall to encode the day ID as an INT, e.g.
>> 20160408, then decompose/reformat only at the end for reporting
>> purposes, or keep nicely formatted string representations in the
>> dim_date dimension table. That's an experiment I haven't tried with
>> enough volume to know for sure.
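The integer day-ID encoding described above can be sketched as follows. This is a minimal illustration of the idea, not anything from Impala itself; the helper names are invented:

```python
from datetime import date

def to_date_id(d: date) -> int:
    """Encode a date as a yyyymmdd integer, e.g. 2016-04-08 -> 20160408."""
    return d.year * 10000 + d.month * 100 + d.day

def from_date_id(date_id: int) -> date:
    """Decode a yyyymmdd integer back into a date, for reporting at the edges."""
    return date(date_id // 10000, (date_id // 100) % 100, date_id % 100)

# The encoding round-trips, and yyyymmdd integers sort in the same order
# as the dates they represent, so range predicates still behave sensibly.
print(to_date_id(date(2016, 4, 8)))        # 20160408
print(from_date_id(20160408).isoformat())  # 2016-04-08
```

A nice property of this encoding is that integer comparison order matches date order, so BETWEEN-style range filters on the encoded column work without decoding.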
>>
>> John
>>
>> On Fri, Apr 8, 2016 at 12:53 PM, John Russell <[email protected]>
>> wrote:
>>
>>> Ken, what is the extra detail you were hoping to see in the docs? More
>>> examples of SQL idioms that do/don't result in runtime filters? (Like in
>>> your example, I could imagine being curious about INNER JOIN as opposed to
>>> other kinds of joins, ON clause with several ANDed predicates, and BETWEEN
>>> in the WHERE clause.)
>>>
>>> We're planning to add more such examples in upcoming doc refreshes.
>>> Just want to make sure we cover the kinds of areas you're curious about.
>>> Once 5.7 is installed somewhere to try out, it's straightforward to verify
>>> for particular combinations of tables and queries using EXPLAIN and looking
>>> for the extra lines representing filters.
>>>
>>> John
>>>
>>>
>>> On Apr 8, 2016, at 11:39 AM, Mostafa Mokhtar <[email protected]>
>>> wrote:
>>>
>>> Hi Ken,
>>>
>>> Yes, the query provided should work. I validated that the query below,
>>> which is fairly similar to the one listed above, works as expected.
>>>
>>> SELECT
>>>     COUNT(*)
>>> FROM
>>>     store_sales_mp,
>>>     date_dim
>>> WHERE
>>>     ss_sold_date_sk = d_date_sk
>>>     AND ss_store_sk = d_dow
>>>     AND d_year = 1999
>>>     AND d_moy = 1
>>>
>>>
>>> store_sales_mp is partitioned on two columns:
>>>
>>> | # col_name      | data_type |
>>> | ss_sold_date_sk | bigint    |
>>> | ss_store_sk     | bigint    |
>>>
>>>
>>>
>>> The query execution summary shows that only a small number of rows from
>>> store_sales_mp qualified the scan:
>>>
>>>
>>> +-----------------+--------+----------+----------+--------+------------+----------+---------------+--------------------------------------------+
>>> | Operator        | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                                     |
>>> +-----------------+--------+----------+----------+--------+------------+----------+---------------+--------------------------------------------+
>>> | 06:AGGREGATE    | 1      | 160.66ms | 160.66ms | 1      | 1          | 88.00 KB | -1 B          | FINALIZE                                   |
>>> | 05:EXCHANGE     | 1      | 137.98us | 137.98us | 19     | 1          | 0 B      | -1 B          | UNPARTITIONED                              |
>>> | 03:AGGREGATE    | 19     | 151.01ms | 183.65ms | 19     | 1          | 8.04 MB  | 10.00 MB      |                                            |
>>> | 02:HASH JOIN    | 19     | 101.27ms | 185.51ms | 4.82M  | 43.20B     | 2.02 MB  | 3.89 KB       | INNER JOIN, BROADCAST                      |
>>> | |--04:EXCHANGE  | 19     | 14.54us  | 25.36us  | 589    | 181        | 0 B      | 0 B           | BROADCAST                                  |
>>> | |  01:SCAN HDFS | 1      | 20.39ms  | 20.39ms  | 31     | 181        | 1.82 MB  | 64.00 MB      | tpcds_15000_decimal_parquet.date_dim       |
>>> | 00:SCAN HDFS    | 19     | 37.54ms  | 81.18ms  | 45.00M | 43.20B     | 9.84 MB  | 0 B           | tpcds_15000_decimal_parquet.store_sales_mp |
>>> +-----------------+--------+----------+----------+--------+------------+----------+---------------+--------------------------------------------+
>>>
>>>
>>>
>>> Finally, from the query profile:
>>>
>>> HDFS_SCAN_NODE (id=0):(Total: 37.539ms, non-child: 37.539ms, %
>>> non-child: 100.00%)
>>> - AverageHdfsReadThreadConcurrency: 0.00
>>> - AverageScannerThreadConcurrency: 0.93
>>> - BytesRead: 489.47 KB (501221)
>>> - BytesReadDataNodeCache: 0
>>> - BytesReadLocal: 468.42 KB (479663)
>>> - BytesReadRemoteUnexpected: 21.05 KB (21557)
>>> - BytesReadShortCircuit: 468.42 KB (479663)
>>> - DecompressionTime: 0.000ns
>>> - MaxCompressedTextFileLength: 0
>>> - NumColumns: 0 (0)
>>> - NumDisksAccessed: 4 (4)
>>> - NumRowGroups: 0 (0)
>>> - NumScannerThreadsStarted: 5 (5)
>>> - PeakMemoryUsage: 9.35 MB (9799572)
>>> - PerReadThreadRawHdfsThroughput: 901.98 MB/sec
>>> - RemoteScanRanges: 0 (0)
>>> - RowsRead: 2.37M (2368521)
>>> - RowsReturned: 2.37M (2368521)
>>> - RowsReturnedRate: 62.23 M/sec
>>> - ScanRangesComplete: 3.21K (3207)
>>> - ScannerThreadsInvoluntaryContextSwitches: 26 (26)
>>> - ScannerThreadsTotalWallClockTime: 420.999ms
>>> - MaterializeTupleTime(*): 471.000ns
>>> - ScannerThreadsSysTime: 82.142ms
>>> - ScannerThreadsUserTime: 65.777ms
>>> - ScannerThreadsVoluntaryContextSwitches: 5.41K (5413)
>>> - TotalRawHdfsReadTime(*): 1.940ms
>>> - TotalReadThroughput: 136.84 KB/sec
>>> Filter 0:
>>> - Files processed: 3.18K (3184)
>>> - Files rejected: 3.13K (3130)
>>> - Files total: 3.18K (3184)
>>> - RowGroups processed: 0 (0)
>>> - RowGroups rejected: 0 (0)
>>> - RowGroups total: 0 (0)
>>> - Rows processed: 0 (0)
>>> - Rows rejected: 0 (0)
>>> - Rows total: 0 (0)
>>> - Splits processed: 4 (4)
>>> - Splits rejected: 0 (0)
>>> - Splits total: 4 (4)
>>> Filter 1:
>>> - Files processed: 54 (54)
>>> - Files rejected: 49 (49)
>>> - Files total: 54 (54)
>>> - RowGroups processed: 0 (0)
>>> - RowGroups rejected: 0 (0)
>>> - RowGroups total: 0 (0)
>>> - Rows processed: 0 (0)
>>> - Rows rejected: 0 (0)
>>> - Rows total: 0 (0)
>>> - Splits processed: 4 (4)
>>> - Splits rejected: 0 (0)
>>> - Splits total: 4 (4)
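Reading those per-filter counters, the file-level rejection rates work out as follows. This is just a quick arithmetic sketch over the numbers copied from the profile above:

```python
# File-level rejection rates implied by the runtime-filter counters in the
# profile above (processed/rejected values copied directly from it).
filters = {
    "Filter 0": {"processed": 3184, "rejected": 3130},
    "Filter 1": {"processed": 54, "rejected": 49},
}

for name, c in filters.items():
    rate = c["rejected"] / c["processed"]
    print(f"{name}: rejected {c['rejected']}/{c['processed']} files ({rate:.1%})")
```

Both filters reject the vast majority of files before their rows are ever materialized, which is where the scan savings in the summary table come from.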
>>>
>>>
>>>
>>> Thanks
>>> Mostafa
>>>
>>>
>>> On Fri, Apr 8, 2016 at 10:04 AM, Ken Farmer <[email protected]> wrote:
>>>
>>>> First off - congrats on this new release. It looks like a lot of hard
>>>> work went into it.
>>>>
>>>> I'm really excited about dynamic runtime filters, but I have a question
>>>> about whether they would apply to the following scenario, which seems
>>>> oddly missing from the documentation.
>>>>
>>>> Assume 2 tables:
>>>> 1. fact_ids table with 50 billion rows partitioned by year, month,
>>>> day, sensor_id
>>>> 2. dim_date dimension table with 500 rows, 1 row/day with date_id, and
>>>> date_name, year, month, day columns
>>>>
>>>> Query that works fine today:
>>>> SELECT f.col1,
>>>> SUM(somecount)
>>>> FROM fact_ids f
>>>> WHERE f.year=2016 AND f.month=4 AND f.day=2
>>>> GROUP BY 1 ORDER BY 1
>>>>
>>>>
>>>> Query that I hope works with 5.7:
>>>> SELECT f.col1,
>>>> SUM(somecount)
>>>> FROM fact_ids f
>>>> INNER JOIN dim_date d
>>>> ON f.year = d.year
>>>> AND f.month = d.month
>>>> AND f.day = d.day
>>>> WHERE d.date_name BETWEEN '2016-03-02' AND '2016-04-01'
>>>> GROUP BY 1 ORDER BY 1
>>>>
>>>> Can anyone confirm that 5.7 will perform dynamic runtime filtering on
>>>> this second example above?
>>>>
>>>> Anything else I should be aware of for cases like this? Or limitations
>>>> of dynamic runtime filtering?
>>>>
>>>>
>>>> Thanks!
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Impala User" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>>
>>