> That's a choice between a 16-bit that requires a join against a tiny &
> useful table vs 32-bit that could also be used without a join or with that
> tiny & useful table. I'll probably benchmark the differences - and glad to
> accept input on testing & share the results if you're interested.
Yes, very interested. I know in theory there could be differences in speed
and memory use, but in practice I don't know if there is some aspect of
alignment/packing or minimum-sized intermediate variables that makes 16-bit
vs. 32-bit a wash for join keys, partition keys, etc. (Actually, I wonder
about that more in the context of using TINYINT vs. INT for very small
ranges like a DAY field.)

> I haven't been able to benchmark the differences, just considering
> warnings to try to stay below 30,000 total partitions/table.

CDH 5.7 includes IMPALA-1599 and IMPALA-2560, which might improve the
situation for your tables that are near the upper bounds in terms of number
of partitions.

John

> On Apr 8, 2016, at 3:14 PM, Ken Farmer <[email protected]> wrote:
>
> John,
>
> We can get anywhere from 1 to 100 million rows per partition, and these
> daily partitions help enormously when analyzing the busier sensors by
> constraining the queries to just a few days when that's feasible. Almost
> all my regularly scheduled processes are built to work incrementally, one
> day at a time. Besides speeding up the queries enormously, it also results
> in huge cost savings by allowing us to use a smaller cluster.
>
> But the number of sensors is increasing, so I'll be changing the
> partitioning strategy to move away from sensors to sensor groups to reduce
> the total number of partitions. I haven't been able to benchmark the
> differences, just considering warnings to try to stay below 30,000 total
> partitions/table.
>
> I'm looking at one of two different date_id formats: either a four-digit
> monotonically increasing number, or an ISO 8601 basic format of yyyymmdd
> as you point out above. That's a choice between a 16-bit key that requires
> a join against a tiny & useful table vs. a 32-bit key that could also be
> used without a join or with that tiny & useful table. I'll probably
> benchmark the differences - and I'm glad to accept input on testing &
> share the results if you're interested.
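For anyone who wants to experiment with the two date_id formats Ken describes, here is a rough Python sketch (the function names and the epoch date are illustrative assumptions, not anything from this thread): a monotonically increasing ordinal that fits in 16 bits but needs a dim_date join to become human-readable, versus a 32-bit yyyymmdd integer that can be decomposed with plain arithmetic.

```python
from datetime import date

EPOCH = date(2010, 1, 1)  # assumed starting date; pick whatever fits your data

def date_id_ordinal(d: date) -> int:
    """Days since EPOCH -- fits in 16 bits (SMALLINT) for roughly 89 years."""
    return (d - EPOCH).days

def date_id_yyyymmdd(d: date) -> int:
    """ISO 8601 basic format packed into a 32-bit INT, e.g. 20160408."""
    return d.year * 10000 + d.month * 100 + d.day

def decompose_yyyymmdd(date_id: int) -> tuple:
    """Recover (year, month, day) without a join -- the upside of 32 bits."""
    return date_id // 10000, (date_id // 100) % 100, date_id % 100

d = date(2016, 4, 8)
print(date_id_ordinal(d))            # 2289 days since the assumed epoch
print(date_id_yyyymmdd(d))           # 20160408
print(decompose_yyyymmdd(20160408))  # (2016, 4, 8)
```

A SMALLINT ordinal tops out at 32,767 days (about 89 years past the chosen epoch), which is plenty for a date_id; the yyyymmdd form trades 2 extra bytes per row for join-free readability.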
> Ken
>
>
> On Fri, Apr 8, 2016 at 3:38 PM, John Russell <[email protected]> wrote:
>
>> On Apr 8, 2016, at 12:06 PM, Ken Farmer <[email protected]> wrote:
>>
>> > Ken, what is the extra detail you were hoping to see in the docs?
>> > More examples of SQL idioms that do/don't result in runtime filters?
>>
>> Yep, you got it: because my partitioning depends on multiple columns,
>> typically used together, I really need to use a join rather than a
>> subquery.
>>
>> And at this point, assuming that the dynamic runtime filters work well &
>> consistently, I'll probably shift our standard partitioning columns away
>> from year, month, day to date_id - a foreign key into dim_date. Unless
>> anyone can recommend a reason not to.
>
> A couple of thoughts on that last point, about changing the partition
> scheme and joining on a single string column. (From the book-learning
> perspective, since I don't run daily reports myself, so I tend not to
> partition so granularly.)
>
> Sounds overall like a reasonable idea, and I'm happy that the notion of
> runtime filtering opens this up as a possibility for you.
>
> I guess you won't consolidate any more data within each partition, since
> there will still be the same number of 'days' in each case. I'm always
> looking for the opportunity to take partition-by-day tables and switch to
> partition-by-month, to try to get decent-sized files in each partition.
> Sounds like you have 100M rows/day, but also divided by sensor. How many
> rows in a typical partition?
>
> I've heard many times about the better efficiency of integer vs. string
> for joins, group by, etc. I wonder if it would be faster and more
> memory-efficient overall to encode the day ID as INT, e.g. 20160408, then
> decompose/reformat only at the end for reporting purposes, or keep nicely
> formatted string representations in the dim_date dimension table.
> That is an experiment I haven't tried with enough volume to see for sure.
>
> John
>
>> On Fri, Apr 8, 2016 at 12:53 PM, John Russell <[email protected]> wrote:
>> Ken, what is the extra detail you were hoping to see in the docs? More
>> examples of SQL idioms that do/don't result in runtime filters? (Like in
>> your example, I could imagine being curious about INNER JOIN as opposed
>> to other kinds of joins, an ON clause with several ANDed predicates, and
>> BETWEEN in the WHERE clause.)
>>
>> We're planning to add more such examples in upcoming doc refreshes. Just
>> want to make sure we cover the kinds of areas you're curious about. Once
>> 5.7 is installed somewhere to try out, it's straightforward to verify for
>> particular combinations of tables and queries using EXPLAIN and looking
>> for the extra lines representing filters.
>>
>> John
>>
>>> On Apr 8, 2016, at 11:39 AM, Mostafa Mokhtar <[email protected]> wrote:
>>>
>>> Hi Ken,
>>>
>>> Yes, the query provided should work. I validated that the query below,
>>> which is fairly similar to the one listed above, works as expected:
>>>
>>> select
>>>     count(*)
>>> from
>>>     store_sales_mp,
>>>     date_dim
>>> where
>>>     ss_sold_date_sk = d_date_sk
>>>     and ss_store_sk = d_dow
>>>     and d_year = 1999
>>>     and d_moy = 1
>>>
>>> store_sales_mp is partitioned on two columns:
>>>
>>> | # col_name      | data_type |
>>> | ss_sold_date_sk | bigint    |
>>> | ss_store_sk     | bigint    |
>>>
>>> The query execution summary shows that only a small number of rows from
>>> store_sales_mp qualify for the scan:
>>>
>>> +-----------------+--------+----------+----------+--------+------------+----------+---------------+--------------------------------------------+
>>> | Operator        | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem | Est. Peak Mem | Detail                                     |
>>> +-----------------+--------+----------+----------+--------+------------+----------+---------------+--------------------------------------------+
>>> | 06:AGGREGATE    | 1      | 160.66ms | 160.66ms | 1      | 1          | 88.00 KB | -1 B          | FINALIZE                                   |
>>> | 05:EXCHANGE     | 1      | 137.98us | 137.98us | 19     | 1          | 0 B      | -1 B          | UNPARTITIONED                              |
>>> | 03:AGGREGATE    | 19     | 151.01ms | 183.65ms | 19     | 1          | 8.04 MB  | 10.00 MB      |                                            |
>>> | 02:HASH JOIN    | 19     | 101.27ms | 185.51ms | 4.82M  | 43.20B     | 2.02 MB  | 3.89 KB       | INNER JOIN, BROADCAST                      |
>>> | |--04:EXCHANGE  | 19     | 14.54us  | 25.36us  | 589    | 181        | 0 B      | 0 B           | BROADCAST                                  |
>>> | |  01:SCAN HDFS | 1      | 20.39ms  | 20.39ms  | 31     | 181        | 1.82 MB  | 64.00 MB      | tpcds_15000_decimal_parquet.date_dim       |
>>> | 00:SCAN HDFS    | 19     | 37.54ms  | 81.18ms  | 45.00M | 43.20B     | 9.84 MB  | 0 B           | tpcds_15000_decimal_parquet.store_sales_mp |
>>> +-----------------+--------+----------+----------+--------+------------+----------+---------------+--------------------------------------------+
>>>
>>> Finally, from the query profile:
>>>
>>> HDFS_SCAN_NODE (id=0): (Total: 37.539ms, non-child: 37.539ms, % non-child: 100.00%)
>>>   - AverageHdfsReadThreadConcurrency: 0.00
>>>   - AverageScannerThreadConcurrency: 0.93
>>>   - BytesRead: 489.47 KB (501221)
>>>   - BytesReadDataNodeCache: 0
>>>   - BytesReadLocal: 468.42 KB (479663)
>>>   - BytesReadRemoteUnexpected: 21.05 KB (21557)
>>>   - BytesReadShortCircuit: 468.42 KB (479663)
>>>   - DecompressionTime: 0.000ns
>>>   - MaxCompressedTextFileLength: 0
>>>   - NumColumns: 0 (0)
>>>   - NumDisksAccessed: 4 (4)
>>>   - NumRowGroups: 0 (0)
>>>   - NumScannerThreadsStarted: 5 (5)
>>>   - PeakMemoryUsage: 9.35 MB (9799572)
>>>   - PerReadThreadRawHdfsThroughput: 901.98 MB/sec
>>>   - RemoteScanRanges: 0 (0)
>>>   - RowsRead: 2.37M (2368521)
>>>   - RowsReturned: 2.37M (2368521)
>>>   - RowsReturnedRate: 62.23 M/sec
>>>   - ScanRangesComplete: 3.21K (3207)
>>>   - ScannerThreadsInvoluntaryContextSwitches: 26 (26)
>>>   - ScannerThreadsTotalWallClockTime: 420.999ms
>>>     - MaterializeTupleTime(*): 471.000ns
>>>     - ScannerThreadsSysTime: 82.142ms
>>>     - ScannerThreadsUserTime: 65.777ms
>>>   - ScannerThreadsVoluntaryContextSwitches: 5.41K (5413)
>>>   - TotalRawHdfsReadTime(*): 1.940ms
>>>   - TotalReadThroughput: 136.84 KB/sec
>>>   Filter 0:
>>>     - Files processed: 3.18K (3184)
>>>     - Files rejected: 3.13K (3130)
>>>     - Files total: 3.18K (3184)
>>>     - RowGroups processed: 0 (0)
>>>     - RowGroups rejected: 0 (0)
>>>     - RowGroups total: 0 (0)
>>>     - Rows processed: 0 (0)
>>>     - Rows rejected: 0 (0)
>>>     - Rows total: 0 (0)
>>>     - Splits processed: 4 (4)
>>>     - Splits rejected: 0 (0)
>>>     - Splits total: 4 (4)
>>>   Filter 1:
>>>     - Files processed: 54 (54)
>>>     - Files rejected: 49 (49)
>>>     - Files total: 54 (54)
>>>     - RowGroups processed: 0 (0)
>>>     - RowGroups rejected: 0 (0)
>>>     - RowGroups total: 0 (0)
>>>     - Rows processed: 0 (0)
>>>     - Rows rejected: 0 (0)
>>>     - Rows total: 0 (0)
>>>     - Splits processed: 4 (4)
>>>     - Splits rejected: 0 (0)
>>>     - Splits total: 4 (4)
>>>
>>> Thanks,
>>> Mostafa
>>>
>>>
>>> On Fri, Apr 8, 2016 at 10:04 AM, Ken Farmer <[email protected]> wrote:
>>> First off - congrats on this new release. It looks like a lot of hard
>>> work went into it.
>>>
>>> I'm really excited about dynamic runtime filters, but I have a question
>>> about whether they would apply to the following scenario, which seems
>>> oddly missing from the documentation.
>>>
>>> Assume 2 tables:
>>> 1. fact_ids table with 50 billion rows, partitioned by year, month,
>>>    day, sensor_id
>>> 2. dim_date dimension table with 500 rows, 1 row/day, with date_id,
>>>    date_name, year, month, day columns
>>>
>>> Query that works fine today:
>>>
>>> SELECT f.col1,
>>>        SUM(somecount)
>>> FROM   fact_ids f
>>> WHERE  f.year=2016 AND f.month=4 AND f.day=2
>>> GROUP BY 1 ORDER BY 1
>>>
>>> Query that I hope works with 5.7:
>>>
>>> SELECT f.col1,
>>>        SUM(somecount)
>>> FROM   fact_ids f
>>>        INNER JOIN dim_date d
>>>            ON  f.year = d.year
>>>            AND f.month = d.month
>>>            AND f.day = d.day
>>> WHERE  d.date_name BETWEEN '2016-03-02' AND '2016-04-01'
>>> GROUP BY 1 ORDER BY 1
>>>
>>> Can anyone confirm that 5.7 will perform dynamic runtime filtering on
>>> the second example above?
>>>
>>> Anything else I should be aware of for cases like this? Or limitations
>>> of dynamic runtime filtering?
>>>
>>> Thanks!
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Impala User" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
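As an arithmetic footnote on Mostafa's profile output earlier in the thread: the Filter 0 / Filter 1 stats reduce to file-pruning ratios, which are a quick way to judge how effective the runtime filters were. This small Python sketch simply replays the numbers quoted in the profile:

```python
# Per-filter file-pruning ratios, taken from the Filter 0 / Filter 1
# stats in the query profile quoted above.
filters = {
    "Filter 0": (3130, 3184),  # (files rejected, files total)
    "Filter 1": (49, 54),
}

for name, (rejected, total) in filters.items():
    print(f"{name}: skipped {rejected}/{total} files ({rejected / total:.1%})")
```

Roughly 98% and 91% of candidate files are skipped before being read, which is consistent with the scan returning far fewer rows than its 43.20B-row estimate.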
