[ 
https://issues.apache.org/jira/browse/PHOENIX-6791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17678841#comment-17678841
 ] 

Kadir Ozdemir commented on PHOENIX-6791:
----------------------------------------

[~comnetwork], If you were to just express the above where clause only with 
scan ranges then you would get a very large number of key slots, i.e., key 
range explosion. The right approach is to detect this efficiently (before 
letting the key explosion happens) and limit the number of PK columns to be 
used for key ranges. The current implementation tries to do this but the 
implementation is very complex, not efficient and still cannot handle all where 
clauses correctly especially when the where clause is degenerate. I will post 
the design doc on how to do this correctly and efficiently.

> WHERE optimizer redesign
> ------------------------
>
>                 Key: PHOENIX-6791
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-6791
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Kadir Ozdemir
>            Priority: Major
>             Fix For: 5.3.0
>
>
> The WHERE optimizer in Phoenix derives the information about which row key 
> ranges to be scanned from the primary key (PK) column expressions in a where 
> clause. These key ranges are then used to determine the table regions to scan 
> and generate a SkipScanFilter for each of these scans if applicable. 
> The WHERE expression may include non-PK column (sub) expressions. After 
> identifying the key ranges, the WHERE optimizer removes the nodes for PK 
> columns from the expression tree if these nodes are fully used to determine 
> the key ranges.
> Since the values in the WHERE expression are expressed by byte arrays, the 
> key ranges are also expressed using byte arrays. KeyRange represents a range 
> for a row key or any sub part of a row key key. A key range is composed of 
> two pairs, one for each end of the range, lower and upper. The pair is formed 
> from a byte array and a boolean value. The boolean value indicates if the end 
> of the range specified by the byte array is inclusive or not. If the byte 
> array is empty, it means that the corresponding end of the range is 
> unbounded. 
> KeySlot represents a key part and the list of key ranges for this key part 
> where a key part can be any sub part of a PK, including leading, trailing, or 
> middle part of the key. The number of columns in a key part is called span. 
> For the terminal nodes (i..e, constant values) in the expression tree, 
> KeySlot objects are created with a single key range. When KeySlot objects are 
> rolled up in the expression tree, they can have multiple ranges. For example, 
> a KeySlot object representing an IN expression will have a separate range for 
> each member of the IN expression. Similarly the KeySlot object for an OR 
> expression can have multiple ranges similarly. Please note an IN operator can 
> be replaced by an equivalent OR expression. 
> When the WHERE optimizer visits the nodes of the expression tree, it 
> generates a KeySlots object. KeySlots is essentially a list of KeySlot 
> objects (please note the difference between KeySlots vs KeySlot). There are 
> two types of KeySlots: SingleKeySlot and MultiKeySlot. SingleKeySlot 
> represents a single key slot whereas MultiKeySlot is a list of key slots the 
> results of AND expression on SingleKeySlot or MultiKeySlot objects. 
> The key slots are rolled into a MultiKeySlot object when processing an AND 
> expression. The AND operation on two key slots starting their spans with the 
> same PK columns is equivalent to taking intersection of their ranges. The OR 
> operation implementation is limited and rather simple compared to the AND 
> operation. The OR operation attempts to coalesce key slots if all of the key 
> slots have the same starting PK column. If not, it generates a null KeySlots. 
> When an expression node is used fully in generating a key slot, this 
> expression node is removed from the expression tree.
> A row key for a given table can be composed of several PK columns. Without 
> any restrictions imposed by predefined rules, intersection of key slots can 
> lead to a large number of key slots, i.e., key ranges.  For example, consider 
> a row key composed of three integer columns, PK1, PK2, and PK3, and the 
> expression (PK1,  PK2) > (100, 25) AND PK3 = 5. The result would be a very 
> large number of key slots and each key slot represents a point in the three 
> dimensional space, including (100, 26, 5), (100, 27, 5), …, (100, 2147483647, 
> 5), (101, 1, 5), (101, 2, 5), … .
> A simple expression (like the one given above) with a relatively small number 
> of PK columns and a simple data type, e.g., integer, is sufficient to show 
> that finding key ranges for an arbitrary expression is an intractable 
> problem. Attempting to optimize the queries by enumerating the key ranges can 
> lead to excessive memory allocation and long computation times and the 
> optimization can defeat its purpose. 
> The current implementation attempts to enumerate all possible key ranges in 
> general. Because of this, the WHERE optimizer has caused out of memory 
> issues, and query timeouts due to high CPU usage. The very recent bug fixes 
> attempts to catch these cases and prevent them. However, these fixes do not 
> attempt to cover all cases and are formulated based on known cases.
> In addition to inefficient resource utilization, there are known types of 
> expressions, the current implementation still returns wrong results for them. 
>  For example, please see PHOENIX-6669 where if degenerate queries are caused 
> by some conditions on non-leading PK columns, then Phoenix cannot catch this 
> and can return wrong results.
> An example to show inconsistencies in the implementation is as follows. An 
> RVC expression can be converted to an equivalent AND/OR expression. For 
> example, (PK1, PK2) > (A, B) is equivalent to (PK1 > A) OR (PK1 = A AND PK2 > 
> B). The implementation converts the first expression into a single key range 
> and thus a scan with the start and stop rows keys without a filter. However, 
> the implementation cannot do the same for the second expression and instead 
> it generates a scan with a filter for the expression without generating a key 
> range.
> Due to tens of possibly conflicting bug fixes over the years and not having a 
> document that clearly describes the design, the current implementation has 
> become hard to understand and maintain. 
> The WHERE optimizer redesign will be formulated based on the following 
> observations:
>  # As described in the previous section, attempting to enumerate the PK 
> ranges over arbitrary expression is an intractable problem due to key range 
> explosion. Since identifying key ranges is just for the optimization but not 
> for the correctness of queries, the cost of optimization should justify the 
> gain from the optimization.
>  # The optimization gain comes from first skipping table regions and then 
> skipping rows within table regions. In practice, the most gain comes from the 
> most significant leading PK columns. The optimization is not useful if the 
> first leading PK column is not included in a WHERE expression. The value of 
> the optimization decreases with the subsequent PK columns. 
> The objectives of the redesign are as follows:
>  # The space and time complexity of the WHERE optimizer should not be more 
> than O(N2).
>  # The redesign should be provably correct. This requires constructing a 
> mathematical system with well defined elements and operations. 
>  # The redesign should generate the same result for the expressions that are 
> logically equivalent. 
>  # The redesign should lead to significantly simpler implementation. This can 
> be achieved using well defined and clearly separated operations and concepts.
>  # The scope of the redesign will be limited to the WHERE optimizer and so 
> the changes will mostly be limited to where optimizer and Expression classes. 
> For example, this redesign does not attempt to change the skip scan filter 
> design or the WHERE compiler. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to