roadmap.md

maryannxue Sat, 31 Jan 2015 15:08:00 -0800

Author: maryannxue
Date: Sat Jan 31 23:07:21 2015
New Revision: 1656256

URL: http://svn.apache.org/r1656256
Log:
PHOENIX-1554 Update join documentation based on many-to-many support


Modified:
    phoenix/site/publish/joins.html
    phoenix/site/publish/recent.html
    phoenix/site/publish/roadmap.html
    phoenix/site/source/src/site/markdown/joins.md
    phoenix/site/source/src/site/markdown/recent.md
    phoenix/site/source/src/site/markdown/roadmap.md

Modified: phoenix/site/publish/joins.html
URL: 
http://svn.apache.org/viewvc/phoenix/site/publish/joins.html?rev=1656256&r1=1656255&r2=1656256&view=diff
==============================================================================
--- phoenix/site/publish/joins.html (original)
+++ phoenix/site/publish/joins.html Sat Jan 31 23:07:21 2015
@@ -1,7 +1,7 @@
 
 <!DOCTYPE html>
 <!--
- Generated by Apache Maven Doxia at 2015-01-27
+ Generated by Apache Maven Doxia at 2015-01-31
  Rendered using Reflow Maven Skin 1.1.0 
(http://andriusvelykis.github.io/reflow-maven-skin)
 -->
 <html  xml:lang="en" lang="en">
@@ -411,6 +411,11 @@ ON Items.ItemID = O.ItemID;
  </div> 
 </div> 
 <div class="section"> 
+ <h2 id="Hash_Join_vs._Sort-Merge_Join">Hash Join vs. Sort-Merge Join</h2> 
+ <p>Basic hash join usually outperforms other types of join algorithms, but it 
has its limitations too, the most significant of which is the assumption that 
one of the relations must be small enough to fit into memory. Thus Phoenix now 
has both hash join and sort-merge join implemented to facilitate fast join 
operations as well as join between two large tables.</p> 
+ <p>Phoenix currently uses the hash join algorithm whenever possible since it 
is usually much faster. However we have the hint “USE_SORT_MERGE_JOIN” for 
forcing the usage of sort-merge join in a query. The choice between these two 
join algorithms, together with detecting the smaller relation for hash join, 
will be done automatically in future under the guidance provided by table 
statistics.</p> 
+</div> 
+<div class="section"> 
  <h2 id="foreign-key-to-primary-key-join-optimization">Foreign Key to Primary 
Key Join Optimization<a 
name="Foreign_Key_to_Primary_Key_Join_Optimization"></a></h2> 
  <p>Oftentimes a join will occur from a child table to a parent table, mapping 
the foreign key of the child table to the primary key of the parent. So instead 
of doing a full scan on the parent table, Phoenix will drive a skip-scan or a 
range-scan based on the foreign key values it got from the child table 
result.</p> 
  <p>Phoenix will extract and sort multiple key parts from the join keys so 
that it can get the most accurate key hints/ranges possible for the parent 
table scan.</p> 
@@ -460,17 +465,17 @@ ON E.Region = P.Region AND E.LocalID = P
    </tr> 
   </tbody> 
  </table> 
- <p>However, there are times when the foreign key values from the child table 
account for a complete primary key space in the parent table, thus using 
skip-scans would only be slower not faster. In order to avoid such situations, 
Phoenix currently does a range-scan by default and only chooses to do a 
skip-scan when there is a child table filter in the WHERE clause or the ON 
clause, as in the above example. Table statistics will come to help making 
smarter choices between the two schemes in future. Yet you can always use hints 
“SKIP_SCAN_HASH_JOIN” or “RANGE_SCAN_HASH_JOIN” to change the default 
behavior.</p> 
+ <p>However, there are times when the foreign key values from the child table 
account for a complete primary key space in the parent table, thus using 
skip-scans would only be slower not faster. Yet you can always turn off the 
optimization by specifying hint “NO_CHILD_PARENT_OPTIMIZATION”. 
Furthermore, table statistics will soon come in to help making smarter choices 
between the two schemes.</p> 
 </div> 
 <div class="section"> 
  <h2 id="Configuration">Configuration</h2> 
- <p>The join functionality is now implemented through hash joins, which means 
one side of the join operator has to be small enough to fit into memory in 
order to be broadcast over all servers that have the data of concern from the 
other side of join. This limitation will be eliminated once <a 
class="externalLink" 
href="https://issues.apache.org/jira/browse/PHOENIX-1179">PHOENIX-1179</a> is 
implemented.</p> 
- <p>The servers-side caches are used to hold the hashed join-table results. 
The size and the living time of the caches are controlled by the following 
parameters. Note that a join-table can be a physical table, a view, a subquery, 
or a joined result of other join-tables in a multi-join query.</p> 
+ <p>As mentioned earlier, if we decide to use the hash join approach for our 
join queries, the prerequisite is that either of the relations can be small 
enough to fit into memory in order to be broadcast over all servers that have 
the data of concern from the other relation. And aside from making sure that 
the region server heap size is big enough to hold the smaller relation, we 
might also need to pay attention to a few configuration parameters that are 
crucial to running hash joins.</p> 
+ <p>The servers-side caches are used to hold the hash table built upon the 
smaller relation. The size and the living time of the caches are controlled by 
the following parameters. Note that a relation can be a physical table, a view, 
a subquery, or a joined result of other relations in a multiple-join query.</p> 
  <ol style="list-style-type: decimal"> 
   <li>phoenix.query.maxServerCacheBytes 
    <ul> 
-    <li>Maximum size (in bytes) of a join-table result before compression and 
conversion to a hash map.</li> 
-    <li>Attempting to hash a join-table result of a size bigger than this 
setting will result in a MaxServerCacheSizeExceededException.</li> 
+    <li>Maximum size (in bytes) of the raw results of a relation before being 
compressed and sent over to the region servers.</li> 
+    <li>Attempting to serialize the raw results of a relation with a size 
bigger than this setting will result in a 
MaxServerCacheSizeExceededException.</li> 
     <li><b>Default: 104,857,600</b></li> 
    </ul></li> 
   <li>phoenix.query.maxGlobalMemoryPercentage 
@@ -487,16 +492,16 @@ ON E.Region = P.Region AND E.LocalID = P
    </ul></li> 
  </ol> 
  <p>See our <a href="tuning.html">Configuration and Tuning Guide</a> for more 
details.</p> 
- <p>Although changing parameters can sometimes be a solution to getting rid of 
the exceptions mentioned above, it is highly recommended that you first 
consider optimizing the join queries according to the information provided in 
the following chapter.</p> 
+ <p>Although changing parameters can sometimes be a solution to getting rid of 
the exceptions mentioned above, it is highly recommended that you first 
consider optimizing the join queries according to the information provided in 
the following section.</p> 
 </div> 
 <div class="section"> 
  <h2 id="Optimizing_Your_Query">Optimizing Your Query</h2> 
- <p>As mentioned in the previous chapter, it is most crucial to make sure that 
there will be enough memory for the join query execution. But other than rush 
to change the configuration immediately, sometimes all you need to do is to 
know a bit of the interiors and adjust the sequence of the tables that appear 
in your join query.</p> 
- <p>Below is a description of the default join order (without the presence of 
table statistics) and of which side of the query will be executed as an inner 
query and put into server cache:</p> 
+ <p>When hash join is used, it is crucial to make sure that there will be 
enough memory for the query execution. But rather than rushing to change the 
configuration, sometimes all you need to do is to know a bit of the internals 
and adjust the order of the tables that appear in your join query.</p> 
+ <p>Below is a description of the default join order (without the presence of 
table statistics) and of which side of the query will be taken as the 
“smaller” relation and be put into server cache:</p> 
  <ol style="list-style-type: decimal"> 
-  <li> <p><i>lhs</i> INNER JOIN <i>rhs</i></p> <p><i>rhs</i> will be built as 
hash map in server cache.</p></li> 
-  <li> <p><i>lhs</i> LEFT OUTER JOIN <i>rhs</i></p> <p><i>rhs</i> will be 
built as hash map in server cache.</p></li> 
-  <li> <p><i>lhs</i> RIGHT OUTER JOIN <i>rhs</i></p> <p><i>lhs</i> will be 
built as hash map in server cache.</p></li> 
+  <li> <p><i>lhs</i> INNER JOIN <i>rhs</i></p> <p><i>rhs</i> will be built as 
hash table in server cache.</p></li> 
+  <li> <p><i>lhs</i> LEFT OUTER JOIN <i>rhs</i></p> <p><i>rhs</i> will be 
built as hash table in server cache.</p></li> 
+  <li> <p><i>lhs</i> RIGHT OUTER JOIN <i>rhs</i></p> <p><i>lhs</i> will be 
built as hash table in server cache.</p></li> 
  </ol> 
  <p>The join order is more complicated with multiple-join queries. You can try 
running “EXPLAIN <i>join_query</i>” to look at the actual execution plan. 
For multiple-inner-join queries, Phoenix applies star-join optimization by 
default, which means the leading (left-hand-side) table will be scanned only 
once joining all right-hand-side tables at the same time. You can turn off this 
optimization by specifying the hint “NO_STAR_JOIN” in your query if the 
overall size of all right-hand-side tables would exceed the memory size 
limit.</p> 
  <p>Let’s take the previous query for example:</p> 
@@ -533,17 +538,16 @@ ON O.ItemID = I.ItemID;
 3. SCAN Items JOIN HASH[1] --&gt; Final Resultset
 </pre> 
  </div> 
- <p>It is also worth mentioning that not the entire dataset of the table 
should be counted into the memory consumption. Instead, only those columns used 
by the query, and of only the records that satisfy the predicates will be built 
into the server hash map.</p> 
+ <p>It is also worth mentioning that not the entire dataset of the table 
should be counted into the memory consumption. Instead, only those columns used 
by the query, and of only the records that satisfy the predicates will be built 
into the server hash table.</p> 
 </div> 
 <div class="section"> 
  <h2 id="Limitations">Limitations</h2> 
- <p>In our Phoenix 3.2 and 4.2 releases, joins have the following 
restrictions:</p> 
+ <p>In our Phoenix 3.3.0 and 4.3.0 releases, joins have the following 
restrictions and improvements to be made:</p> 
  <ol style="list-style-type: decimal"> 
-  <li>FULL OUTER JOIN and CROSS JOIN are not supported.</li> 
-  <li>Equi-joins: Only equality (=) comparison is supported in joining 
conditions (conditions that specify the connecting rules between the two sides 
of the join operator). However there is no restriction on other predicates in 
the ON clause concerning only one side of the join operator.</li> 
-  <li><a class="externalLink" 
href="https://issues.apache.org/jira/browse/PHOENIX-1179">PHOENIX-1179</a>: 
Joins between two large tables that can neither fit into memory.</li> 
+  <li><a class="externalLink" 
href="https://issues.apache.org/jira/browse/PHOENIX-1555">PHOENIX-1555</a>: 
Fallback to many-to-many join if hash join fails due to insufficient 
memory.</li> 
+  <li><a class="externalLink" 
href="https://issues.apache.org/jira/browse/PHOENIX-1556">PHOENIX-1556</a>: 
Base hash join versus many-to-many decision on how many guideposts will be 
traversed for RHS table(s).</li> 
  </ol> 
- <p>Continuous efforts are being made to enhance Phoenix with more complete 
join functionalities. Please refer to our <a href="roadmap.html">Roadmap</a> 
for more information.</p> 
+ <p>Continuous efforts are being made to bring in more performance enhancement 
for join queries based on table statistics. Please refer to our <a 
href="roadmap.html">Roadmap</a> for more information.</p> 
 </div>
                        </div>
                </div>
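
For readers of this change, the trade-off described in the new "Hash Join vs. Sort-Merge Join" section can be illustrated with a minimal Python sketch. This is not Phoenix code: the function names `hash_join` and `sort_merge_join` and the row layout are assumptions made purely for illustration. The sketch also sorts both inputs in memory for brevity, whereas a real sort-merge join would consume already-sorted streams so that neither side needs to fit into memory.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Inner equi-join that builds an in-memory hash table on `right`."""
    table = defaultdict(list)
    for r in right:
        # Build phase: the whole right side is held in memory.
        table[r[key]].append(r)
    out = []
    for l in left:
        # Probe phase: the left side is streamed row by row.
        for r in table.get(l[key], []):
            out.append({**l, **r})
    return out


def sort_merge_join(left, right, key):
    """Inner equi-join that orders both sides by `key` and merges them."""
    left = sorted(left, key=lambda row: row[key])
    right = sorted(right, key=lambda row: row[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            # Pair every left row in the matching run with that right run.
            while i < len(left) and left[i][key] == lk:
                for r in right[j:j_end]:
                    out.append({**left[i], **r})
                i += 1
            j = j_end
    return out
```

Both functions return the same set of joined rows. The difference is that `hash_join` needs the whole `right` input in memory, which is the limitation the documentation change describes, while the merge step only ever looks at the current run of equal keys.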

Modified: phoenix/site/publish/recent.html
URL: 
http://svn.apache.org/viewvc/phoenix/site/publish/recent.html?rev=1656256&r1=1656255&r2=1656256&view=diff
==============================================================================
--- phoenix/site/publish/recent.html (original)
+++ phoenix/site/publish/recent.html Sat Jan 31 23:07:21 2015
@@ -1,7 +1,7 @@
 
 <!DOCTYPE html>
 <!--
- Generated by Apache Maven Doxia at 2015-01-27
+ Generated by Apache Maven Doxia at 2015-01-31
  Rendered using Reflow Maven Skin 1.1.0 
(http://andriusvelykis.github.io/reflow-maven-skin)
 -->
 <html  xml:lang="en" lang="en">
@@ -137,6 +137,7 @@
  <li><b><a href="update_statistics.html">Statistics Collection</a></b>. 
Collects the statistics for a table to improve query parallelization. 
<b>Available in our 3.2/4.2 release</b></li> 
  <li><b><a href="joins.html">Join Improvements</a></b>. Improve existing hash 
join implementation. 
   <ul> 
+   <li><b><a class="externalLink" 
href="https://issues.apache.org/jira/browse/PHOENIX-1179">Many-to-many 
joins</a></b>. Support joins where both sides are too large to fit into memory. 
<b>Available in our 3.3/4.3 release</b></li> 
    <li><b><a class="externalLink" 
href="https://issues.apache.org/jira/browse/PHOENIX-852">Optimize foreign key 
joins</a></b>. Optimize foreign key joins by leveraging our skip scan filter. 
<b>Available in our 3.2/4.2 release</b></li> 
    <li><b><a class="externalLink" 
href="https://issues.apache.org/jira/browse/PHOENIX-167">Semi/anti 
joins</a></b>. Support semi/anti subqueries through the standard [NOT] IN and 
[NOT] EXISTS keywords. <b>Available in our 3.2/4.2 release</b></li> 
   </ul></li> 

Modified: phoenix/site/publish/roadmap.html
URL: 
http://svn.apache.org/viewvc/phoenix/site/publish/roadmap.html?rev=1656256&r1=1656255&r2=1656256&view=diff
==============================================================================
--- phoenix/site/publish/roadmap.html (original)
+++ phoenix/site/publish/roadmap.html Sat Jan 31 23:07:21 2015
@@ -1,7 +1,7 @@
 
 <!DOCTYPE html>
 <!--
- Generated by Apache Maven Doxia at 2015-01-27
+ Generated by Apache Maven Doxia at 2015-01-31
  Rendered using Reflow Maven Skin 1.1.0 
(http://andriusvelykis.github.io/reflow-maven-skin)
 -->
 <html  xml:lang="en" lang="en">
@@ -137,7 +137,7 @@
  <li><b><a class="externalLink" 
href="https://issues.apache.org/jira/browse/PHOENIX-400">Transaction 
Support</a></b>. Support transactions by integrating with an open source 
solution like <a class="externalLink" 
href="https://github.com/continuuity/tephra">Tephra</a>, <a 
class="externalLink" href="https://github.com/XiaoMi/themis">Themis</a>, or 
some other similar option.</li> 
  <li><b><a class="externalLink" 
href="https://issues.apache.org/jira/browse/PHOENIX-1167">Join 
Improvements</a></b>. Enhance our join capabilities in a variety of ways:<br /> 
   <ul> 
-   <li><b><a class="externalLink" 
href="https://issues.apache.org/jira/browse/PHOENIX-1179">Many-to-many 
joins</a></b>. Support joins where both sides are too large to fit into 
memory.</li> 
+   <li><b><a class="externalLink" 
href="https://issues.apache.org/jira/browse/PHOENIX-1556">Table-stats-guided 
choice between hash join and sort-merge join</a></b>. Base hash join versus 
many-to-many decision on how many guideposts will be traversed for RHS 
table(s).</li> 
    <li><b><a class="externalLink" 
href="https://issues.apache.org/jira/browse/PHOENIX-150">Inlined parent/child 
joins</a></b>. Optimize parent/child joins by storing child rows inside of a 
parent row, forming the column qualifier through a known prefix plus the child 
row primary key.</li> 
   </ul></li> 
  <li><b><a href="subqueries.html">Subquery</a> Enhancement</b>, which includes 
support for <b><a class="externalLink" 
href="https://issues.apache.org/jira/browse/PHOENIX-1388">correlated subqueries 
in the HAVING clause</a></b> and <b><a class="externalLink" 
href="https://issues.apache.org/jira/browse/PHOENIX-1392">using subqueries as 
expressions</a></b>.</li> 

Modified: phoenix/site/source/src/site/markdown/joins.md
URL: 
http://svn.apache.org/viewvc/phoenix/site/source/src/site/markdown/joins.md?rev=1656256&r1=1656255&r2=1656256&view=diff
==============================================================================
--- phoenix/site/source/src/site/markdown/joins.md (original)
+++ phoenix/site/source/src/site/markdown/joins.md Sat Jan 31 23:07:21 2015
@@ -124,6 +124,12 @@ As an alternative to the [earlier exampl
          GROUP BY ItemID) AS O
     ON Items.ItemID = O.ItemID;
 
+## Hash Join vs. Sort-Merge Join
+
+Basic hash join usually outperforms other types of join algorithms, but it has 
its limitations too, the most significant of which is the assumption that one 
of the relations must be small enough to fit into memory. Thus Phoenix now has 
both hash join and sort-merge join implemented to facilitate fast join 
operations as well as join between two large tables.
+
+Phoenix currently uses the hash join algorithm whenever possible since it is 
usually much faster. However we have the hint "USE_SORT_MERGE_JOIN" for forcing 
the usage of sort-merge join in a query. The choice between these two join 
algorithms, together with detecting the smaller relation for hash join, will be 
done automatically in future under the guidance provided by table statistics.
+
 ## Foreign Key to Primary Key Join Optimization<a 
name="foreign-key-to-primary-key-join-optimization"></a>
 
 Oftentimes a join will occur from a child table to a parent table, mapping the 
foreign key of the child table to the primary key of the parent. So instead of 
doing a full scan on the parent table, Phoenix will drive a skip-scan or a 
range-scan based on the foreign key values it got from the child table result.
@@ -165,17 +171,17 @@ W/O Optimization    |W/ Optimization
 --------------------|---------------
 8.1s                |0.4s
 
-However, there are times when the foreign key values from the child table 
account for a complete primary key space in the parent table, thus using 
skip-scans would only be slower not faster. In order to avoid such situations, 
Phoenix currently does a range-scan by default and only chooses to do a 
skip-scan when there is a child table filter in the WHERE clause or the ON 
clause, as in the above example. Table statistics will come to help making 
smarter choices between the two schemes in future. Yet you can always use hints 
"SKIP_SCAN_HASH_JOIN" or "RANGE_SCAN_HASH_JOIN" to change the default behavior.
+However, there are times when the foreign key values from the child table 
account for a complete primary key space in the parent table, thus using 
skip-scans would only be slower not faster. Yet you can always turn off the 
optimization by specifying hint "NO_CHILD_PARENT_OPTIMIZATION". Furthermore, 
table statistics will soon come in to help making smarter choices between the 
two schemes.
 
 ## Configuration
 
-The join functionality is now implemented through hash joins, which means one 
side of the join operator has to be small enough to fit into memory in order to 
be broadcast over all servers that have the data of concern from the other side 
of join. This limitation will be eliminated once 
[PHOENIX-1179](https://issues.apache.org/jira/browse/PHOENIX-1179) is 
implemented.
+As mentioned earlier, if we decide to use the hash join approach for our join 
queries, the prerequisite is that either of the relations can be small enough 
to fit into memory in order to be broadcast over all servers that have the data 
of concern from the other relation. And aside from making sure that the region 
server heap size is big enough to hold the smaller relation, we might also need 
to pay attention to a few configuration parameters that are crucial to 
running hash joins.
 
-The servers-side caches are used to hold the hashed join-table results. The 
size and the living time of the caches are controlled by the following 
parameters. Note that a join-table can be a physical table, a view, a subquery, 
or a joined result of other join-tables in a multi-join query.
+The servers-side caches are used to hold the hash table built upon the smaller 
relation. The size and the living time of the caches are controlled by the 
following parameters. Note that a relation can be a physical table, a view, a 
subquery, or a joined result of other relations in a multiple-join query.
 
 1. phoenix.query.maxServerCacheBytes
-    * Maximum size (in bytes) of a join-table result before compression and 
conversion to a hash map.
-    * Attempting to hash a join-table result of a size bigger than this 
setting will result in a MaxServerCacheSizeExceededException.
+    * Maximum size (in bytes) of the raw results of a relation before being 
compressed and sent over to the region servers.
+    * Attempting to serialize the raw results of a relation with a size 
bigger than this setting will result in a MaxServerCacheSizeExceededException.
     * **Default: 104,857,600**
 2. phoenix.query.maxGlobalMemoryPercentage
     * Percentage of total heap memory (i.e. Runtime.getRuntime().maxMemory()) 
that all threads may use.
@@ -188,25 +194,25 @@ The servers-side caches are used to hold
 
 See our [Configuration and Tuning Guide](tuning.html) for more details.
 
-Although changing parameters can sometimes be a solution to getting rid of the 
exceptions mentioned above, it is highly recommended that you first consider 
optimizing the join queries according to the information provided in the 
following chapter.
+Although changing parameters can sometimes be a solution to getting rid of the 
exceptions mentioned above, it is highly recommended that you first consider 
optimizing the join queries according to the information provided in the 
following section.
 
 ## Optimizing Your Query
 
-As mentioned in the previous chapter, it is most crucial to make sure that 
there will be enough memory for the join query execution. But other than rush 
to change the configuration immediately, sometimes all you need to do is to 
know a bit of the interiors and adjust the sequence of the tables that appear 
in your join query.
+When hash join is used, it is crucial to make sure that there will be enough 
memory for the query execution. But rather than rushing to change the 
configuration, sometimes all you need to do is to know a bit of the internals 
and adjust the order of the tables that appear in your join query.
 
-Below is a description of the default join order (without the presence of 
table statistics) and of which side of the query will be executed as an inner 
query and put into server cache:
+Below is a description of the default join order (without the presence of 
table statistics) and of which side of the query will be taken as the "smaller" 
relation and be put into server cache:
 
 1. _lhs_ INNER JOIN _rhs_
 
-    _rhs_ will be built as hash map in server cache.
+    _rhs_ will be built as hash table in server cache.
 
 2. _lhs_ LEFT OUTER JOIN _rhs_
 
-    _rhs_ will be built as hash map in server cache.
+    _rhs_ will be built as hash table in server cache.
 
 3. _lhs_ RIGHT OUTER JOIN _rhs_
 
-    _lhs_ will be built as hash map in server cache.
+    _lhs_ will be built as hash table in server cache.
 
 The join order is more complicated with multiple-join queries. You can try 
running "EXPLAIN _join\_query_" to look at the actual execution plan. For 
multiple-inner-join queries, Phoenix applies star-join optimization by default, 
which means the leading (left-hand-side) table will be scanned only once 
joining all right-hand-side tables at the same time. You can turn off this 
optimization by specifying the hint "NO_STAR_JOIN" in your query if the overall 
size of all right-hand-side tables would exceed the memory size limit.
 
@@ -240,15 +246,14 @@ The join order will be:
     2. SCAN Orders JOIN HASH[0]; CLOSE HASH[0] --> BUILD HASH[1]
     3. SCAN Items JOIN HASH[1] --> Final Resultset
 
-It is also worth mentioning that not the entire dataset of the table should be 
counted into the memory consumption. Instead, only those columns used by the 
query, and of only the records that satisfy the predicates will be built into 
the server hash map.
+It is also worth mentioning that not the entire dataset of the table should be 
counted into the memory consumption. Instead, only those columns used by the 
query, and of only the records that satisfy the predicates will be built into 
the server hash table.
 
 ## Limitations
 
-In our Phoenix 3.2 and 4.2 releases, joins have the following restrictions:
+In our Phoenix 3.3.0 and 4.3.0 releases, joins have the following restrictions 
and improvements to be made:
 
-1. FULL OUTER JOIN and CROSS JOIN are not supported.
-2. Equi-joins: Only equality (=) comparison is supported in joining conditions 
(conditions that specify the connecting rules between the two sides of the join 
operator). However there is no restriction on other predicates in the ON clause 
concerning only one side of the join operator.
-3. [PHOENIX-1179](https://issues.apache.org/jira/browse/PHOENIX-1179): Joins 
between two large tables that can neither fit into memory.
+1. [PHOENIX-1555](https://issues.apache.org/jira/browse/PHOENIX-1555): 
Fallback to many-to-many join if hash join fails due to insufficient memory.
+2. [PHOENIX-1556](https://issues.apache.org/jira/browse/PHOENIX-1556): Base 
hash join versus many-to-many decision on how many guideposts will be traversed 
for RHS table(s).
 
-Continuous efforts are being made to enhance Phoenix with more complete join 
functionalities. Please refer to our [Roadmap](roadmap.html) for more 
information.
+Continuous efforts are being made to bring in more performance enhancement for 
join queries based on table statistics. Please refer to our 
[Roadmap](roadmap.html) for more information.
 

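The default build-side rules listed under "Optimizing Your Query" in the joins.md change above can be restated as a small lookup. This is only a sketch of the documented defaults (without table statistics), not Phoenix's planner; the function name `hash_build_side` is hypothetical, and join types the list does not mention are rejected rather than guessed.

```python
def hash_build_side(join_type):
    """Return which side ("lhs" or "rhs") is built as the hash table by default.

    Mirrors the three cases listed in the documentation:
    INNER -> rhs, LEFT OUTER -> rhs, RIGHT OUTER -> lhs.
    """
    defaults = {
        "INNER": "rhs",
        "LEFT OUTER": "rhs",
        "RIGHT OUTER": "lhs",
    }
    # Normalize case and whitespace so "left  outer" matches "LEFT OUTER".
    normalized = " ".join(join_type.upper().split())
    if normalized not in defaults:
        raise ValueError("no documented default for join type: " + join_type)
    return defaults[normalized]
```

Read this way, writing the smaller relation on the right-hand side of an INNER or LEFT OUTER join, or on the left-hand side of a RIGHT OUTER join, keeps the server cache small.
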
Modified: phoenix/site/source/src/site/markdown/recent.md
URL: 
http://svn.apache.org/viewvc/phoenix/site/source/src/site/markdown/recent.md?rev=1656256&r1=1656255&r2=1656256&view=diff
==============================================================================
--- phoenix/site/source/src/site/markdown/recent.md (original)
+++ phoenix/site/source/src/site/markdown/recent.md Sat Jan 31 23:07:21 2015
@@ -4,6 +4,7 @@ As items are implemented from our road m
 
 1. **[Statistics Collection](update_statistics.html)**. Collects the 
statistics for a table to improve query parallelization. **Available in our 
3.2/4.2 release**
 2. **[Join Improvements](joins.html)**. Improve existing hash join 
implementation.
+    * **[Many-to-many 
joins](https://issues.apache.org/jira/browse/PHOENIX-1179)**. Support joins 
where both sides are too large to fit into memory. **Available in our 3.3/4.3 
release**
     * **[Optimize foreign key 
joins](https://issues.apache.org/jira/browse/PHOENIX-852)**. Optimize foreign 
key joins by leveraging our skip scan filter. **Available in our 3.2/4.2 
release**
     * **[Semi/anti 
joins](https://issues.apache.org/jira/browse/PHOENIX-167)**. Support semi/anti 
subqueries through the standard [NOT] IN and [NOT] EXISTS keywords. **Available 
in our 3.2/4.2 release**
 3. **[Subqueries](subqueries.html)** Support independent subqueries and 
correlated subqueries in the WHERE clause as well as subqueries in the FROM 
clause. **Available in our 3.2/4.2 release**

Modified: phoenix/site/source/src/site/markdown/roadmap.md
URL: 
http://svn.apache.org/viewvc/phoenix/site/source/src/site/markdown/roadmap.md?rev=1656256&r1=1656255&r2=1656256&view=diff
==============================================================================
--- phoenix/site/source/src/site/markdown/roadmap.md (original)
+++ phoenix/site/source/src/site/markdown/roadmap.md Sat Jan 31 23:07:21 2015
@@ -4,7 +4,7 @@ Our roadmap is driven by our user commun
 
 1. **[Transaction 
Support](https://issues.apache.org/jira/browse/PHOENIX-400)**. Support 
transactions by integrating with an open source solution like 
[Tephra](https://github.com/continuuity/tephra), 
[Themis](https://github.com/XiaoMi/themis), or some other similar option.
 1. **[Join 
Improvements](https://issues.apache.org/jira/browse/PHOENIX-1167)**. Enhance 
our join capabilities in a variety of ways:<br/>
-    *  **[Many-to-many 
joins](https://issues.apache.org/jira/browse/PHOENIX-1179)**. Support joins 
where both sides are too large to fit into memory. 
+    *  **[Table-stats-guided choice between hash join and sort-merge 
join](https://issues.apache.org/jira/browse/PHOENIX-1556)**. Base hash join 
versus many-to-many decision on how many guideposts will be traversed for RHS 
table(s).
     *  **[Inlined parent/child 
joins](https://issues.apache.org/jira/browse/PHOENIX-150)**. Optimize 
parent/child joins by storing child rows inside of a parent row, forming the 
column qualifier through a known prefix plus the child row primary key.
 2. **[Subquery](subqueries.html) Enhancement**, which includes support for 
**[correlated subqueries in the HAVING 
clause](https://issues.apache.org/jira/browse/PHOENIX-1388)** and **[using 
subqueries as 
expressions](https://issues.apache.org/jira/browse/PHOENIX-1392)**.
 15. **[Cost-based Query 
Optimization](https://issues.apache.org/jira/browse/PHOENIX-1177)**. Enhance 
existing [statistics collection](update_statistics.html) by enabling further 
query optimizations based on the size and cardinality of the data.

svn commit: r1656256 - in /phoenix/site: publish/joins.html publish/recent.html publish/roadmap.html source/src/site/markdown/joins.md source/src/site/markdown/recent.md source/src/site/markdown/roadmap.md
