Author: maryannxue
Date: Sat Jan 31 23:07:21 2015
New Revision: 1656256
URL: http://svn.apache.org/r1656256
Log:
PHOENIX-1554 Update join documentation based on many-to-many support
Modified:
phoenix/site/publish/joins.html
phoenix/site/publish/recent.html
phoenix/site/publish/roadmap.html
phoenix/site/source/src/site/markdown/joins.md
phoenix/site/source/src/site/markdown/recent.md
phoenix/site/source/src/site/markdown/roadmap.md
Modified: phoenix/site/publish/joins.html
URL:
http://svn.apache.org/viewvc/phoenix/site/publish/joins.html?rev=1656256&r1=1656255&r2=1656256&view=diff
==============================================================================
--- phoenix/site/publish/joins.html (original)
+++ phoenix/site/publish/joins.html Sat Jan 31 23:07:21 2015
@@ -1,7 +1,7 @@
<!DOCTYPE html>
<!--
- Generated by Apache Maven Doxia at 2015-01-27
+ Generated by Apache Maven Doxia at 2015-01-31
Rendered using Reflow Maven Skin 1.1.0
(http://andriusvelykis.github.io/reflow-maven-skin)
-->
<html xml:lang="en" lang="en">
@@ -411,6 +411,11 @@ ON Items.ItemID = O.ItemID;
</div>
</div>
<div class="section">
+ <h2 id="Hash_Join_vs._Sort-Merge_Join">Hash Join vs. Sort-Merge Join</h2>
+ <p>Basic hash join usually outperforms other types of join algorithms, but it
has its limitations too, the most significant of which is the assumption that
one of the relations must be small enough to fit into memory. Thus Phoenix now
has both hash join and sort-merge join implemented to facilitate fast join
operations as well as join between two large tables.</p>
+ <p>Phoenix currently uses the hash join algorithm whenever possible since it
is usually much faster. However we have the hint “USE_SORT_MERGE_JOIN” for
forcing the usage of sort-merge join in a query. The choice between these two
join algorithms, together with detecting the smaller relation for hash join,
will be done automatically in future under the guidance provided by table
statistics.</p>
+</div>
+<div class="section">
<h2 id="foreign-key-to-primary-key-join-optimization">Foreign Key to Primary
Key Join Optimization<a
name="Foreign_Key_to_Primary_Key_Join_Optimization"></a></h2>
<p>Oftentimes a join will occur from a child table to a parent table, mapping
the foreign key of the child table to the primary key of the parent. So instead
of doing a full scan on the parent table, Phoenix will drive a skip-scan or a
range-scan based on the foreign key values it got from the child table
result.</p>
<p>Phoenix will extract and sort multiple key parts from the join keys so
that it can get the most accurate key hints/ranges possible for the parent
table scan.</p>
@@ -460,17 +465,17 @@ ON E.Region = P.Region AND E.LocalID = P
</tr>
</tbody>
</table>
- <p>However, there are times when the foreign key values from the child table
account for a complete primary key space in the parent table, thus using
skip-scans would only be slower not faster. In order to avoid such situations,
Phoenix currently does a range-scan by default and only chooses to do a
skip-scan when there is a child table filter in the WHERE clause or the ON
clause, as in the above example. Table statistics will come to help making
smarter choices between the two schemes in future. Yet you can always use hints
“SKIP_SCAN_HASH_JOIN” or “RANGE_SCAN_HASH_JOIN” to change the default
behavior.</p>
+ <p>However, there are times when the foreign key values from the child table
account for a complete primary key space in the parent table, thus using
skip-scans would only be slower not faster. Yet you can always turn off the
optimization by specifying hint “NO_CHILD_PARENT_OPTIMIZATION”.
Furthermore, table statistics will soon come in to help making smarter choices
between the two schemes.</p>
</div>
<div class="section">
<h2 id="Configuration">Configuration</h2>
- <p>The join functionality is now implemented through hash joins, which means
one side of the join operator has to be small enough to fit into memory in
order to be broadcast over all servers that have the data of concern from the
other side of join. This limitation will be eliminated once <a
class="externalLink"
href="https://issues.apache.org/jira/browse/PHOENIX-1179">PHOENIX-1179</a> is
implemented.</p>
- <p>The servers-side caches are used to hold the hashed join-table results.
The size and the living time of the caches are controlled by the following
parameters. Note that a join-table can be a physical table, a view, a subquery,
or a joined result of other join-tables in a multi-join query.</p>
+ <p>As mentioned earlier, if we decide to use the hash join approach for our
join queries, the prerequisite is that one of the relations must be small
enough to fit into memory in order to be broadcast over all servers that have
the data of concern from the other relation. And aside from making sure that
the region server heap size is big enough to hold the smaller relation, we
might also need to pay attention to a few configuration parameters that are
crucial to running hash joins.</p>
+ <p>The server-side caches are used to hold the hash table built upon the
smaller relation. The size and the living time of the caches are controlled by
the following parameters. Note that a relation can be a physical table, a view,
a subquery, or a joined result of other relations in a multiple-join query.</p>
<ol style="list-style-type: decimal">
<li>phoenix.query.maxServerCacheBytes
<ul>
- <li>Maximum size (in bytes) of a join-table result before compression and
conversion to a hash map.</li>
- <li>Attempting to hash a join-table result of a size bigger than this
setting will result in a MaxServerCacheSizeExceededException.</li>
+ <li>Maximum size (in bytes) of the raw results of a relation before being
compressed and sent over to the region servers.</li>
+ <li>Attempting to serialize the raw results of a relation with a size
bigger than this setting will result in a
MaxServerCacheSizeExceededException.</li>
<li><b>Default: 104,857,600</b></li>
</ul></li>
<li>phoenix.query.maxGlobalMemoryPercentage
@@ -487,16 +492,16 @@ ON E.Region = P.Region AND E.LocalID = P
</ul></li>
</ol>
<p>See our <a href="tuning.html">Configuration and Tuning Guide</a> for more
details.</p>
- <p>Although changing parameters can sometimes be a solution to getting rid of
the exceptions mentioned above, it is highly recommended that you first
consider optimizing the join queries according to the information provided in
the following chapter.</p>
+ <p>Although changing parameters can sometimes be a solution to getting rid of
the exceptions mentioned above, it is highly recommended that you first
consider optimizing the join queries according to the information provided in
the following section.</p>
</div>
<div class="section">
<h2 id="Optimizing_Your_Query">Optimizing Your Query</h2>
- <p>As mentioned in the previous chapter, it is most crucial to make sure that
there will be enough memory for the join query execution. But other than rush
to change the configuration immediately, sometimes all you need to do is to
know a bit of the interiors and adjust the sequence of the tables that appear
in your join query.</p>
- <p>Below is a description of the default join order (without the presence of
table statistics) and of which side of the query will be executed as an inner
query and put into server cache:</p>
+ <p>As we now know, when using hash join it is crucial to make sure that
there will be enough memory for the query execution. But rather than rushing
to change the configuration immediately, sometimes all you need to do is to
learn a bit about the internals and adjust the order of the tables that appear
in your join query.</p>
+ <p>Below is a description of the default join order (without the presence of
table statistics) and of which side of the query will be taken as the
“smaller” relation and be put into server cache:</p>
<ol style="list-style-type: decimal">
- <li> <p><i>lhs</i> INNER JOIN <i>rhs</i></p> <p><i>rhs</i> will be built as
hash map in server cache.</p></li>
- <li> <p><i>lhs</i> LEFT OUTER JOIN <i>rhs</i></p> <p><i>rhs</i> will be
built as hash map in server cache.</p></li>
- <li> <p><i>lhs</i> RIGHT OUTER JOIN <i>rhs</i></p> <p><i>lhs</i> will be
built as hash map in server cache.</p></li>
+ <li> <p><i>lhs</i> INNER JOIN <i>rhs</i></p> <p><i>rhs</i> will be built as
hash table in server cache.</p></li>
+ <li> <p><i>lhs</i> LEFT OUTER JOIN <i>rhs</i></p> <p><i>rhs</i> will be
built as hash table in server cache.</p></li>
+ <li> <p><i>lhs</i> RIGHT OUTER JOIN <i>rhs</i></p> <p><i>lhs</i> will be
built as hash table in server cache.</p></li>
</ol>
<p>The join order is more complicated with multiple-join queries. You can try
running “EXPLAIN <i>join_query</i>” to look at the actual execution plan.
For multiple-inner-join queries, Phoenix applies star-join optimization by
default, which means the leading (left-hand-side) table will be scanned only
once joining all right-hand-side tables at the same time. You can turn off this
optimization by specifying the hint “NO_STAR_JOIN” in your query if the
overall size of all right-hand-side tables would exceed the memory size
limit.</p>
<p>Let’s take the previous query for example:</p>
@@ -533,17 +538,16 @@ ON O.ItemID = I.ItemID;
3. SCAN Items JOIN HASH[1] --> Final Resultset
</pre>
</div>
- <p>It is also worth mentioning that not the entire dataset of the table
should be counted into the memory consumption. Instead, only those columns used
by the query, and of only the records that satisfy the predicates will be built
into the server hash map.</p>
+ <p>It is also worth mentioning that not the entire dataset of the table
should be counted into the memory consumption. Instead, only those columns used
by the query, and of only the records that satisfy the predicates will be built
into the server hash table.</p>
</div>
<div class="section">
<h2 id="Limitations">Limitations</h2>
- <p>In our Phoenix 3.2 and 4.2 releases, joins have the following
restrictions:</p>
+ <p>In our Phoenix 3.3.0 and 4.3.0 releases, joins have the following
restrictions and improvements to be made:</p>
<ol style="list-style-type: decimal">
- <li>FULL OUTER JOIN and CROSS JOIN are not supported.</li>
- <li>Equi-joins: Only equality (=) comparison is supported in joining
conditions (conditions that specify the connecting rules between the two sides
of the join operator). However there is no restriction on other predicates in
the ON clause concerning only one side of the join operator.</li>
- <li><a class="externalLink"
href="https://issues.apache.org/jira/browse/PHOENIX-1179">PHOENIX-1179</a>:
Joins between two large tables that can neither fit into memory.</li>
+ <li><a class="externalLink"
href="https://issues.apache.org/jira/browse/PHOENIX-1555">PHOENIX-1555</a>:
Fallback to many-to-many join if hash join fails due to insufficient
memory.</li>
+ <li><a class="externalLink"
href="https://issues.apache.org/jira/browse/PHOENIX-1556">PHOENIX-1556</a>:
Base hash join versus many-to-many decision on how many guideposts will be
traversed for RHS table(s).</li>
</ol>
- <p>Continuous efforts are being made to enhance Phoenix with more complete
join functionalities. Please refer to our <a href="roadmap.html">Roadmap</a>
for more information.</p>
+ <p>Continuous efforts are being made to bring in more performance enhancement
for join queries based on table statistics. Please refer to our <a
href="roadmap.html">Roadmap</a> for more information.</p>
</div>
</div>
</div>
Modified: phoenix/site/publish/recent.html
URL:
http://svn.apache.org/viewvc/phoenix/site/publish/recent.html?rev=1656256&r1=1656255&r2=1656256&view=diff
==============================================================================
--- phoenix/site/publish/recent.html (original)
+++ phoenix/site/publish/recent.html Sat Jan 31 23:07:21 2015
@@ -1,7 +1,7 @@
<!DOCTYPE html>
<!--
- Generated by Apache Maven Doxia at 2015-01-27
+ Generated by Apache Maven Doxia at 2015-01-31
Rendered using Reflow Maven Skin 1.1.0
(http://andriusvelykis.github.io/reflow-maven-skin)
-->
<html xml:lang="en" lang="en">
@@ -137,6 +137,7 @@
<li><b><a href="update_statistics.html">Statistics Collection</a></b>.
Collects the statistics for a table to improve query parallelization.
<b>Available in our 3.2/4.2 release</b></li>
<li><b><a href="joins.html">Join Improvements</a></b>. Improve existing hash
join implementation.
<ul>
+ <li><b><a class="externalLink"
href="https://issues.apache.org/jira/browse/PHOENIX-1179">Many-to-many
joins</a></b>. Support joins where both sides are too large to fit into memory.
<b>Available in our 3.3/4.3 release</b></li>
<li><b><a class="externalLink"
href="https://issues.apache.org/jira/browse/PHOENIX-852">Optimize foreign key
joins</a></b>. Optimize foreign key joins by leveraging our skip scan filter.
<b>Available in our 3.2/4.2 release</b></li>
<li><b><a class="externalLink"
href="https://issues.apache.org/jira/browse/PHOENIX-167">Semi/anti
joins</a></b>. Support semi/anti subqueries through the standard [NOT] IN and
[NOT] EXISTS keywords. <b>Available in our 3.2/4.2 release</b></li>
</ul></li>
Modified: phoenix/site/publish/roadmap.html
URL:
http://svn.apache.org/viewvc/phoenix/site/publish/roadmap.html?rev=1656256&r1=1656255&r2=1656256&view=diff
==============================================================================
--- phoenix/site/publish/roadmap.html (original)
+++ phoenix/site/publish/roadmap.html Sat Jan 31 23:07:21 2015
@@ -1,7 +1,7 @@
<!DOCTYPE html>
<!--
- Generated by Apache Maven Doxia at 2015-01-27
+ Generated by Apache Maven Doxia at 2015-01-31
Rendered using Reflow Maven Skin 1.1.0
(http://andriusvelykis.github.io/reflow-maven-skin)
-->
<html xml:lang="en" lang="en">
@@ -137,7 +137,7 @@
<li><b><a class="externalLink"
href="https://issues.apache.org/jira/browse/PHOENIX-400">Transaction
Support</a></b>. Support transactions by integrating with an open source
solution like <a class="externalLink"
href="https://github.com/continuuity/tephra">Tephra</a>, <a
class="externalLink" href="https://github.com/XiaoMi/themis">Themis</a>, or
some other similar option.</li>
<li><b><a class="externalLink"
href="https://issues.apache.org/jira/browse/PHOENIX-1167">Join
Improvements</a></b>. Enhance our join capabilities in a variety of ways:<br />
<ul>
- <li><b><a class="externalLink"
href="https://issues.apache.org/jira/browse/PHOENIX-1179">Many-to-many
joins</a></b>. Support joins where both sides are too large to fit into
memory.</li>
+ <li><b><a class="externalLink"
href="https://issues.apache.org/jira/browse/PHOENIX-1556">Table-stats-guided
choice between hash join and sort-merge join</a></b>. Base hash join versus
many-to-many decision on how many guideposts will be traversed for RHS
table(s).</li>
<li><b><a class="externalLink"
href="https://issues.apache.org/jira/browse/PHOENIX-150">Inlined parent/child
joins</a></b>. Optimize parent/child joins by storing child rows inside of a
parent row, forming the column qualifier through a known prefix plus the child
row primary key.</li>
</ul></li>
<li><b><a href="subqueries.html">Subquery</a> Enhancement</b>, which includes
support for <b><a class="externalLink"
href="https://issues.apache.org/jira/browse/PHOENIX-1388">correlated subqueries
in the HAVING clause</a></b> and <b><a class="externalLink"
href="https://issues.apache.org/jira/browse/PHOENIX-1392">using subqueries as
expressions</a></b>.</li>
Modified: phoenix/site/source/src/site/markdown/joins.md
URL:
http://svn.apache.org/viewvc/phoenix/site/source/src/site/markdown/joins.md?rev=1656256&r1=1656255&r2=1656256&view=diff
==============================================================================
--- phoenix/site/source/src/site/markdown/joins.md (original)
+++ phoenix/site/source/src/site/markdown/joins.md Sat Jan 31 23:07:21 2015
@@ -124,6 +124,12 @@ As an alternative to the [earlier exampl
GROUP BY ItemID) AS O
ON Items.ItemID = O.ItemID;
+## Hash Join vs. Sort-Merge Join
+
+Basic hash join usually outperforms other types of join algorithms, but it has
its limitations too, the most significant of which is the assumption that one
of the relations must be small enough to fit into memory. Thus Phoenix now has
both hash join and sort-merge join implemented to facilitate fast join
operations as well as join between two large tables.
+
+Phoenix currently uses the hash join algorithm whenever possible since it is
usually much faster. However we have the hint "USE_SORT_MERGE_JOIN" for forcing
the usage of sort-merge join in a query. The choice between these two join
algorithms, together with detecting the smaller relation for hash join, will be
done automatically in future under the guidance provided by table statistics.
+
## Foreign Key to Primary Key Join Optimization<a
name="foreign-key-to-primary-key-join-optimization"></a>
Oftentimes a join will occur from a child table to a parent table, mapping the
foreign key of the child table to the primary key of the parent. So instead of
doing a full scan on the parent table, Phoenix will drive a skip-scan or a
range-scan based on the foreign key values it got from the child table result.
@@ -165,17 +171,17 @@ W/O Optimization |W/ Optimization
--------------------|---------------
8.1s |0.4s
-However, there are times when the foreign key values from the child table
account for a complete primary key space in the parent table, thus using
skip-scans would only be slower not faster. In order to avoid such situations,
Phoenix currently does a range-scan by default and only chooses to do a
skip-scan when there is a child table filter in the WHERE clause or the ON
clause, as in the above example. Table statistics will come to help making
smarter choices between the two schemes in future. Yet you can always use hints
"SKIP_SCAN_HASH_JOIN" or "RANGE_SCAN_HASH_JOIN" to change the default behavior.
+However, there are times when the foreign key values from the child table
account for a complete primary key space in the parent table, thus using
skip-scans would only be slower not faster. Yet you can always turn off the
optimization by specifying hint "NO_CHILD_PARENT_OPTIMIZATION". Furthermore,
table statistics will soon come in to help making smarter choices between the
two schemes.
## Configuration
-The join functionality is now implemented through hash joins, which means one
side of the join operator has to be small enough to fit into memory in order to
be broadcast over all servers that have the data of concern from the other side
of join. This limitation will be eliminated once
[PHOENIX-1179](https://issues.apache.org/jira/browse/PHOENIX-1179) is
implemented.
+As mentioned earlier, if we decide to use the hash join approach for our join
queries, the prerequisite is that one of the relations must be small enough
to fit into memory in order to be broadcast over all servers that have the data
of concern from the other relation. And aside from making sure that the region
server heap size is big enough to hold the smaller relation, we might also need
to pay attention to a few configuration parameters that are crucial to
running hash joins.
-The servers-side caches are used to hold the hashed join-table results. The
size and the living time of the caches are controlled by the following
parameters. Note that a join-table can be a physical table, a view, a subquery,
or a joined result of other join-tables in a multi-join query.
+The server-side caches are used to hold the hash table built upon the smaller
relation. The size and the living time of the caches are controlled by the
following parameters. Note that a relation can be a physical table, a view, a
subquery, or a joined result of other relations in a multiple-join query.
1. phoenix.query.maxServerCacheBytes
- * Maximum size (in bytes) of a join-table result before compression and
conversion to a hash map.
- * Attempting to hash a join-table result of a size bigger than this
setting will result in a MaxServerCacheSizeExceededException.
+ * Maximum size (in bytes) of the raw results of a relation before being
compressed and sent over to the region servers.
+ * Attempting to serialize the raw results of a relation with a size
bigger than this setting will result in a MaxServerCacheSizeExceededException.
* **Default: 104,857,600**
2. phoenix.query.maxGlobalMemoryPercentage
* Percentage of total heap memory (i.e. Runtime.getRuntime().maxMemory())
that all threads may use.
@@ -188,25 +194,25 @@ The servers-side caches are used to hold
See our [Configuration and Tuning Guide](tuning.html) for more details.
-Although changing parameters can sometimes be a solution to getting rid of the
exceptions mentioned above, it is highly recommended that you first consider
optimizing the join queries according to the information provided in the
following chapter.
+Although changing parameters can sometimes be a solution to getting rid of the
exceptions mentioned above, it is highly recommended that you first consider
optimizing the join queries according to the information provided in the
following section.
## Optimizing Your Query
-As mentioned in the previous chapter, it is most crucial to make sure that
there will be enough memory for the join query execution. But other than rush
to change the configuration immediately, sometimes all you need to do is to
know a bit of the interiors and adjust the sequence of the tables that appear
in your join query.
+As we now know, when using hash join it is crucial to make sure that there
will be enough memory for the query execution. But rather than rushing to
change the configuration immediately, sometimes all you need to do is to learn
a bit about the internals and adjust the order of the tables that appear in
your join query.
-Below is a description of the default join order (without the presence of
table statistics) and of which side of the query will be executed as an inner
query and put into server cache:
+Below is a description of the default join order (without the presence of
table statistics) and of which side of the query will be taken as the "smaller"
relation and be put into server cache:
1. _lhs_ INNER JOIN _rhs_
- _rhs_ will be built as hash map in server cache.
+ _rhs_ will be built as hash table in server cache.
2. _lhs_ LEFT OUTER JOIN _rhs_
- _rhs_ will be built as hash map in server cache.
+ _rhs_ will be built as hash table in server cache.
3. _lhs_ RIGHT OUTER JOIN _rhs_
- _lhs_ will be built as hash map in server cache.
+ _lhs_ will be built as hash table in server cache.
The join order is more complicated with multiple-join queries. You can try
running "EXPLAIN _join\_query_" to look at the actual execution plan. For
multiple-inner-join queries, Phoenix applies star-join optimization by default,
which means the leading (left-hand-side) table will be scanned only once
joining all right-hand-side tables at the same time. You can turn off this
optimization by specifying the hint "NO_STAR_JOIN" in your query if the overall
size of all right-hand-side tables would exceed the memory size limit.
@@ -240,15 +246,14 @@ The join order will be:
2. SCAN Orders JOIN HASH[0]; CLOSE HASH[0] --> BUILD HASH[1]
3. SCAN Items JOIN HASH[1] --> Final Resultset
-It is also worth mentioning that not the entire dataset of the table should be
counted into the memory consumption. Instead, only those columns used by the
query, and of only the records that satisfy the predicates will be built into
the server hash map.
+It is also worth mentioning that not the entire dataset of the table should be
counted into the memory consumption. Instead, only those columns used by the
query, and of only the records that satisfy the predicates will be built into
the server hash table.
## Limitations
-In our Phoenix 3.2 and 4.2 releases, joins have the following restrictions:
+In our Phoenix 3.3.0 and 4.3.0 releases, joins have the following restrictions
and improvements to be made:
-1. FULL OUTER JOIN and CROSS JOIN are not supported.
-2. Equi-joins: Only equality (=) comparison is supported in joining conditions
(conditions that specify the connecting rules between the two sides of the join
operator). However there is no restriction on other predicates in the ON clause
concerning only one side of the join operator.
-3. [PHOENIX-1179](https://issues.apache.org/jira/browse/PHOENIX-1179): Joins
between two large tables that can neither fit into memory.
+1. [PHOENIX-1555](https://issues.apache.org/jira/browse/PHOENIX-1555):
Fallback to many-to-many join if hash join fails due to insufficient memory.
+2. [PHOENIX-1556](https://issues.apache.org/jira/browse/PHOENIX-1556): Base
hash join versus many-to-many decision on how many guideposts will be traversed
for RHS table(s).
-Continuous efforts are being made to enhance Phoenix with more complete join
functionalities. Please refer to our [Roadmap](roadmap.html) for more
information.
+Continuous efforts are being made to bring in more performance enhancement for
join queries based on table statistics. Please refer to our
[Roadmap](roadmap.html) for more information.
Modified: phoenix/site/source/src/site/markdown/recent.md
URL:
http://svn.apache.org/viewvc/phoenix/site/source/src/site/markdown/recent.md?rev=1656256&r1=1656255&r2=1656256&view=diff
==============================================================================
--- phoenix/site/source/src/site/markdown/recent.md (original)
+++ phoenix/site/source/src/site/markdown/recent.md Sat Jan 31 23:07:21 2015
@@ -4,6 +4,7 @@ As items are implemented from our road m
1. **[Statistics Collection](update_statistics.html)**. Collects the
statistics for a table to improve query parallelization. **Available in our
3.2/4.2 release**
2. **[Join Improvements](joins.html)**. Improve existing hash join
implementation.
+ * **[Many-to-many
joins](https://issues.apache.org/jira/browse/PHOENIX-1179)**. Support joins
where both sides are too large to fit into memory. **Available in our 3.3/4.3
release**
* **[Optimize foreign key
joins](https://issues.apache.org/jira/browse/PHOENIX-852)**. Optimize foreign
key joins by leveraging our skip scan filter. **Available in our 3.2/4.2
release**
* **[Semi/anti
joins](https://issues.apache.org/jira/browse/PHOENIX-167)**. Support semi/anti
subqueries through the standard [NOT] IN and [NOT] EXISTS keywords. **Available
in our 3.2/4.2 release**
3. **[Subqueries](subqueries.html)** Support independent subqueries and
correlated subqueries in the WHERE clause as well as subqueries in the FROM
clause. **Available in our 3.2/4.2 release**
Modified: phoenix/site/source/src/site/markdown/roadmap.md
URL:
http://svn.apache.org/viewvc/phoenix/site/source/src/site/markdown/roadmap.md?rev=1656256&r1=1656255&r2=1656256&view=diff
==============================================================================
--- phoenix/site/source/src/site/markdown/roadmap.md (original)
+++ phoenix/site/source/src/site/markdown/roadmap.md Sat Jan 31 23:07:21 2015
@@ -4,7 +4,7 @@ Our roadmap is driven by our user commun
1. **[Transaction
Support](https://issues.apache.org/jira/browse/PHOENIX-400)**. Support
transactions by integrating with an open source solution like
[Tephra](https://github.com/continuuity/tephra),
[Themis](https://github.com/XiaoMi/themis), or some other similar option.
1. **[Join
Improvements](https://issues.apache.org/jira/browse/PHOENIX-1167)**. Enhance
our join capabilities in a variety of ways:<br/>
- * **[Many-to-many
joins](https://issues.apache.org/jira/browse/PHOENIX-1179)**. Support joins
where both sides are too large to fit into memory.
+ * **[Table-stats-guided choice between hash join and sort-merge
join](https://issues.apache.org/jira/browse/PHOENIX-1556)**. Base hash join
versus many-to-many decision on how many guideposts will be traversed for RHS
table(s).
* **[Inlined parent/child
joins](https://issues.apache.org/jira/browse/PHOENIX-150)**. Optimize
parent/child joins by storing child rows inside of a parent row, forming the
column qualifier through a known prefix plus the child row primary key.
2. **[Subquery](subqueries.html) Enhancement**, which includes support for
**[correlated subqueries in the HAVING
clause](https://issues.apache.org/jira/browse/PHOENIX-1388)** and **[using
subqueries as
expressions](https://issues.apache.org/jira/browse/PHOENIX-1392)**.
15. **[Cost-based Query
Optimization](https://issues.apache.org/jira/browse/PHOENIX-1177)**. Enhance
existing [statistics collection](update_statistics.html) by enabling further
query optimizations based on the size and cardinality of the data.
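
Reviewer note on the "Hash Join vs. Sort-Merge Join" section added to joins.md in this revision: the trade-off it describes can be illustrated with a small, language-neutral sketch. The Python below is not Phoenix code and does not reflect how Phoenix implements either algorithm. The function names, the tuple-based rows, and the sample `orders`/`items` data are all assumptions made for illustration. It only shows why a hash join needs one relation to fit in memory while a sort-merge join walks two sorted inputs instead.

```python
from collections import defaultdict


def hash_join(small, large, small_key, large_key):
    """Inner equi-join that builds a hash table on the smaller relation.

    The whole of `small` is held in memory, which is the limitation the
    documentation describes for hash joins.
    """
    table = defaultdict(list)
    for row in small:
        table[small_key(row)].append(row)
    # Stream the larger relation past the in-memory hash table.
    for row in large:
        for match in table.get(large_key(row), ()):
            yield (row, match)


def sort_merge_join(left, right, left_key, right_key):
    """Inner equi-join that merges two inputs sorted on the join key.

    For brevity this sketch sorts both inputs in memory; a real engine
    would read inputs that are already ordered or sort them externally.
    """
    left_sorted = sorted(left, key=left_key)
    right_sorted = sorted(right, key=right_key)
    i = j = 0
    while i < len(left_sorted) and j < len(right_sorted):
        lk = left_key(left_sorted[i])
        rk = right_key(right_sorted[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right-hand rows sharing this key.
            j_end = j
            while j_end < len(right_sorted) and right_key(right_sorted[j_end]) == lk:
                j_end += 1
            # Pair every left-hand row in the run with every right-hand row.
            while i < len(left_sorted) and left_key(left_sorted[i]) == lk:
                for r in right_sorted[j:j_end]:
                    yield (left_sorted[i], r)
                i += 1
            j = j_end


# Hypothetical sample data: (order_id, item_id) and (item_id, name).
orders = [("o1", "i1"), ("o2", "i2"), ("o3", "i1"), ("o4", "i9")]
items = [("i1", "Phone"), ("i2", "Laptop"), ("i3", "Tablet")]

hash_result = sorted(hash_join(items, orders, lambda r: r[0], lambda r: r[1]))
merge_result = sorted(sort_merge_join(orders, items, lambda r: r[1], lambda r: r[0]))
```

Both functions return the same (order, item) pairs for the sample data. The difference lies in what must be held in memory: `hash_join` keeps the entire `small` relation in a dictionary, while `sort_merge_join` only needs the current run of equal keys once its inputs are sorted.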