Author: jamestaylor
Date: Fri Mar 11 18:17:52 2016
New Revision: 1734609
URL: http://svn.apache.org/viewvc?rev=1734609&view=rev
Log:
Add new Why empty KeyValue FAQ
Modified:
phoenix/site/publish/faq.html
phoenix/site/source/src/site/markdown/faq.md
Modified: phoenix/site/publish/faq.html
URL:
http://svn.apache.org/viewvc/phoenix/site/publish/faq.html?rev=1734609&r1=1734608&r2=1734609&view=diff
==============================================================================
--- phoenix/site/publish/faq.html (original)
+++ phoenix/site/publish/faq.html Fri Mar 11 18:17:52 2016
@@ -1,7 +1,7 @@
<!DOCTYPE html>
<!--
- Generated by Apache Maven Doxia at 2016-03-10
+ Generated by Apache Maven Doxia at 2016-03-11
Rendered using Reflow Maven Skin 1.1.0
(http://andriusvelykis.github.io/reflow-maven-skin)
-->
<html xml:lang="en" lang="en">
@@ -158,6 +158,7 @@
<li><a
href="#Can_phoenix_work_on_tables_with_arbitrary_timestamp_as_flexible_as_HBase_API">Can
phoenix work on tables with arbitrary timestamp as flexible as HBase
API?</a></li>
<li><a href="#Why_isnt_my_query_doing_a_RANGE_SCAN">Why isn’t my query
doing a RANGE SCAN?</a></li>
<li><a href="#Should_I_pool_Phoenix_JDBC_Connections">Should I pool Phoenix
JDBC Connections?</a></li>
+ <li><a href="#Why_empty_key_value">Why does Phoenix add an empty or dummy
KeyValue when doing an upsert?</a></li>
</ul>
<div class="section">
<div class="section">
@@ -363,6 +364,13 @@ conn.commit();
<p>Phoenix’s Connection objects are different from most other JDBC
Connections due to the underlying HBase connection. The Phoenix Connection
object is designed to be a thin object that is inexpensive to create. If
Phoenix Connections are reused, it is possible that the underlying HBase
connection is not always left in a healthy state by the previous user. It is
better to create new Phoenix Connections to ensure that you avoid any potential
issues.</p>
<p>Implementing pooling for Phoenix could be done simply by creating a
delegate Connection that instantiates a new Phoenix connection when retrieved
from the pool and then closes the connection when returning it to the pool (see
<a class="externalLink"
href="https://issues.apache.org/jira/browse/PHOENIX-2388">PHOENIX-2388</a>).</p>
</div>
+ <div class="section">
+ <h3 id="Why_empty_key_value">Why does Phoenix add an empty/dummy KeyValue
when doing an upsert?<a
name="Why_does_Phoenix_add_an_emptydummy_KeyValue_when_doing_an_upsert"></a></h3>
+ <p>The empty or dummy KeyValue (with a column qualifier of _0) is needed to
ensure that at least one column is present for every row.</p>
+ <p>As you may know, data is stored in HBase as KeyValues, meaning that the
full row key is stored for each column value. This also implies that the row
key is not stored at all unless there is at least one column stored.</p>
+ <p>Now consider a JDBC row that has an integer primary key and several
columns that are all null. In order to be able to store the primary key, a
KeyValue needs to be stored to show that the row is present at all. This
KeyValue is the empty column that you’ve noticed. This allows doing a
“SELECT * FROM TABLE” and receiving records for all rows, even those whose
non-pk columns are null.</p>
+ <p>The same issue comes up even if only one column is null for some (or all)
records. A scan over Phoenix will include the empty column to ensure that rows
that only consist of the primary key (and have null for all non-key columns)
will be included in a scan result.</p>
+ </div>
</div>
</div>
</div>
Modified: phoenix/site/source/src/site/markdown/faq.md
URL:
http://svn.apache.org/viewvc/phoenix/site/source/src/site/markdown/faq.md?rev=1734609&r1=1734608&r2=1734609&view=diff
==============================================================================
--- phoenix/site/source/src/site/markdown/faq.md (original)
+++ phoenix/site/source/src/site/markdown/faq.md Fri Mar 11 18:17:52 2016
@@ -12,7 +12,7 @@
* [Can phoenix work on tables with arbitrary timestamp as flexible as HBase
API?](#Can_phoenix_work_on_tables_with_arbitrary_timestamp_as_flexible_as_HBase_API)
* [Why isn't my query doing a RANGE
SCAN?](#Why_isnt_my_query_doing_a_RANGE_SCAN)
* [Should I pool Phoenix JDBC
Connections?](#Should_I_pool_Phoenix_JDBC_Connections)
-
+* [Why does Phoenix add an empty or dummy KeyValue when doing an
upsert?](#Why_empty_key_value)
### I want to get started. Is there a Phoenix _Hello World_?
@@ -285,3 +285,26 @@ No, it is not necessary to pool Phoenix
Phoenix's Connection objects are different from most other JDBC Connections
due to the underlying HBase connection. The Phoenix Connection object is
designed to be a thin object that is inexpensive to create. If Phoenix
Connections are reused, it is possible that the underlying HBase connection is
not always left in a healthy state by the previous user. It is better to create
new Phoenix Connections to ensure that you avoid any potential issues.
Implementing pooling for Phoenix could be done simply by creating a delegate
Connection that instantiates a new Phoenix connection when retrieved from the
pool and then closes the connection when returning it to the pool (see
[PHOENIX-2388](https://issues.apache.org/jira/browse/PHOENIX-2388)).
+
+
+### <a id="Why_empty_key_value"/>Why does Phoenix add an empty/dummy KeyValue
when doing an upsert?
+The empty or dummy KeyValue (with a column qualifier of _0) is needed to
ensure that at least one column is present
+for every row.
+
+As you may know, data is stored in HBase as KeyValues, meaning that
+the full row key is stored for each column value. This also implies
+that the row key is not stored at all unless there is at least one
+column stored.
+
+Now consider a JDBC row that has an integer primary key and several
+columns that are all null. In order to be able to store the primary
+key, a KeyValue needs to be stored to show that the row is present at
+all. This KeyValue is the empty column that you've
+noticed. This allows doing a "SELECT * FROM TABLE" and receiving
+records for all rows, even those whose non-pk columns are null.
+
+The same issue comes up even if only one column is null for some (or
+all) records. A scan over Phoenix will include the empty column to
+ensure that rows that only consist of the primary key (and have null
+for all non-key columns) will be included in a scan result.
+
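The reasoning in the new FAQ entry can be illustrated with a toy model. The sketch below is not Phoenix or HBase code: the `upsert` and `scan_rows` helpers and the dictionary used as a store are hypothetical, and the only detail taken from the entry is the `_0` qualifier. It assumes that a null column writes no KeyValue, so a row whose non-key columns are all null leaves no trace unless a marker KeyValue is added.

```python
EMPTY_QUALIFIER = "_0"  # qualifier of the empty KeyValue, as named in the FAQ entry


def upsert(store, row_key, columns, add_empty_kv=True):
    """Toy upsert: store one (row_key, qualifier) entry per non-null column.

    Null columns store nothing, which mirrors the FAQ's point that the row
    key only exists as part of some stored KeyValue.
    """
    for qualifier, value in columns.items():
        if value is not None:
            store[(row_key, qualifier)] = value
    if add_empty_kv:
        # Marker entry that records the row's existence on its own.
        store[(row_key, EMPTY_QUALIFIER)] = b""


def scan_rows(store):
    """Return the row keys that have at least one stored entry."""
    return sorted({row_key for row_key, _ in store})


# Without the marker, the all-null row 1 cannot be found by a scan.
without_marker = {}
upsert(without_marker, 1, {"a": None, "b": None}, add_empty_kv=False)
upsert(without_marker, 2, {"a": "x", "b": None}, add_empty_kv=False)

# With the marker, both rows are visible.
with_marker = {}
upsert(with_marker, 1, {"a": None, "b": None})
upsert(with_marker, 2, {"a": "x", "b": None})

print(scan_rows(without_marker))  # [2]
print(scan_rows(with_marker))     # [1, 2]
```

In this model, row 1 is returned only when the marker entry is written, which is the behavior the entry describes for "SELECT * FROM TABLE".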