Author: jamestaylor
Date: Fri Mar 11 18:17:52 2016
New Revision: 1734609

URL: http://svn.apache.org/viewvc?rev=1734609&view=rev
Log:
Add new Why empty KeyValue FAQ

Modified:
    phoenix/site/publish/faq.html
    phoenix/site/source/src/site/markdown/faq.md

Modified: phoenix/site/publish/faq.html
URL: 
http://svn.apache.org/viewvc/phoenix/site/publish/faq.html?rev=1734609&r1=1734608&r2=1734609&view=diff
==============================================================================
--- phoenix/site/publish/faq.html (original)
+++ phoenix/site/publish/faq.html Fri Mar 11 18:17:52 2016
@@ -1,7 +1,7 @@
 
 <!DOCTYPE html>
 <!--
- Generated by Apache Maven Doxia at 2016-03-10
+ Generated by Apache Maven Doxia at 2016-03-11
  Rendered using Reflow Maven Skin 1.1.0 
(http://andriusvelykis.github.io/reflow-maven-skin)
 -->
 <html  xml:lang="en" lang="en">
@@ -158,6 +158,7 @@
  <li><a 
href="#Can_phoenix_work_on_tables_with_arbitrary_timestamp_as_flexible_as_HBase_API">Can
 phoenix work on tables with arbitrary timestamp as flexible as HBase 
API?</a></li> 
  <li><a href="#Why_isnt_my_query_doing_a_RANGE_SCAN">Why isn’t my query 
doing a RANGE SCAN?</a></li> 
  <li><a href="#Should_I_pool_Phoenix_JDBC_Connections">Should I pool Phoenix 
JDBC Connections?</a></li> 
+ <li><a href="#Why_empty_key_value">Why does Phoenix add an empty or dummy 
KeyValue when doing an upsert?</a></li> 
 </ul> 
 <div class="section"> 
  <div class="section"> 
@@ -363,6 +364,13 @@ conn.commit();
   <p>Phoenix’s Connection objects are different from most other JDBC 
Connections due to the underlying HBase connection. The Phoenix Connection 
object is designed to be a thin object that is inexpensive to create. If 
Phoenix Connections are reused, it is possible that the underlying HBase 
connection is not always left in a healthy state by the previous user. It is 
better to create new Phoenix Connections to ensure that you avoid any potential 
issues.</p> 
   <p>Implementing pooling for Phoenix could be done simply by creating a 
delegate Connection that instantiates a new Phoenix connection when retrieved 
from the pool and then closes the connection when returning it to the pool (see 
<a class="externalLink" 
href="https://issues.apache.org/jira/browse/PHOENIX-2388">PHOENIX-2388</a>).</p>
 
  </div> 
+ <div class="section"> 
+  <h3 id="Why_empty_key_value">Why does Phoenix add an empty/dummy KeyValue 
when doing an upsert?<a 
name="Why_does_Phoenix_add_an_emptydummy_KeyValue_when_doing_an_upsert"></a></h3>
 
+  <p>The empty or dummy KeyValue (with a column qualifier of _0) is needed to 
ensure that at least one column is stored for every row.</p> 
+  <p>As you may know, data is stored in HBase as KeyValues, meaning that the 
full row key is stored for each column value. This also implies that the row 
key is not stored at all unless there is at least one column stored.</p> 
+  <p>Now consider a JDBC row that has an integer primary key and several 
columns that are all null. In order to store the primary key, a KeyValue 
needs to be stored to show that the row is present at all. This KeyValue is 
the empty column that you’ve noticed. It allows doing a 
“SELECT * FROM TABLE” and receiving records for all rows, even those whose 
non-pk columns are null.</p> 
+  <p>The same issue comes up even if only one column is null for some (or all) 
records. A scan over Phoenix will include the empty column to ensure that rows 
that only consist of the primary key (and have null for all non-key columns) 
will be included in a scan result.</p> 
+ </div> 
 </div>
                        </div>
                </div>

Modified: phoenix/site/source/src/site/markdown/faq.md
URL: 
http://svn.apache.org/viewvc/phoenix/site/source/src/site/markdown/faq.md?rev=1734609&r1=1734608&r2=1734609&view=diff
==============================================================================
--- phoenix/site/source/src/site/markdown/faq.md (original)
+++ phoenix/site/source/src/site/markdown/faq.md Fri Mar 11 18:17:52 2016
@@ -12,7 +12,7 @@
 * [Can phoenix work on tables with arbitrary timestamp as flexible as HBase 
API?](#Can_phoenix_work_on_tables_with_arbitrary_timestamp_as_flexible_as_HBase_API)
 * [Why isn't my query doing a RANGE 
SCAN?](#Why_isnt_my_query_doing_a_RANGE_SCAN)
 * [Should I pool Phoenix JDBC 
Connections?](#Should_I_pool_Phoenix_JDBC_Connections)
-
+* [Why does Phoenix add an empty or dummy KeyValue when doing an 
upsert?](#Why_empty_key_value)
 
 ### I want to get started. Is there a Phoenix _Hello World_?
 
@@ -285,3 +285,26 @@ No, it is not necessary to pool Phoenix
 Phoenix's Connection objects are different from most other JDBC Connections 
due to the underlying HBase connection. The Phoenix Connection object is 
designed to be a thin object that is inexpensive to create. If Phoenix 
Connections are reused, it is possible that the underlying HBase connection is 
not always left in a healthy state by the previous user. It is better to create 
new Phoenix Connections to ensure that you avoid any potential issues.
 
 Implementing pooling for Phoenix could be done simply by creating a delegate 
Connection that instantiates a new Phoenix connection when retrieved from the 
pool and then closes the connection when returning it to the pool (see 
[PHOENIX-2388](https://issues.apache.org/jira/browse/PHOENIX-2388)).
+
+
+### <a id="Why_empty_key_value"/>Why does Phoenix add an empty/dummy KeyValue 
when doing an upsert?
+The empty or dummy KeyValue (with a column qualifier of _0) is needed to
+ensure that at least one column is stored for every row.
+
+As you may know, data is stored in HBase as KeyValues, meaning that
+the full row key is stored for each column value. This also implies
+that the row key is not stored at all unless there is at least one
+column stored.
+
+Now consider a JDBC row that has an integer primary key and several
+columns that are all null. In order to store the primary key, a
+KeyValue needs to be stored to show that the row is present at all.
+This KeyValue is the empty column that you've noticed. It allows
+doing a "SELECT * FROM TABLE" and receiving records for all rows,
+even those whose non-pk columns are null.
+
+The same issue comes up even if only one column is null for some (or
+all) records. A scan over Phoenix will include the empty column to
+ensure that rows that only consist of the primary key (and have null
+for all non-key columns) will be included in a scan result.
+
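
The reasoning in the new FAQ entry can be illustrated with a small toy model. This is only a sketch and not Phoenix or HBase code: the dictionary-based store, the `upsert` and `scan_row_keys` helpers, and the `name`/`city` columns are all hypothetical. Only the `_0` qualifier comes from the FAQ text. The model assumes, as the FAQ describes, that a null column stores nothing and that a row key exists only as part of a stored KeyValue.

```python
# Toy model of an HBase table: one stored cell (KeyValue) per non-null column.
# This is NOT Phoenix or HBase code; all names here are hypothetical.

EMPTY_QUALIFIER = "_0"  # qualifier the FAQ gives for the empty KeyValue


def upsert(store, row_key, columns, add_empty_key_value):
    """Store one cell per non-null column; null columns store nothing."""
    for qualifier, value in columns.items():
        if value is not None:
            store[(row_key, qualifier)] = value
    if add_empty_key_value:
        # Marker cell so the row key exists even if every column is null.
        store[(row_key, EMPTY_QUALIFIER)] = b""


def scan_row_keys(store):
    """A scan can only return row keys that appear in at least one cell."""
    return sorted({row_key for row_key, _ in store})


without_marker = {}
upsert(without_marker, 1, {"name": "a", "city": None}, add_empty_key_value=False)
upsert(without_marker, 2, {"name": None, "city": None}, add_empty_key_value=False)

with_marker = {}
upsert(with_marker, 1, {"name": "a", "city": None}, add_empty_key_value=True)
upsert(with_marker, 2, {"name": None, "city": None}, add_empty_key_value=True)

print(scan_row_keys(without_marker))  # row 2 is missing: [1]
print(scan_row_keys(with_marker))     # both rows present: [1, 2]
```

In this model, row 2 has null for every non-key column, so without the marker cell nothing is stored for it and a scan cannot return it. With the `_0` cell, each row has at least one KeyValue and both row keys come back.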

