Hi Gabriel:
Now I think I understand why an empty column is necessary. I did the following tests: 1. by HBASE native API: after I deleted the last attribute column of a specified row, the whole row was deleted. and this is exactly what you said 'the row key is not stored at all unless there is at least one column stored.' 2. by Phoenix client: * create table tt ( pk varchar not null primary key, cf.b varchar, cf.c varchar); * upsert into tt values ( 'this is pk'); now, do "scan 'tt' " over hbase shell, I got a row with an empty column '_0', and this row would not have been existed if phoenix did not insert this empty column and of course hbase native API will not allow u to do this. 3. by phoenix client: even if we do insert not null non pk column in an upsert, an empty column is ALWAYS necessary because columns can be deleted. * upsert into tt values ( 'this is pk', 'bbbbb', 'ccccc'); * upsert into tt values ( 'this is pk', null, null); * or alter table tt drop column cf.b, cf.c if we do not insert the empty column at 1st upsert, then after 2nd upsert, the whole row will disappear. But for SQL, we expect a row only with the PK value. 4. why we cannot add not null constraint on non pk column ? I tried this : create table tt (pk varchar not null primary key, cf.b varchar not null); and it raised an exception : Invalid not null constraint on non primary key column columnName=TT.B I figured out one reason : the data of a phoenix table might be written by hbase native api, which does not support not null constraint on non pk column. Is it right? Is it the primary reason? thanks - William At 2015-12-21 22:46:30, "Gabriel Reid" <gabriel.r...@gmail.com> wrote: >Hi William, > >The empty column is needed to ensure that a given column is available >for all rows. > >As you may know, data is stored in HBase as KeyValues, meaning that >the full row key is stored for each column value. This also implies >that the row key is not stored at all unless there is at least one >column stored. > >Now consider JDBC row which has an integer primary key, and several >columns which are all null. In order to be able to store the primary >key, a KeyValue needs to be stored to show that the row is present at >all. This column is represented by the empty column that you've >noticed. This allows doing a "SELECT * FROM TABLE" and receiving >records for all rows, even those whose non-pk columns are null. > >The same issue comes up even if only one column is null for some (or >all) records. A scan over Phoenix will include the empty column to >ensure that rows that only consist of the primary key (and have null >for all non-key columns) will be included in a scan result. > >- Gabriel > >On Mon, Dec 21, 2015 at 2:58 PM, 杨晗 <yhxx...@163.com> wrote: >> hi all: >> I'm reading phoenix src code recently, and i found >> PRowImpl.toRowMutations() always adds an empty column named '_0' for >> non-delete upsert. Why? >> I read the comment but i didn't quite understand it. Might someone give >> me an example that illustates why an empty column is ALWAYS necessary? >> >> >> Further more, I have to access a phoenix table by both phoenix client >> and hbase API in some cases. If i do not add this empty column explicitly by >> hbase API, it is ok if i read this row by phoenix? >> >> >> Thanks >> -William