Hi William, Yes, all your tests there look correct, and it looks like you've got a good understanding on the underlying principles of how scans are working in Phoenix vs HBase.
About the use of "not null" constraints, you're correct about data being written by non-Phoenix clients being an issue. However, the bigger issue is doing upsert statements in general. Phoenix doesn't know if an UPSERT statement is updating an existing row, or adding a new row. If an update of an existing row is being done, then it would be possible to only upsert a subset of the total columns for the row, but an upsert of a new row would have to include all columns. It would be possible to determine if each upsert was a new row by reading each row before writing it, but this would be very costly in terms of performance. - Gabriel On Tue, Dec 22, 2015 at 3:22 AM, William <yhxx...@163.com> wrote: > Hi Gabriel: > > > Now I think I understand why an empty column is necessary. I did the > following tests: > > > 1. by HBASE native API: after I deleted the last attribute column of a > specified row, the whole row was deleted. > and this is exactly what you said 'the row key is not stored at all > unless there is at least one column stored.' > > > 2. by Phoenix client: > * create table tt ( pk varchar not null primary key, cf.b varchar, cf.c > varchar); > * upsert into tt values ( 'this is pk'); > now, do "scan 'tt' " over hbase shell, I got a row with an empty column > '_0', and this row would not have been existed if phoenix did not insert this > empty column and of course hbase native API will not allow u to do this. > > 3. by phoenix client: > even if we do insert not null non pk column in an upsert, an empty column > is ALWAYS necessary because columns can be deleted. > * upsert into tt values ( 'this is pk', 'bbbbb', 'ccccc'); > * upsert into tt values ( 'this is pk', null, null); > * or alter table tt drop column cf.b, cf.c > if we do not insert the empty column at 1st upsert, then after 2nd > upsert, the whole row will disappear. But for SQL, we expect a row only with > the PK value. > > > 4. why we cannot add not null constraint on non pk column ? > I tried this : create table tt (pk varchar not null primary key, cf.b > varchar not null); > and it raised an exception : Invalid not null constraint on non primary > key column columnName=TT.B > I figured out one reason : the data of a phoenix table might be written > by hbase native api, which does not support not null constraint on non pk > column. > Is it right? Is it the primary reason? > > > thanks > - William > > At 2015-12-21 22:46:30, "Gabriel Reid" <gabriel.r...@gmail.com> wrote: >>Hi William, >> >>The empty column is needed to ensure that a given column is available >>for all rows. >> >>As you may know, data is stored in HBase as KeyValues, meaning that >>the full row key is stored for each column value. This also implies >>that the row key is not stored at all unless there is at least one >>column stored. >> >>Now consider JDBC row which has an integer primary key, and several >>columns which are all null. In order to be able to store the primary >>key, a KeyValue needs to be stored to show that the row is present at >>all. This column is represented by the empty column that you've >>noticed. This allows doing a "SELECT * FROM TABLE" and receiving >>records for all rows, even those whose non-pk columns are null. >> >>The same issue comes up even if only one column is null for some (or >>all) records. A scan over Phoenix will include the empty column to >>ensure that rows that only consist of the primary key (and have null >>for all non-key columns) will be included in a scan result. >> >>- Gabriel >> >>On Mon, Dec 21, 2015 at 2:58 PM, 杨晗 <yhxx...@163.com> wrote: >>> hi all: >>> I'm reading phoenix src code recently, and i found >>> PRowImpl.toRowMutations() always adds an empty column named '_0' for >>> non-delete upsert. Why? >>> I read the comment but i didn't quite understand it. Might someone >>> give me an example that illustates why an empty column is ALWAYS necessary? >>> >>> >>> Further more, I have to access a phoenix table by both phoenix client >>> and hbase API in some cases. If i do not add this empty column explicitly >>> by hbase API, it is ok if i read this row by phoenix? >>> >>> >>> Thanks >>> -William