Hi Gabriel,
Thanks a lot for your reply. I was confused. I misunderstood 'not null' constraint as 'required at each upsert'. -William At 2015-12-22 15:25:40, "Gabriel Reid" <gabriel.r...@gmail.com> wrote: >Hi William, > >Yes, all your tests there look correct, and it looks like you've got a >good understanding on the underlying principles of how scans are >working in Phoenix vs HBase. > >About the use of "not null" constraints, you're correct about data >being written by non-Phoenix clients being an issue. However, the >bigger issue is doing upsert statements in general. Phoenix doesn't >know if an UPSERT statement is updating an existing row, or adding a >new row. If an update of an existing row is being done, then it would >be possible to only upsert a subset of the total columns for the row, >but an upsert of a new row would have to include all columns. > >It would be possible to determine if each upsert was a new row by >reading each row before writing it, but this would be very costly in >terms of performance. > >- Gabriel > >On Tue, Dec 22, 2015 at 3:22 AM, William <yhxx...@163.com> wrote: >> Hi Gabriel: >> >> >> Now I think I understand why an empty column is necessary. I did the >> following tests: >> >> >> 1. by HBASE native API: after I deleted the last attribute column of a >> specified row, the whole row was deleted. >> and this is exactly what you said 'the row key is not stored at all >> unless there is at least one column stored.' >> >> >> 2. by Phoenix client: >> * create table tt ( pk varchar not null primary key, cf.b varchar, cf.c >> varchar); >> * upsert into tt values ( 'this is pk'); >> now, do "scan 'tt' " over hbase shell, I got a row with an empty >> column '_0', and this row would not have been existed if phoenix did not >> insert this empty column and of course hbase native API will not allow u to >> do this. >> >> 3. by phoenix client: >> even if we do insert not null non pk column in an upsert, an empty >> column is ALWAYS necessary because columns can be deleted. >> * upsert into tt values ( 'this is pk', 'bbbbb', 'ccccc'); >> * upsert into tt values ( 'this is pk', null, null); >> * or alter table tt drop column cf.b, cf.c >> if we do not insert the empty column at 1st upsert, then after 2nd >> upsert, the whole row will disappear. But for SQL, we expect a row only with >> the PK value. >> >> >> 4. why we cannot add not null constraint on non pk column ? >> I tried this : create table tt (pk varchar not null primary key, cf.b >> varchar not null); >> and it raised an exception : Invalid not null constraint on non primary >> key column columnName=TT.B >> I figured out one reason : the data of a phoenix table might be written >> by hbase native api, which does not support not null constraint on non pk >> column. >> Is it right? Is it the primary reason? >> >> >> thanks >> - William >> >> At 2015-12-21 22:46:30, "Gabriel Reid" <gabriel.r...@gmail.com> wrote: >>>Hi William, >>> >>>The empty column is needed to ensure that a given column is available >>>for all rows. >>> >>>As you may know, data is stored in HBase as KeyValues, meaning that >>>the full row key is stored for each column value. This also implies >>>that the row key is not stored at all unless there is at least one >>>column stored. >>> >>>Now consider JDBC row which has an integer primary key, and several >>>columns which are all null. In order to be able to store the primary >>>key, a KeyValue needs to be stored to show that the row is present at >>>all. This column is represented by the empty column that you've >>>noticed. This allows doing a "SELECT * FROM TABLE" and receiving >>>records for all rows, even those whose non-pk columns are null. >>> >>>The same issue comes up even if only one column is null for some (or >>>all) records. A scan over Phoenix will include the empty column to >>>ensure that rows that only consist of the primary key (and have null >>>for all non-key columns) will be included in a scan result. >>> >>>- Gabriel >>> >>>On Mon, Dec 21, 2015 at 2:58 PM, 杨晗 <yhxx...@163.com> wrote: >>>> hi all: >>>> I'm reading phoenix src code recently, and i found >>>> PRowImpl.toRowMutations() always adds an empty column named '_0' for >>>> non-delete upsert. Why? >>>> I read the comment but i didn't quite understand it. Might someone >>>> give me an example that illustates why an empty column is ALWAYS necessary? >>>> >>>> >>>> Further more, I have to access a phoenix table by both phoenix client >>>> and hbase API in some cases. If i do not add this empty column explicitly >>>> by hbase API, it is ok if i read this row by phoenix? >>>> >>>> >>>> Thanks >>>> -William