Yes, thanks, Gabriel - very well thought out answer and another good candidate to add to our FAQs if anyone is up for it.
On Tue, Dec 22, 2015 at 12:39 AM, William <yhxx...@163.com> wrote: > Hi Gabriel, > > > Thanks a lot for your reply. > I was confused. I misunderstood 'not null' constraint as 'required at each > upsert'. > > > -William > > > At 2015-12-22 15:25:40, "Gabriel Reid" <gabriel.r...@gmail.com> wrote: > >Hi William, > > > >Yes, all your tests there look correct, and it looks like you've got a > >good understanding on the underlying principles of how scans are > >working in Phoenix vs HBase. > > > >About the use of "not null" constraints, you're correct about data > >being written by non-Phoenix clients being an issue. However, the > >bigger issue is doing upsert statements in general. Phoenix doesn't > >know if an UPSERT statement is updating an existing row, or adding a > >new row. If an update of an existing row is being done, then it would > >be possible to only upsert a subset of the total columns for the row, > >but an upsert of a new row would have to include all columns. > > > >It would be possible to determine if each upsert was a new row by > >reading each row before writing it, but this would be very costly in > >terms of performance. > > > >- Gabriel > > > >On Tue, Dec 22, 2015 at 3:22 AM, William <yhxx...@163.com> wrote: > >> Hi Gabriel: > >> > >> > >> Now I think I understand why an empty column is necessary. I did the > following tests: > >> > >> > >> 1. by HBASE native API: after I deleted the last attribute column of a > specified row, the whole row was deleted. > >> and this is exactly what you said 'the row key is not stored at all > unless there is at least one column stored.' > >> > >> > >> 2. by Phoenix client: > >> * create table tt ( pk varchar not null primary key, cf.b varchar, > cf.c varchar); > >> * upsert into tt values ( 'this is pk'); > >> now, do "scan 'tt' " over hbase shell, I got a row with an empty > column '_0', and this row would not have been existed if phoenix did not > insert this empty column and of course hbase native API will not allow u to > do this. > >> > >> 3. by phoenix client: > >> even if we do insert not null non pk column in an upsert, an empty > column is ALWAYS necessary because columns can be deleted. > >> * upsert into tt values ( 'this is pk', 'bbbbb', 'ccccc'); > >> * upsert into tt values ( 'this is pk', null, null); > >> * or alter table tt drop column cf.b, cf.c > >> if we do not insert the empty column at 1st upsert, then after > 2nd upsert, the whole row will disappear. But for SQL, we expect a row only > with the PK value. > >> > >> > >> 4. why we cannot add not null constraint on non pk column ? > >> I tried this : create table tt (pk varchar not null primary key, > cf.b varchar not null); > >> and it raised an exception : Invalid not null constraint on non > primary key column columnName=TT.B > >> I figured out one reason : the data of a phoenix table might be > written by hbase native api, which does not support not null constraint on > non pk column. > >> Is it right? Is it the primary reason? > >> > >> > >> thanks > >> - William > >> > >> At 2015-12-21 22:46:30, "Gabriel Reid" <gabriel.r...@gmail.com> wrote: > >>>Hi William, > >>> > >>>The empty column is needed to ensure that a given column is available > >>>for all rows. > >>> > >>>As you may know, data is stored in HBase as KeyValues, meaning that > >>>the full row key is stored for each column value. This also implies > >>>that the row key is not stored at all unless there is at least one > >>>column stored. > >>> > >>>Now consider JDBC row which has an integer primary key, and several > >>>columns which are all null. In order to be able to store the primary > >>>key, a KeyValue needs to be stored to show that the row is present at > >>>all. This column is represented by the empty column that you've > >>>noticed. This allows doing a "SELECT * FROM TABLE" and receiving > >>>records for all rows, even those whose non-pk columns are null. > >>> > >>>The same issue comes up even if only one column is null for some (or > >>>all) records. A scan over Phoenix will include the empty column to > >>>ensure that rows that only consist of the primary key (and have null > >>>for all non-key columns) will be included in a scan result. > >>> > >>>- Gabriel > >>> > >>>On Mon, Dec 21, 2015 at 2:58 PM, 杨晗 <yhxx...@163.com> wrote: > >>>> hi all: > >>>> I'm reading phoenix src code recently, and i found > PRowImpl.toRowMutations() always adds an empty column named '_0' for > non-delete upsert. Why? > >>>> I read the comment but i didn't quite understand it. Might > someone give me an example that illustates why an empty column is ALWAYS > necessary? > >>>> > >>>> > >>>> Further more, I have to access a phoenix table by both phoenix > client and hbase API in some cases. If i do not add this empty column > explicitly by hbase API, it is ok if i read this row by phoenix? > >>>> > >>>> > >>>> Thanks > >>>> -William >