Re: Re: Why do we need an empty column when doing upsert?

Gabriel Reid Mon, 21 Dec 2015 23:26:18 -0800

Hi William,

Yes, all your tests there look correct, and it looks like you've got a
good understanding on the underlying principles of how scans are
working in Phoenix vs HBase.


About the use of "not null" constraints, you're correct about data
being written by non-Phoenix clients being an issue. However, the
bigger issue is doing upsert statements in general. Phoenix doesn't
know if an UPSERT statement is updating an existing row, or adding a
new row. If an update of an existing row is being done, then it would
be possible to only upsert a subset of the total columns for the row,
but an upsert of a new row would have to include all columns.

It would be possible to determine if each upsert was a new row by
reading each row before writing it, but this would be very costly in
terms of performance.

- Gabriel

On Tue, Dec 22, 2015 at 3:22 AM, William <yhxx...@163.com> wrote:
> Hi Gabriel:
>
>
> Now I think I understand why an empty column is necessary. I did the 
> following tests:
>
>
> 1. by HBASE native API: after I deleted the last attribute column of a 
> specified row, the whole row was deleted.
>     and this is exactly what you said 'the row key is not stored at all 
> unless there is at least one column stored.'
>
>
> 2. by Phoenix client:
>     * create table tt ( pk varchar not null primary key, cf.b varchar, cf.c 
> varchar);
>     * upsert into tt values ( 'this is pk');
>       now, do "scan 'tt' " over hbase shell, I got a row with an empty column 
> '_0', and this row would not have been existed if phoenix did not insert this 
> empty column and of course hbase native API will not allow u to do this.
>
> 3. by phoenix client:
>     even if we do insert not null non pk column in an upsert, an empty column 
> is ALWAYS necessary because columns can be deleted.
>    *  upsert into tt values ( 'this is pk', 'bbbbb', 'ccccc');
>    *  upsert into tt values ( 'this is pk', null, null);
>    * or alter table tt drop column cf.b, cf.c
>       if we do not insert the empty column at 1st upsert, then after 2nd 
> upsert, the whole row will disappear. But for SQL, we expect a row only with 
> the PK value.
>
>
> 4. why we cannot add not null constraint on non pk column ?
>     I tried this : create table tt (pk varchar not null primary key, cf.b 
> varchar not null);
>     and it raised an exception : Invalid not null constraint on non primary 
> key column columnName=TT.B
>     I figured out one reason : the data of a phoenix table might be written 
> by hbase native api, which does not support not null constraint on non pk 
> column.
>    Is it right? Is it the primary reason?
>
>
> thanks
> - William
>
> At 2015-12-21 22:46:30, "Gabriel Reid" <gabriel.r...@gmail.com> wrote:
>>Hi William,
>>
>>The empty column is needed to ensure that a given column is available
>>for all rows.
>>
>>As you may know, data is stored in HBase as KeyValues, meaning that
>>the full row key is stored for each column value. This also implies
>>that the row key is not stored at all unless there is at least one
>>column stored.
>>
>>Now consider JDBC row which has an integer primary key, and several
>>columns which are all null. In order to be able to store the primary
>>key, a KeyValue needs to be stored to show that the row is present at
>>all. This column is represented by the empty column that you've
>>noticed. This allows doing a "SELECT * FROM TABLE" and receiving
>>records for all rows, even those whose non-pk columns are null.
>>
>>The same issue comes up even if only one column is null for some (or
>>all) records. A scan over Phoenix will include the empty column to
>>ensure that rows that only consist of the primary key (and have null
>>for all non-key columns) will be included in a scan result.
>>
>>- Gabriel
>>
>>On Mon, Dec 21, 2015 at 2:58 PM, 杨晗 <yhxx...@163.com> wrote:
>>> hi all:
>>>      I'm reading phoenix src code recently, and i found 
>>> PRowImpl.toRowMutations() always adds an empty column named '_0' for 
>>> non-delete upsert. Why?
>>>      I read the comment but i didn't quite understand it. Might someone 
>>> give me an example that illustates why an empty column is ALWAYS necessary?
>>>
>>>
>>>     Further more, I have to access a phoenix table by both phoenix client 
>>> and hbase API in some cases. If i do not add this empty column explicitly 
>>> by hbase API, it is ok  if i read this row by phoenix?
>>>
>>>
>>> Thanks
>>> -William

Re: Re: Why do we need an empty column when doing upsert?

Reply via email to