Re: Re: Re: Why do we need an empty column when doing upsert?

James Taylor Tue, 22 Dec 2015 22:24:07 -0800

Yes, thanks, Gabriel - very well thought out answer and another good
candidate to add to our FAQs if anyone is up for it.


On Tue, Dec 22, 2015 at 12:39 AM, William <yhxx...@163.com> wrote:

> Hi Gabriel,
>
>
> Thanks a lot for your reply.
> I was confused. I misunderstood 'not null' constraint as 'required at each
> upsert'.
>
>
> -William
>
>
> At 2015-12-22 15:25:40, "Gabriel Reid" <gabriel.r...@gmail.com> wrote:
> >Hi William,
> >
> >Yes, all your tests there look correct, and it looks like you've got a
> >good understanding on the underlying principles of how scans are
> >working in Phoenix vs HBase.
> >
> >About the use of "not null" constraints, you're correct about data
> >being written by non-Phoenix clients being an issue. However, the
> >bigger issue is doing upsert statements in general. Phoenix doesn't
> >know if an UPSERT statement is updating an existing row, or adding a
> >new row. If an update of an existing row is being done, then it would
> >be possible to only upsert a subset of the total columns for the row,
> >but an upsert of a new row would have to include all columns.
> >
> >It would be possible to determine if each upsert was a new row by
> >reading each row before writing it, but this would be very costly in
> >terms of performance.
> >
> >- Gabriel
> >
> >On Tue, Dec 22, 2015 at 3:22 AM, William <yhxx...@163.com> wrote:
> >> Hi Gabriel:
> >>
> >>
> >> Now I think I understand why an empty column is necessary. I did the
> following tests:
> >>
> >>
> >> 1. by HBASE native API: after I deleted the last attribute column of a
> specified row, the whole row was deleted.
> >>     and this is exactly what you said 'the row key is not stored at all
> unless there is at least one column stored.'
> >>
> >>
> >> 2. by Phoenix client:
> >>     * create table tt ( pk varchar not null primary key, cf.b varchar,
> cf.c varchar);
> >>     * upsert into tt values ( 'this is pk');
> >>       now, do "scan 'tt' " over hbase shell, I got a row with an empty
> column '_0', and this row would not have been existed if phoenix did not
> insert this empty column and of course hbase native API will not allow u to
> do this.
> >>
> >> 3. by phoenix client:
> >>     even if we do insert not null non pk column in an upsert, an empty
> column is ALWAYS necessary because columns can be deleted.
> >>    *  upsert into tt values ( 'this is pk', 'bbbbb', 'ccccc');
> >>    *  upsert into tt values ( 'this is pk', null, null);
> >>    * or alter table tt drop column cf.b, cf.c
> >>       if we do not insert the empty column at 1st upsert, then after
> 2nd upsert, the whole row will disappear. But for SQL, we expect a row only
> with the PK value.
> >>
> >>
> >> 4. why we cannot add not null constraint on non pk column ?
> >>     I tried this : create table tt (pk varchar not null primary key,
> cf.b varchar not null);
> >>     and it raised an exception : Invalid not null constraint on non
> primary key column columnName=TT.B
> >>     I figured out one reason : the data of a phoenix table might be
> written by hbase native api, which does not support not null constraint on
> non pk column.
> >>    Is it right? Is it the primary reason?
> >>
> >>
> >> thanks
> >> - William
> >>
> >> At 2015-12-21 22:46:30, "Gabriel Reid" <gabriel.r...@gmail.com> wrote:
> >>>Hi William,
> >>>
> >>>The empty column is needed to ensure that a given column is available
> >>>for all rows.
> >>>
> >>>As you may know, data is stored in HBase as KeyValues, meaning that
> >>>the full row key is stored for each column value. This also implies
> >>>that the row key is not stored at all unless there is at least one
> >>>column stored.
> >>>
> >>>Now consider JDBC row which has an integer primary key, and several
> >>>columns which are all null. In order to be able to store the primary
> >>>key, a KeyValue needs to be stored to show that the row is present at
> >>>all. This column is represented by the empty column that you've
> >>>noticed. This allows doing a "SELECT * FROM TABLE" and receiving
> >>>records for all rows, even those whose non-pk columns are null.
> >>>
> >>>The same issue comes up even if only one column is null for some (or
> >>>all) records. A scan over Phoenix will include the empty column to
> >>>ensure that rows that only consist of the primary key (and have null
> >>>for all non-key columns) will be included in a scan result.
> >>>
> >>>- Gabriel
> >>>
> >>>On Mon, Dec 21, 2015 at 2:58 PM, 杨晗 <yhxx...@163.com> wrote:
> >>>> hi all:
> >>>>      I'm reading phoenix src code recently, and i found
> PRowImpl.toRowMutations() always adds an empty column named '_0' for
> non-delete upsert. Why?
> >>>>      I read the comment but i didn't quite understand it. Might
> someone give me an example that illustates why an empty column is ALWAYS
> necessary?
> >>>>
> >>>>
> >>>>     Further more, I have to access a phoenix table by both phoenix
> client and hbase API in some cases. If i do not add this empty column
> explicitly by hbase API, it is ok  if i read this row by phoenix?
> >>>>
> >>>>
> >>>> Thanks
> >>>> -William
>

Re: Re: Re: Why do we need an empty column when doing upsert?

Reply via email to