[jira] [Comment Edited] (PHOENIX-476) Support declaration of DEFAULT in CREATE statement

Kevin Liew (JIRA) Mon, 26 Sep 2016 19:40:34 -0700

    [ https://issues.apache.org/jira/browse/PHOENIX-476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15524834#comment-15524834 ]


Kevin Liew edited comment on PHOENIX-476 at 9/27/16 2:40 AM:
-------------------------------------------------------------

I did take a look at `PRowImpl.setValue` but that only determines how rows are 
serialized to disk. `KeyValueSchema.toBytes` determines how rows are serialized 
from the coprocessor to be sent to the driver. 

{code:java}
for (int j = 0; j < field.getCount(); j++) {
    if (expressions[index].evaluate(tuple, ptr) && ptr.getLength() > 0) { // Skip null values
        if (index >= minNullableIndex) {
            valueSet.set(index - minNullableIndex);
        }
        if (!type.isFixedWidth()) {
            b = ensureSize(b, offset, offset + getVarLengthBytes(ptr.getLength()));
            offset = writeVarLengthField(ptr, b, offset);
        } else {
            int nBytes = ptr.getLength();
            b = ensureSize(b, offset, offset + nBytes);
            System.arraycopy(ptr.get(), ptr.getOffset(), b, offset, nBytes);
            offset += nBytes;
        }
    }
    index++;
}
{code}

In the code above (for `KeyValueSchema.toBytes`), regardless of whether we 
store the null value on disk, nulls are not set in the value bitmask. _The 
driver receives only non-null values and a value bitmask from the coprocessor_, 
using these to evaluate the Expression tree.

We could decide to store nulls but not default values on disk, which is what I 
think you were suggesting. Then we could differentiate nulls from default 
values and set default values in the value bitmask so that the driver will not 
see them as null values. 
But the driver also uses the value bitmask to determine the offset used to 
deserialize each cell, so the driver would no longer be able to differentiate 
between non-null values that were serialized (and advance the offset) and 
default values that were not serialized (and do not). Deserialization then 
throws an exception because the byte array is smaller than expected.
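
To make the offset problem concrete, here is a minimal, self-contained sketch. 
It is not Phoenix's `KeyValueSchema`; the class `ValueBitmaskSketch`, its 
methods and the one-byte length prefix are all made up for illustration. It 
only assumes the behavior described above: nulls are skipped when writing, and 
the reader uses the bitmask to decide whether to consume bytes for a field.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical, simplified model of the wire format discussed above.
// Each present value is written as one length byte followed by its bytes
// (so values must be shorter than 128 bytes), and a BitSet marks which
// fields are present.
class ValueBitmaskSketch {

    // Write only non-null, non-empty values and set one bit per value written.
    static byte[] serialize(String[] values, BitSet valueSet) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < values.length; i++) {
            if (values[i] == null || values[i].isEmpty()) {
                continue; // skip nulls: no bytes written, no bit set
            }
            byte[] bytes = values[i].getBytes(StandardCharsets.UTF_8);
            valueSet.set(i);
            out.write(bytes.length);
            out.write(bytes, 0, bytes.length);
        }
        return out.toByteArray();
    }

    // Read fields back, consuming bytes only for fields whose bit is set.
    static List<String> deserialize(byte[] b, BitSet valueSet, int fieldCount) {
        List<String> result = new ArrayList<>();
        int offset = 0;
        for (int i = 0; i < fieldCount; i++) {
            if (!valueSet.get(i)) {
                result.add(null); // bit clear: treated as null, offset unchanged
                continue;
            }
            int length = b[offset++]; // fails here if the bit is set but no bytes remain
            result.add(new String(b, offset, length, StandardCharsets.UTF_8));
            offset += length;
        }
        return result;
    }

    public static void main(String[] args) {
        BitSet valueSet = new BitSet();
        byte[] b = serialize(new String[] {"a", null, "b"}, valueSet);
        System.out.println(deserialize(b, valueSet, 3));

        // Mark the unserialized field as "present", as we would for a default value.
        valueSet.set(1);
        try {
            deserialize(b, valueSet, 3);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("reader ran past the end of the byte array");
        }
    }
}
```

In this toy model, setting the bit for a field that has no bytes shifts every 
later read by one field, and the last read runs off the end of the array.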

The test cases were previously passing only because of a quirk in the 
`KeyValueSchema.next` -> `Expression.evaluate` logic, where non-trailing nulls 
evaluate false -> true while trailing nulls evaluate null -> false.


was (Author: kliew):
I did take a look at `PRowImpl.setValue` but that only determines how rows are 
serialized to disk. `KeyValueSchema.toBytes` determines how rows are serialized 
from the coprocessor to be sent to the driver. 

{code:java}
if (expressions[index].evaluate(tuple, ptr) && ptr.getLength() > 0) { // Skip null values
    if (index >= minNullableIndex) {
        valueSet.set(index - minNullableIndex);
    }
    if (!type.isFixedWidth()) {
        b = ensureSize(b, offset, offset + getVarLengthBytes(ptr.getLength()));
        offset = writeVarLengthField(ptr, b, offset);
    } else {
        int nBytes = ptr.getLength();
        b = ensureSize(b, offset, offset + nBytes);
        System.arraycopy(ptr.get(), ptr.getOffset(), b, offset, nBytes);
        offset += nBytes;
    }
}
{code}

In the code above (for `KeyValueSchema.toBytes`), regardless of whether we 
store the null value on disk, nulls are not set in the value bitmask. _The 
driver receives only non-null values and a value bitmask from the coprocessor_, 
using these to evaluate the Expression tree.

We could decide to store nulls but not default values on disk, which is what I 
think you were suggesting. Then we could differentiate nulls from default 
values and set default values in the value bitmask so that the driver will not 
see them as null values. 
But the driver also uses the value bitmask to determine the offset used to 
deserialize each cell, and now the driver will not be able to differentiate 
between non-null values that were serialized (and produce additional offset) 
and default values that were not serialized (and throws an exception because 
the expected byte array size is smaller than expected).

The previous test cases were only passing before due to a quirk in the 
`KeyValueSchema.next` -> `Expression.evaluate` logic where non-trailing nulls 
evaluate false -> true while trailing nulls evaluate null -> false.

> Support declaration of DEFAULT in CREATE statement
> --------------------------------------------------
>
>                 Key: PHOENIX-476
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-476
>             Project: Phoenix
>          Issue Type: Task
>    Affects Versions: 3.0-Release
>            Reporter: James Taylor
>            Assignee: Kevin Liew
>              Labels: enhancement
>         Attachments: PHOENIX-476.2.patch, PHOENIX-476.3.patch, 
> PHOENIX-476.patch
>
>
> Support the declaration of a default value in the CREATE TABLE/VIEW statement 
> like this:
>     CREATE TABLE Persons (
>         Pid int NOT NULL PRIMARY KEY,
>         LastName varchar(255) NOT NULL,
>         FirstName varchar(255),
>         Address varchar(255),
>         City varchar(255) DEFAULT 'Sandnes'
>     )
> To implement this, we'd need to:
> 1. add a new DEFAULT_VALUE key value column in SYSTEM.TABLE and pass through 
> the value when the table is created (in MetaDataClient).
> 2. always set NULLABLE to ResultSetMetaData.columnNoNulls if a default value 
> is present, since the column will never be null.
> 3. add a getDefaultValue() accessor in PColumn
> 4.  for a row key column, during UPSERT use the default value if no value was 
> specified for that column. This could be done in the PTableImpl.newKey method.
> 5.  for a key value column with a default value, we can get away without 
> incurring any storage cost. Although this takes a little more effort than 
> persisting the default value on an UPSERT for key value columns, this approach 
> has the benefit of not incurring any storage cost for a default value.
>     * serialize any default value into KeyValueColumnExpression
>     * in the evaluate method of KeyValueColumnExpression, conditionally use 
> the default value if the column value is not present. If doing partial 
> evaluation, you should not yet return the default value, as we may not have 
> encountered the the KeyValue for the column yet (since a filter evaluates 
> each time it sees each KeyValue, and there may be more than one KeyValue 
> referenced in the expression). Partial evaluation is determined by calling 
> Tuple.isImmutable(), where false means it is NOT doing partial evaluation, 
> while true means it is.
>     * modify EvaluateOnCompletionVisitor by adding a visitor method for 
> RowKeyColumnExpression and KeyValueColumnExpression to set 
> evaluateOnCompletion to true if they have a default value specified. This 
> will cause filter evaluation to execute one final time after all KeyValues 
> for a row have been seen, since it's at this time we know we should use the 
> default value.
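
The evaluation rule in step 5 of the description above could be sketched 
roughly as follows. This is only an illustration under stated assumptions: 
`DefaultValueSketch`, its `evaluate` method and the `rowComplete` flag are 
hypothetical names, not Phoenix APIs, and `rowComplete` simply stands in for 
whatever signal tells the expression that all KeyValues of the row have been 
seen.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "use the default only once the row is complete".
class DefaultValueSketch {

    // Returns the stored value if the column has been seen. Otherwise returns
    // the default value, but only when the row is complete. Returns null while
    // the row is still being assembled, or when there is no value and no default.
    static String evaluate(Map<String, String> cellsSeenSoFar, String column,
                           String defaultValue, boolean rowComplete) {
        String stored = cellsSeenSoFar.get(column);
        if (stored != null) {
            return stored; // an explicit value always wins over the default
        }
        if (!rowComplete) {
            return null; // partial evaluation: the cell may still arrive
        }
        return defaultValue; // row fully seen and the column is absent
    }

    public static void main(String[] args) {
        Map<String, String> cells = new HashMap<>();
        System.out.println(evaluate(cells, "City", "Sandnes", false));
        System.out.println(evaluate(cells, "City", "Sandnes", true));
        cells.put("City", "Oslo");
        System.out.println(evaluate(cells, "City", "Sandnes", true));
    }
}
```

The point of the extra flag is that returning the default too early would be 
wrong whenever the real cell shows up later in the same row.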



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
