[ https://issues.apache.org/jira/browse/PHOENIX-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15800011#comment-15800011 ]

Enis Soztutar commented on PHOENIX-2565:
----------------------------------------

Thanks, I was checking the code in the branch before 01ef5d. 

Here: 
https://github.com/apache/phoenix/blob/encodecolumns2/phoenix-core/src/main/java/org/apache/phoenix/schema/types/PArrayDataType.java#L1299
we are still serializing the nulls, no?

bq. For example, if column 1 is set and column 102 is set, we're storing 
offsets for column 2 through column 101. We could instead introduce a bit set 
that tracks whether a value is set
For nulls in Avro, you make the field a union of its type with the null type, so 
all nullable fields are encoded like {{<is_null:1byte><type_data:0 or more bytes>}}. 
So Avro has to spend 1 byte per nullable field, whether or not the field is null. 
PB has a different model: each field value that is written is prefixed with a 
tag carrying the field number, which also means that a field that is not there 
is null. So the cost is 1 varint per non-null field (as opposed to per field in 
the schema). Which model is better depends on whether, on average, the data has 
a lot of null fields. 
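To make the trade-off concrete, here is a rough Java sketch that counts only the 
bytes each model spends on marking presence, ignoring the values themselves. It 
is not Avro or protobuf library code: the class and method names are 
hypothetical, and it assumes a two-branch ["null", T] union on the Avro side and 
one tag per present field on the PB side.

```java
// Hypothetical sketch (not Avro or protobuf library code): counts only the
// bytes each scheme spends on marking presence, ignoring the values.
final class NullTrackingOverhead {
    /** Bytes needed to write v as an unsigned base-128 varint. */
    static int varintSize(long v) {
        int size = 1;
        while ((v & ~0x7FL) != 0) {
            v >>>= 7;
            size++;
        }
        return size;
    }

    /**
     * Avro-style: every nullable field is a union with null and pays a branch
     * index whether or not it is null. For a two-branch union the zigzag-encoded
     * index (0 or 1) always fits in one byte.
     */
    static int avroOverhead(int nullableFields) {
        return nullableFields;
    }

    /**
     * Protobuf-style: only present fields are written, each prefixed by a varint
     * tag (fieldNumber << 3 | wireType). The wire type sits in the low 3 bits,
     * so the tag size depends only on the field number.
     */
    static int protobufOverhead(int[] presentFieldNumbers) {
        int total = 0;
        for (int field : presentFieldNumbers) {
            total += varintSize((long) field << 3);
        }
        return total;
    }

    public static void main(String[] args) {
        int[] sparse = {1, 2, 3};
        int[] dense = new int[20];
        for (int i = 0; i < dense.length; i++) {
            dense[i] = i + 1;
        }
        System.out.println(avroOverhead(20));          // 20
        System.out.println(protobufOverhead(sparse));  // 3
        System.out.println(protobufOverhead(dense));   // 25 (fields 16-20 need 2-byte tags)
    }
}
```

With 20 nullable fields of which only 3 are set, the PB-style cost is 3 bytes 
against Avro's 20; with all 20 set, PB pays 25 because tags for field numbers 16 
and above take two bytes.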

The cost of a bitset for nullable fields would be 1 byte per 8 "declared" 
fields, whether or not any of them is null. Each null field we skip saves one 
offset, i.e. 2 or 4 bytes. Over 16 columns the bitset costs 2 bytes, which a 
single null already pays for with 2-byte offsets (and pays for twice with 4-byte 
offsets). So if, on average, we expect the data to have at least 1 null per 16 
columns or so, it looks like a good idea to implement this. 
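The break-even arithmetic above can be checked with a small sketch. The helper 
names are hypothetical, and it assumes the only saving from the bitset is the 
skipped per-column offset (2 or 4 bytes), while the bitset itself costs one byte 
per 8 declared columns on every row.

```java
// Hypothetical sketch of the bitset-vs-offset arithmetic: the bitset costs
// ceil(declaredColumns / 8) bytes per row, and each null column it lets us
// skip saves one offset of offsetWidth bytes (2 or 4).
final class NullBitsetBreakEven {
    static int bitsetBytes(int declaredColumns) {
        return (declaredColumns + 7) / 8;   // ceil(declaredColumns / 8)
    }

    /** Net bytes saved per row; positive means the bitset pays for itself. */
    static int netSavings(int declaredColumns, int nullColumns, int offsetWidth) {
        return nullColumns * offsetWidth - bitsetBytes(declaredColumns);
    }

    public static void main(String[] args) {
        System.out.println(netSavings(16, 1, 2));  // 0: break-even with 2-byte offsets
        System.out.println(netSavings(16, 1, 4));  // 2
        System.out.println(netSavings(32, 1, 2));  // -2: one null is not enough for 32 columns
    }
}
```

So "1 null per 16 columns" is the break-even point for 2-byte offsets; with 
4-byte offsets the bitset already pays off at 1 null per 32 columns.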

> Store data for immutable tables in single KeyValue
> --------------------------------------------------
>
>                 Key: PHOENIX-2565
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2565
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Thomas D'Silva
>         Attachments: PHOENIX-2565-v2.patch, PHOENIX-2565-wip.patch, 
> PHOENIX-2565.patch
>
>
> Since an immutable table (i.e. declared with IMMUTABLE_ROWS=true) will never 
> update a column value, it'd be more efficient to store all column values for 
> a row in a single KeyValue. We could use the existing format we have for 
> variable length arrays.
> For backward compatibility, we'd need to support the current mechanism. Also, 
> you'd no longer be allowed to transition an existing table to/from being 
> immutable. I think the best approach would be to introduce a new IMMUTABLE 
> keyword and use it like this:
> {code}
> CREATE IMMUTABLE TABLE ...
> {code}
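For illustration, a minimal sketch of packing a whole row into one value, as the 
description suggests, combined with the presence bitset discussed in the 
comment. This is a hypothetical layout, not Phoenix's variable-length array 
serialization: a bitset of ceil(declared/8) bytes, then a 2-byte offset for each 
set column only, then the concatenated values. PackedRow, encode and decode are 
made-up names.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.BitSet;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-value row layout (NOT Phoenix's array serialization):
// [presence bitset, ceil(declared/8) bytes][2-byte offset per SET column][values]
// A null column costs one bit and no offset.
final class PackedRow {
    static byte[] encode(int declaredColumns, TreeMap<Integer, byte[]> row) {
        int bitsetLen = (declaredColumns + 7) / 8;
        BitSet present = new BitSet(declaredColumns);
        int valuesLen = 0;
        for (Map.Entry<Integer, byte[]> e : row.entrySet()) {
            present.set(e.getKey() - 1);          // column numbers are 1-based
            valuesLen += e.getValue().length;
        }
        ByteBuffer buf = ByteBuffer.allocate(bitsetLen + 2 * row.size() + valuesLen);
        buf.put(Arrays.copyOf(present.toByteArray(), bitsetLen)); // pad trailing zero bytes
        int offset = 0;
        for (byte[] value : row.values()) {       // TreeMap iterates in column order
            buf.putShort((short) offset);         // offset relative to the values section
            offset += value.length;
        }
        for (byte[] value : row.values()) {
            buf.put(value);
        }
        return buf.array();
    }

    /** Returns the value of a 1-based column, or null if the column was not set. */
    static byte[] decode(int declaredColumns, byte[] packed, int column) {
        int bitsetLen = (declaredColumns + 7) / 8;
        BitSet present = BitSet.valueOf(Arrays.copyOf(packed, bitsetLen));
        int bit = column - 1;
        if (!present.get(bit)) {
            return null;
        }
        int setCount = present.cardinality();
        int rank = present.get(0, bit).cardinality(); // index into the offset table
        ByteBuffer buf = ByteBuffer.wrap(packed);
        int valuesStart = bitsetLen + 2 * setCount;
        int start = Short.toUnsignedInt(buf.getShort(bitsetLen + 2 * rank));
        int end = rank + 1 < setCount
                ? Short.toUnsignedInt(buf.getShort(bitsetLen + 2 * (rank + 1)))
                : packed.length - valuesStart;
        return Arrays.copyOfRange(packed, valuesStart + start, valuesStart + end);
    }

    public static void main(String[] args) {
        TreeMap<Integer, byte[]> row = new TreeMap<>();
        row.put(1, new byte[] {42});
        row.put(102, new byte[] {7, 8});
        byte[] packed = encode(102, row);
        System.out.println(packed.length);                             // 20 = 13 + 2*2 + 3
        System.out.println(Arrays.toString(decode(102, packed, 102))); // [7, 8]
        System.out.println(decode(102, packed, 50));                   // null
    }
}
```

For the "column 1 and column 102 set" case this writes 2 offsets instead of 102, 
at the price of a 13-byte bitset. Finding a column requires the rank of its bit 
(how many set columns precede it); a real implementation would probably want to 
make that cheaper than counting bits on every read.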



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
