[ 
https://issues.apache.org/jira/browse/PHOENIX-2565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15799554#comment-15799554
 ] 

James Taylor commented on PHOENIX-2565:
---------------------------------------

[~enis] - the format you outlined *is* the single KeyValue format. It's very 
similar to the array format, with the differences being:
- no separators are stored
- we store the number of elements and calculate the offset_start instead of 
storing it.

[~samarthjain] I think we should proceed with what we have but make sure that 
new/alternate storage schemes can be introduced. The bulk of the change is to 
create a level of indirection for column names and to have a way of doing a 
positional lookup. Our use cases don't require aggregation to be fast and our 
data is not sparse. Other use cases may be better served by a different format, 
but since new storage schemes can be added, support for those can come later.
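
To illustrate the level of indirection for column names, here is a minimal 
sketch that assigns each column name a stable position, which can then be used 
for a positional lookup. The class and method names (ColumnIndirection, 
register, positionOf) are hypothetical and are not taken from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: maps column names to stable positions so that the
// stored row can refer to a column by position instead of by name.
final class ColumnIndirection {
    private final Map<String, Integer> positionByName = new LinkedHashMap<>();

    // Returns the existing position for a name, or assigns the next free one.
    int register(String columnName) {
        Integer existing = positionByName.get(columnName);
        if (existing != null) {
            return existing;
        }
        int position = positionByName.size();
        positionByName.put(columnName, position);
        return position;
    }

    // Looks up the position of a column that was registered earlier.
    int positionOf(String columnName) {
        Integer position = positionByName.get(columnName);
        if (position == null) {
            throw new IllegalArgumentException("Unknown column: " + columnName);
        }
        return position;
    }
}
```

Because positions are handed out once and never reused in this sketch, a 
different storage scheme could consume the same mapping unchanged.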

For the particular format you mentioned, [~samarthjain]:
- You'd need to do a binary search given the position since you wouldn't be 
able to find the byte offset directly through an array dereference. This type 
of scheme would probably be similar in performance to our column encoding 
scheme for mutable data.
- Only a count(*) query would get faster for aggregation - other types of 
aggregation would still be slower (as they'd require reading the larger, single 
KeyValue that contains all the values).
- It's not necessary to store the length because you can look at the offset of 
the next element to calculate the length.
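
The points about direct dereference and deriving the length from the next 
offset can be sketched as follows. This is an in-memory illustration only: the 
class name PackedRow and its layout (one int offset per element, values 
concatenated in one byte array) are assumptions, not the serialized format used 
in the patch.

```java
import java.util.Arrays;

// Hypothetical sketch of a dense, positional layout: all values are
// concatenated into one byte array and offsets[i] is the start of element i.
final class PackedRow {
    private final byte[] data;
    private final int[] offsets;

    PackedRow(byte[][] values) {
        offsets = new int[values.length];
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            offsets[i] = total;
            total += values[i].length;
        }
        data = new byte[total];
        for (int i = 0; i < values.length; i++) {
            System.arraycopy(values[i], 0, data, offsets[i], values[i].length);
        }
    }

    // No stored length: it is the distance to the next element's offset,
    // or to the end of the data for the last element.
    int length(int position) {
        int end = (position + 1 < offsets.length) ? offsets[position + 1] : data.length;
        return end - offsets[position];
    }

    // Direct array dereference by position; no search is needed.
    byte[] get(int position) {
        int start = offsets[position];
        return Arrays.copyOfRange(data, start, start + length(position));
    }
}
```

An unset value in this sketch is simply a zero-length element, which is why 
every position still costs one offset entry.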

I don't think we can do a whole lot to speed up aggregation. I think we're 
being hit with the cost of reading the large single KeyValue. There might be 
some simple things we could do to improve sparse storage. The extra storage 
cost is due to storing the byte offset for all elements between values that are 
set. For example, if column 1 is set and column 102 is set, we're storing 
offsets for column 2 through column 101. We could instead introduce a bit set 
that tracks if a value is set. Instead of storing 100 shorts (200 bytes) we'd 
store 13 bytes for the bit set. I'm not sure this is going to make a big 
difference, though.
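
As a rough check of the arithmetic above, and to show how a bit set could 
replace the unused offsets, here is a small sketch. The class PresenceIndex and 
its methods are hypothetical. It assumes 2-byte offsets and one presence bit 
per column, with offsets stored only for the columns that are set.

```java
import java.util.BitSet;

// Hypothetical sketch: a presence bit per column plus offsets stored only
// for the columns that are actually set.
final class PresenceIndex {
    private final BitSet present = new BitSet();

    PresenceIndex(int... setPositions) {
        for (int position : setPositions) {
            present.set(position);
        }
    }

    // Bytes needed if every column in a gap stores a 2-byte offset.
    static int shortOffsetBytes(int columns) {
        return columns * Short.BYTES;
    }

    // Bytes needed for one presence bit per column, rounded up.
    static int bitSetBytes(int columns) {
        return (columns + 7) / 8;
    }

    // Index into the compact offsets array: the number of set bits before
    // this position, or -1 if the column has no value.
    int offsetIndex(int position) {
        if (!present.get(position)) {
            return -1;
        }
        return present.get(0, position).cardinality();
    }
}
```

With columns 1 and 102 set, shortOffsetBytes(100) gives 200 and 
bitSetBytes(100) gives 13, matching the figures above. The trade-off in this 
sketch is that a lookup has to count the set bits before the position instead 
of dereferencing an array directly.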



> Store data for immutable tables in single KeyValue
> --------------------------------------------------
>
>                 Key: PHOENIX-2565
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-2565
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Assignee: Thomas D'Silva
>         Attachments: PHOENIX-2565-v2.patch, PHOENIX-2565-wip.patch, 
> PHOENIX-2565.patch
>
>
> Since an immutable table (i.e. declared with IMMUTABLE_ROWS=true) will never 
> update a column value, it'd be more efficient to store all column values for 
> a row in a single KeyValue. We could use the existing format we have for 
> variable length arrays.
> For backward compatibility, we'd need to support the current mechanism. Also, 
> you'd no longer be allowed to transition an existing table to/from being 
> immutable. I think the best approach would be to introduce a new IMMUTABLE 
> keyword and use it like this:
> {code}
> CREATE IMMUTABLE TABLE ...
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
