[jira] [Commented] (PHOENIX-1598) encode column names to save space

Lars Hofhansl (JIRA) Sun, 06 Sep 2015 10:09:03 -0700

    [ 
https://issues.apache.org/jira/browse/PHOENIX-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732457#comment-14732457
 ]


Lars Hofhansl commented on PHOENIX-1598:
----------------------------------------

James and I were discussing this on Friday. It would have a lot of advantages 
and no disadvantages that I am aware of.

I'd have the column names be 1, 2, 3, ..., rather than a, b, c, ..., but that's 
cosmetic. Or we can have HEX numbers.
The key is (for PHOENIX-1940) that we can determine the ordinal of a column 
from its name, rather than having to do a binary search for it.
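
A rough sketch of what I mean, in plain Java with no Phoenix or HBase APIs. The
class and method names are made up for illustration, and the scheme (column i
is stored under the number i + 1, written in decimal or hex) is only an
assumption about how the encoding might look:

```java
import java.util.Arrays;

public class ColumnOrdinalSketch {
    // Hypothetical encoding: the column at 0-based position `ordinal` is
    // stored under the number (ordinal + 1), written in the given radix
    // (10 for 1, 2, 3, ... or 16 for hex).
    public static String encode(int ordinal, int radix) {
        return Integer.toString(ordinal + 1, radix);
    }

    // With numeric names the ordinal is computed directly from the name.
    public static int ordinalOf(String storedName, int radix) {
        return Integer.parseInt(storedName, radix) - 1;
    }

    // With arbitrary names we have to search a sorted list of names instead.
    public static int ordinalBySearch(String[] sortedNames, String name) {
        return Arrays.binarySearch(sortedNames, name);
    }

    public static void main(String[] args) {
        String stored = encode(41, 16);
        System.out.println(stored + " -> ordinal " + ordinalOf(stored, 16));

        String[] sorted = {"a_col", "b_col", "some_column"};
        System.out.println("search -> " + ordinalBySearch(sorted, "some_column"));
    }
}
```

The direct decode costs the same no matter how many columns the table has,
while the search grows with the logarithm of the column count.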

We could transition old tables by having this mapping be the identity mapping. 
I.e. if we already have a column named "some_column" we'd map "some_column" 
to "some_column" in the mapping. We'd lose the optimizations but could still 
rename columns cheaply.
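
Sketched in Java (all names here are hypothetical, this is not Phoenix code),
the mapping could be as simple as a map from the user-visible name to the
stored name. Old tables get the identity mapping, new tables get numbered
names, and a rename only touches the map:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ColumnMappingSketch {
    // user-visible column name -> name actually stored in the table
    private final Map<String, String> logicalToStored = new LinkedHashMap<>();

    // Old table: every existing column maps to itself (identity mapping).
    public static ColumnMappingSketch forLegacyTable(String... existingNames) {
        ColumnMappingSketch m = new ColumnMappingSketch();
        for (String name : existingNames) {
            m.logicalToStored.put(name, name);
        }
        return m;
    }

    // New table: columns are stored as "1", "2", "3", ... (assumed scheme).
    public static ColumnMappingSketch forNewTable(String... logicalNames) {
        ColumnMappingSketch m = new ColumnMappingSketch();
        for (int i = 0; i < logicalNames.length; i++) {
            m.logicalToStored.put(logicalNames[i], Integer.toString(i + 1));
        }
        return m;
    }

    // A rename changes only the mapping; the stored name and data stay put.
    public void rename(String oldName, String newName) {
        String stored = logicalToStored.remove(oldName);
        if (stored == null) {
            throw new IllegalArgumentException("unknown column: " + oldName);
        }
        logicalToStored.put(newName, stored);
    }

    public String storedName(String logicalName) {
        return logicalToStored.get(logicalName);
    }

    public static void main(String[] args) {
        ColumnMappingSketch legacy = forLegacyTable("some_column");
        legacy.rename("some_column", "better_name");
        System.out.println(legacy.storedName("better_name"));
    }
}
```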

Not yet sure if we'd need to change anything in HBase. HBase is fundamentally 
sparse, so we can't know ahead of time how many columns will be returned per 
row, nor even how many columns to expect. Should discuss. A possible solution 
is to have "dense" columns packed into a single key value. Storage would be 
_much_ improved, and so would read performance for cases where we'd want to 
see most of those columns. Writes would suffer with a simple solution (we 
would need to read back the old values and rewrite them with the new value 
replaced); we could instead store "update" Cells that only hold the diff and 
that would be combined during the next compaction. It would be important to 
store the data such that it does not have to be serialized and deserialized 
from the row (so PB and Avro are probably out, need to check).
But that's something HBase and/or Phoenix desperately need. I think this should 
sit on top of HBase, as HBase cannot know about optimized storage/packing 
formats for various problems. Maybe a library.
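
To make the dense idea a bit more concrete, here is a toy sketch in plain Java.
It does not use any HBase API. It assumes every packed column is a fixed-width
8-byte long (a big simplification) so that a single column can be read at a
computed offset without decoding the rest of the row. The Update class and the
compact() step are stand-ins I made up for the "update" Cells and for what a
compaction would do with them:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class DenseRowSketch {
    static final int WIDTH = Long.BYTES; // assumed fixed width per column

    // Hypothetical "update" cell: holds only the diff for one column.
    public static final class Update {
        public final int ordinal;
        public final long value;

        public Update(int ordinal, long value) {
            this.ordinal = ordinal;
            this.value = value;
        }
    }

    // Pack all columns of a row into one value.
    public static byte[] pack(long... values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * WIDTH);
        for (long v : values) {
            buf.putLong(v);
        }
        return buf.array();
    }

    // Read one column by offset; the other columns are never decoded.
    public static long read(byte[] packed, int ordinal) {
        return ByteBuffer.wrap(packed).getLong(ordinal * WIDTH);
    }

    // Read path before compaction: the newest pending update wins,
    // otherwise fall back to the packed base value.
    public static long read(byte[] packed, List<Update> pending, int ordinal) {
        for (int i = pending.size() - 1; i >= 0; i--) {
            if (pending.get(i).ordinal == ordinal) {
                return pending.get(i).value;
            }
        }
        return read(packed, ordinal);
    }

    // "Compaction": fold the pending updates, in write order, into a new
    // packed value. The base value itself is left untouched.
    public static byte[] compact(byte[] packed, List<Update> pending) {
        byte[] merged = packed.clone();
        ByteBuffer buf = ByteBuffer.wrap(merged);
        for (Update u : pending) {
            buf.putLong(u.ordinal * WIDTH, u.value);
        }
        return merged;
    }

    public static void main(String[] args) {
        byte[] base = pack(10L, 20L, 30L);
        List<Update> pending = new ArrayList<>();
        pending.add(new Update(1, 99L)); // write without reading the row back
        System.out.println(read(base, pending, 1));
        System.out.println(read(compact(base, pending), 1));
    }
}
```

The write path only appends a small Update, so nothing has to be read back at
write time; the cost moves to reads (which must check pending updates) and to
compaction. Variable-length values would need an offset table or similar,
which this sketch leaves out.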


> encode column names to save space 
> ----------------------------------
>
>                 Key: PHOENIX-1598
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1598
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: noam bulvik
>
> When creating a table using Phoenix DDL, replace the column names that the 
> user gives with shorter names to save space. The user will still use the 
> full names in his select statements and will get them in the result set, but 
> under the hood the infra will translate the names to their shorter versions.
> Example:
> when creating a table with my_column_1, my_column_2 ... the table will be 
> created with a as the first column, b as the second one, etc.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
