[
https://issues.apache.org/jira/browse/PHOENIX-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732457#comment-14732457
]
Lars Hofhansl commented on PHOENIX-1598:
----------------------------------------
James and I were discussing this on Friday. It would have a lot of advantages
and no disadvantages that I am aware of.
I'd have the column names be 1, 2, 3, ..., rather than a, b, c, ..., but that's
cosmetic. Or we can use hex numbers.
The key is (for PHOENIX-1940) that we can determine the ordinal of a column
from its name, rather than having to do a binary search for it.
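
A minimal sketch of that point, assuming columns are stored under the decimal
qualifiers "1", "2", "3", ... in declaration order. The class EncodedQualifiers
and its methods are hypothetical and are not part of Phoenix or HBase; they only
show that the ordinal falls out of parsing the name, with no search involved.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Phoenix code. Assumes the stored qualifier of the
// column with 0-based ordinal i is the ASCII decimal string of (i + 1).
final class EncodedQualifiers {

    // Encode a 0-based ordinal as its 1-based decimal qualifier.
    static byte[] encode(int ordinal) {
        if (ordinal < 0) {
            throw new IllegalArgumentException("ordinal must be >= 0");
        }
        return Integer.toString(ordinal + 1).getBytes(StandardCharsets.US_ASCII);
    }

    // Recover the ordinal by parsing the qualifier bytes. The cost depends
    // only on the qualifier length, not on the number of columns.
    static int ordinalOf(byte[] qualifier) {
        if (qualifier.length == 0) {
            throw new IllegalArgumentException("empty qualifier");
        }
        int n = 0;
        for (byte b : qualifier) {
            if (b < '0' || b > '9') {
                throw new IllegalArgumentException("not an encoded qualifier");
            }
            n = n * 10 + (b - '0');
        }
        if (n < 1) {
            throw new IllegalArgumentException("qualifiers start at 1");
        }
        return n - 1;
    }

    public static void main(String[] args) {
        byte[] q = encode(2);
        System.out.println(new String(q, StandardCharsets.US_ASCII)
                + " -> ordinal " + ordinalOf(q));
    }
}
```

With names like a, b, c the same trick works, but numbers extend past 26 columns
without any extra convention.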
We could transition old tables by having this mapping be the identity mapping.
I.e. if we already have a column named "some_column", we'd map "some_column"
to "some_column" in the mapping. We'd lose the optimizations but could still
rename columns cheaply.
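
As a rough illustration, assuming the mapping is just a table from the
user-visible name to the stored name (the class ColumnNameMapping below is
hypothetical, not an existing Phoenix structure): an old table starts with the
identity mapping, a new table gets numeric names, and a rename only touches the
mapping, never the stored data.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Phoenix code: maps the name the user sees to the
// name actually stored in HBase.
final class ColumnNameMapping {
    private final Map<String, String> logicalToStored = new LinkedHashMap<>();

    // Old table: every existing column maps to itself.
    static ColumnNameMapping identity(String... existingColumns) {
        ColumnNameMapping m = new ColumnNameMapping();
        for (String c : existingColumns) {
            m.logicalToStored.put(c, c);
        }
        return m;
    }

    // New table: columns are stored as "1", "2", "3", ... in declaration order.
    static ColumnNameMapping encoded(String... declaredColumns) {
        ColumnNameMapping m = new ColumnNameMapping();
        int next = 1;
        for (String c : declaredColumns) {
            m.logicalToStored.put(c, Integer.toString(next++));
        }
        return m;
    }

    String storedName(String logicalName) {
        String stored = logicalToStored.get(logicalName);
        if (stored == null) {
            throw new IllegalArgumentException("unknown column: " + logicalName);
        }
        return stored;
    }

    // A rename changes only the mapping entry; the stored name stays the same.
    void rename(String oldName, String newName) {
        if (logicalToStored.containsKey(newName)) {
            throw new IllegalArgumentException("column exists: " + newName);
        }
        String stored = storedName(oldName);
        logicalToStored.remove(oldName);
        logicalToStored.put(newName, stored);
    }

    public static void main(String[] args) {
        ColumnNameMapping old = ColumnNameMapping.identity("some_column");
        old.rename("some_column", "better_name");
        System.out.println(old.storedName("better_name"));
    }
}
```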
Not yet sure if we'd need to change anything in HBase. HBase is fundamentally
sparse, so we can't know ahead of time how many columns will be returned per
row, not even how many columns we'd expect. Should discuss. A possible solution
is to have "dense" columns packed into a single key value. Storage would be
_much_ improved, and so would read performance for cases where we'd want to see
most of those columns. Writes would suffer with a simple solution (we would need
to read back the old values and rewrite them with the new value replaced); we
could store "update" Cells instead that only hold the diff, and that would be
combined during the next compaction. It would be important to store the data
such that it does not have to be serialized and deserialized from the row (so
PB and Avro are probably out; need to check).
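
A toy sketch of the packed layout, under strong assumptions that are not part
of the proposal itself: every column is a fixed-width 8-byte long, there are no
nulls, and "update" Cells are modeled as plain (ordinal, value) pairs ordered
oldest to newest. DensePackedRow is a hypothetical class and uses no HBase API.
It shows a column being read at a computed offset without decoding the rest of
the row, and diffs being folded into the base value at compaction time.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not HBase or Phoenix code.
final class DensePackedRow {
    static final int WIDTH = Long.BYTES; // assumed fixed width per column

    // A diff holding only the changed column.
    static final class Update {
        final int ordinal;
        final long value;

        Update(int ordinal, long value) {
            this.ordinal = ordinal;
            this.value = value;
        }
    }

    // Pack all columns of a row into one value.
    static byte[] pack(long... values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * WIDTH);
        for (long v : values) {
            buf.putLong(v);
        }
        return buf.array();
    }

    // Read one column in place: the offset comes from the ordinal, so
    // nothing else in the row has to be decoded.
    static long read(byte[] packed, int ordinal) {
        return ByteBuffer.wrap(packed).getLong(ordinal * WIDTH);
    }

    // Compaction step: apply the diffs (oldest first) to a copy of the base.
    static byte[] compact(byte[] packed, List<Update> updates) {
        byte[] merged = packed.clone();
        ByteBuffer buf = ByteBuffer.wrap(merged);
        for (Update u : updates) {
            buf.putLong(u.ordinal * WIDTH, u.value);
        }
        return merged;
    }

    public static void main(String[] args) {
        byte[] row = pack(10L, 20L, 30L);
        byte[] merged = compact(row, Arrays.asList(new Update(1, 21L)));
        System.out.println(read(row, 1) + " -> " + read(merged, 1));
    }
}
```

Variable-width values would need an offset table or similar in front of the
data, which this sketch leaves out.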
But that's something HBase and/or Phoenix desperately need. I think this should
sit on top of HBase, as HBase cannot know about optimized storage/packing
formats for various problems. Maybe a library.
> encode column names to save space
> ----------------------------------
>
> Key: PHOENIX-1598
> URL: https://issues.apache.org/jira/browse/PHOENIX-1598
> Project: Phoenix
> Issue Type: Improvement
> Reporter: noam bulvik
>
> When creating a table using Phoenix DDL, replace the column names that the
> user gives with shorter names to save space. The user will still use the full
> names in his select statements and will get them in the result set, but under
> the hood the infra will translate the names to their shorter versions.
> Example:
> when creating a table with my_column_1, my_column_2, ..., the table will be
> created with a as the first column, b as the second one, etc.