[
https://issues.apache.org/jira/browse/HBASE-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13512052#comment-13512052
]
stack commented on HBASE-7233:
------------------------------
[[email protected]] Let's make it so KV is evolvable, else let's go home!
Has to be backward compatible though -- yeah. Can you not leverage the hfile
version and if older, transform old to new style blocks? (Sorry if that's a dumb
idea. Did you look at overriding the key type to add in 'version' on the top
few bits? Hmm... that is probably no good because you need to be able to find
the type in the middle of the byte array ... )
bq. ...and store the tags prepended to user data as part of the value section
of the KV.
Ugh. Yeah, needs to be inline.
So, we can say that KV is going to evolve so we need to just deal.
[~mcorgan] We can't do pb kvs to put them into an hfile. Sorry if you got that
impression. Would be just way too slow.
I think a new KV/Cell format would require a new encoder, one that could send
all in the new format. Clients would ask for the new encoder format only if
they knew how to decode.
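
A rough sketch of that negotiation, just to make the idea concrete. The encoder
names and the function here are made up for illustration; nothing like this
exists in hbase as written. The only point is that the server falls back to the
basic format unless the client explicitly asks for one the server knows.

```python
# Hypothetical encoder names; "BASIC_KV" stands in for serializing kvs as we do now.
BASIC_ENCODER = "BASIC_KV"
SERVER_ENCODERS = {"BASIC_KV", "CELL_V2", "PREFIX_TREE"}


def choose_encoder(requested, server_encoders=SERVER_ENCODERS):
    """Pick the encoder for a response.

    `requested` is whatever the client asked for (None if it asked for nothing).
    A client only asks for an encoder it knows how to decode, so an old client
    never sees the new format. Unknown requests fall back to the basic one.
    """
    if requested is not None and requested in server_encoders:
        return requested
    return BASIC_ENCODER


if __name__ == "__main__":
    print(choose_encoder(None))        # old client, no request
    print(choose_encoder("CELL_V2"))   # new client asks for the new format
    print(choose_encoder("SNAZZY_V9")) # server does not know this one
```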
Chatting w/ Todd, he had some good suggestions. I tried on him my concern that
we would be putting ourselves in a ghetto if we are not spitting a well-known
serialization like avro or thrift out the front door. He made Andrew's above
argument that we can't do prefixtree-like compression w/ thrift/avro and that
a client that goes natively against hbase is already an undertaking, keeping a
cache of regions etc., so it is not too much to ask that it be able to do at
least a basic data block encoding/decoding.
Rather than KVs, because they are too atomic an entity, we should probably send
datablocks after we send a pb header (as per Matt). The most basic would
serialize kvs as we do now (as per Matt).
Other interesting suggestions were sending the data first, before we send the
pb header describing its content, w/ say a DATA<length> prefix so the client
accumulates the data and then reads the pb header to figure out which encoder
to use on it. So, at its base, our RPC becomes sending of DATA<length> and
PBUC<serialized delimited pb>.
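
To make the DATA/PBUC framing concrete, here is a minimal sketch. Everything
about the byte layout is an assumption for illustration: a 4-byte ASCII tag, a
4-byte big-endian length after DATA, and a varint length after PBUC (which is
how a delimited pb is prefixed). The header is treated as opaque bytes rather
than a real pb message so the sketch stands alone.

```python
import io
import struct

DATA_TAG = b"DATA"  # assumed 4-byte tag, followed by a 4-byte big-endian length
PBUC_TAG = b"PBUC"  # assumed 4-byte tag, followed by a varint-delimited header


def write_varint(out, n):
    """Write n as a base-128 varint (the length prefix of a delimited pb)."""
    while True:
        low7 = n & 0x7F
        n >>= 7
        if n:
            out.write(bytes([low7 | 0x80]))  # more bytes follow
        else:
            out.write(bytes([low7]))
            return


def read_exact(inp, n):
    """Read exactly n bytes or fail; a short read means a truncated stream."""
    buf = inp.read(n)
    if len(buf) != n:
        raise EOFError("stream ended early")
    return buf


def read_varint(inp):
    """Read a base-128 varint written by write_varint."""
    result, shift = 0, 0
    while True:
        b = read_exact(inp, 1)[0]
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result
        shift += 7


def write_response(out, blocks, header_bytes):
    """Send the data blocks first, then the header that describes them."""
    for block in blocks:
        out.write(DATA_TAG)
        out.write(struct.pack(">I", len(block)))
        out.write(block)
    out.write(PBUC_TAG)
    write_varint(out, len(header_bytes))
    out.write(header_bytes)


def read_response(inp):
    """Accumulate DATA blocks until the PBUC header arrives, then return both."""
    blocks = []
    while True:
        tag = read_exact(inp, 4)
        if tag == DATA_TAG:
            (length,) = struct.unpack(">I", read_exact(inp, 4))
            blocks.append(read_exact(inp, length))
        elif tag == PBUC_TAG:
            return blocks, read_exact(inp, read_varint(inp))
        else:
            raise ValueError("unknown frame tag: %r" % tag)
```

The client holds the blocks undecoded until the header shows up, since only the
header says which encoder to use on them.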
> Serializing KeyValues
> ---------------------
>
> Key: HBASE-7233
> URL: https://issues.apache.org/jira/browse/HBASE-7233
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Assignee: stack
> Priority: Blocker
> Attachments: 7233.txt, 7233-v2.txt
>
>
> Undo KeyValue being a Writable.