[
https://issues.apache.org/jira/browse/TAJO-1432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377538#comment-14377538
]
Yongjin Choi edited comment on TAJO-1432 at 4/1/15 12:01 AM:
-------------------------------------------------------------
After investigating the related code, I found two ways to implement this.
First, we can add a compression codec to the Netty channel pipeline. However,
I am afraid that it would affect all protobuf messages, and it would be
difficult to compress only some messages selectively.
Second, we can change ‘SerializedResultSet’ in ClientProtos.proto to include
an “optional compressedTuples” field. This looks like the better choice because
it does not affect other messages, and we can choose whether to apply
compression or not.
I will add a ‘ResultSetCompressionEncoder’ which converts a List<Tuple> to a
compressed byte[].
The getQueryResultData() handler will call that encoder and add the result to
the protobuf message if a session variable like ‘compression=true’ is set.
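
To make the second approach concrete, here is a minimal sketch of what such an
encoder could look like. It is only an illustration under assumptions: the
class name ResultSetCompressionSketch, the length-prefixed framing, and the
choice of GZIP from java.util.zip are all hypothetical, and the tuples are
assumed to be already serialized to byte[] rather than being real Tajo Tuple
objects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch, not the actual Tajo class.
final class ResultSetCompressionSketch {

  // Packs already-serialized tuples into one GZIP-compressed byte[].
  // Framing (an assumption): tuple count, then length + bytes for each tuple.
  static byte[] encode(List<byte[]> serializedTuples) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(buffer))) {
      out.writeInt(serializedTuples.size());
      for (byte[] tuple : serializedTuples) {
        out.writeInt(tuple.length);
        out.write(tuple);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // The GZIP stream is closed (and therefore finished) at this point.
    return buffer.toByteArray();
  }

  // Client-side counterpart: restores the serialized tuples.
  static List<byte[]> decode(byte[] compressed) {
    try (DataInputStream in = new DataInputStream(
        new GZIPInputStream(new ByteArrayInputStream(compressed)))) {
      int count = in.readInt();
      List<byte[]> tuples = new ArrayList<>(count);
      for (int i = 0; i < count; i++) {
        byte[] tuple = new byte[in.readInt()];
        in.readFully(tuple);
        tuples.add(tuple);
      }
      return tuples;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

In this sketch the handler would put the output of encode() into the new
optional field only when the session variable is set, and the client would
call decode() only when that field is present.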
May I proceed in this way?
By the way, I have another idea.
If the client library can prefetch results asynchronously, every next() call
on the client cursor can return immediately once the first result has arrived.
IMHO, this could be a separate issue for performance improvement.
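
The prefetch idea could be sketched roughly as follows. This is not the Tajo
client API: PrefetchingCursor and the fetchNextBatch supplier are hypothetical
names, and the sketch assumes the server call returns an empty list once the
result set is exhausted. While the caller consumes one batch, the next one is
already being fetched in the background.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical sketch of a prefetching cursor; names are not from the Tajo client.
final class PrefetchingCursor<T> implements Iterator<T> {
  // Assumed contract: returns the next batch, or an empty list when no rows are left.
  private final Supplier<List<T>> fetchNextBatch;
  private Iterator<T> current = Collections.emptyIterator();
  private CompletableFuture<List<T>> pending;
  private boolean exhausted = false;

  PrefetchingCursor(Supplier<List<T>> fetchNextBatch) {
    this.fetchNextBatch = fetchNextBatch;
    // Start fetching the first batch right away.
    this.pending = CompletableFuture.supplyAsync(fetchNextBatch);
  }

  @Override
  public boolean hasNext() {
    while (!current.hasNext() && !exhausted) {
      // Blocks only if the background fetch has not finished yet.
      List<T> batch = pending.join();
      if (batch.isEmpty()) {
        exhausted = true;
      } else {
        current = batch.iterator();
        // Fetch the following batch while the caller consumes this one.
        pending = CompletableFuture.supplyAsync(fetchNextBatch);
      }
    }
    return current.hasNext();
  }

  @Override
  public T next() {
    if (!hasNext()) {
      throw new NoSuchElementException();
    }
    return current.next();
  }
}
```

Only one fetch is in flight at a time here, so the supplier is never called
concurrently; a real implementation would also need cancellation and error
handling when the query is closed early.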
> Support compressed result stream
> --------------------------------
>
> Key: TAJO-1432
> URL: https://issues.apache.org/jira/browse/TAJO-1432
> Project: Tajo
> Issue Type: Improvement
> Components: client
> Reporter: Yongjin Choi
> Assignee: Yongjin Choi
>
> Sometimes, Tajo has to return a big result set (a few or tens of gigabytes!)
> for a query.
> In many cases, the network bandwidth between the client (BI tool) and the Tajo
> master is expensive or not good enough.
> So, it would be great if the communication between the client and the Tajo
> master could be compressed as a session option.