Re: [Drill-Questions] Speed difference between GZ and BZ2

Shankar Mane Sun, 31 Jul 2016 10:48:06 -0700

Awaiting a response.

On 30-Jul-2016 3:20 PM, "Shankar Mane" <[email protected]> wrote:

> I am comparing query speed between GZ and BZ2 files.
>
> Below are the two files and their sizes (both files contain the same data):
> kafka_3_25-Jul-2016-12a.json.gz  = 1.8G
> kafka_3_25-Jul-2016-12a.json.bz2 = 1.1G
>
>
>
> Results:
>
> 0: jdbc:drill:> select channelid, count(serverTime) from
> dfs.`/tmp/stest-gz/kafka_3_25-Jul-2016-12a.json.gz` group by channelid ;
> +------------+----------+
> | channelid  |  EXPR$1  |
> +------------+----------+
> | 3          | 977134   |
> | 0          | 836850   |
> | 2          | 3202854  |
> +------------+----------+
> 3 rows selected (86.034 seconds)
>
>
>
> 0: jdbc:drill:> select channelid, count(serverTime) from
> dfs.`/tmp/stest-bz2/kafka_3_25-Jul-2016-12a.json.bz2` group by channelid ;
> +------------+----------+
> | channelid  |  EXPR$1  |
> +------------+----------+
> | 3          | 977134   |
> | 0          | 836850   |
> | 2          | 3202854  |
> +------------+----------+
> 3 rows selected (459.079 seconds)
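The roughly 5x gap between the two runs is consistent with bzip2 decompression being far more CPU-intensive than gzip (DEFLATE) decompression, independent of Drill. Below is a minimal sketch, run outside Drill, that times single-threaded decompression of the same newline-delimited JSON with Python's standard `gzip` and `bz2` modules. The record layout (random `channelid` values and increasing `serverTime` values) is a hypothetical stand-in for the Kafka dump, and the helper names are illustrative only.

```python
import bz2
import gzip
import json
import random
import time
from typing import Callable


def make_payload(n_records: int = 50_000) -> bytes:
    """Build newline-delimited JSON loosely shaped like the Kafka dump (hypothetical layout)."""
    rng = random.Random(42)
    lines = [
        json.dumps({"channelid": rng.choice([0, 2, 3]), "serverTime": 1469404800000 + i})
        for i in range(n_records)
    ]
    return ("\n".join(lines) + "\n").encode("utf-8")


def time_decompress(decompress: Callable[[bytes], bytes], blob: bytes, repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over several decompression runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        decompress(blob)
        best = min(best, time.perf_counter() - start)
    return best


def compare(payload: bytes) -> dict:
    """Compress the payload with both codecs, verify round-trips, and time decompression."""
    gz_blob = gzip.compress(payload)
    bz2_blob = bz2.compress(payload)
    # Timings only mean something if both codecs reproduce the identical bytes.
    if gzip.decompress(gz_blob) != payload or bz2.decompress(bz2_blob) != payload:
        raise RuntimeError("round-trip mismatch")
    return {
        "gz_bytes": len(gz_blob),
        "bz2_bytes": len(bz2_blob),
        "gz_seconds": time_decompress(gzip.decompress, gz_blob),
        "bz2_seconds": time_decompress(bz2.decompress, bz2_blob),
    }


if __name__ == "__main__":
    stats = compare(make_payload())
    for key, value in stats.items():
        print(f"{key}: {value}")
    ratio = stats["bz2_seconds"] / stats["gz_seconds"]
    print(f"bz2/gz decompression time ratio: {ratio:.1f}x")
```

Expect bz2 to produce the smaller file but decompress several times more slowly than gzip, mirroring the 1.1G vs 1.8G sizes above; the exact ratio depends on the data and the hardware.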
>
>
>
> Questions:
> 1. Based on the test above, GZ is roughly 5x faster than BZ2 (86 s vs. 459 s). Why is that?
> 2. How can we speed up BZ2? Are there any configuration options for this?
> 3. Since BZ2 is a splittable format, how does Drill use it?
>
>
> regards,
> shankar
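On question 3: bzip2 compresses its input in independent blocks (roughly 100k to 900k bytes of input each, set by the compression level), and every block begins with the 48-bit marker 0x314159265359. A splittable reader can therefore start at an arbitrary byte offset, scan forward to the next marker, and begin decoding there; Hadoop's `BZip2Codec` relies on this property. The sketch below uses a hypothetical helper, `block_bit_offsets`, that only locates those markers in a stream produced by Python's `bz2` module. Markers are not byte-aligned, so it scans bit by bit, and a real reader must also tolerate the rare false match inside compressed data. It does not show whether Drill actually splits `.bz2` JSON files.

```python
import bz2
import os

BLOCK_MAGIC = 0x314159265359  # 48-bit marker at the start of every bzip2 block
MAGIC_BITS = 48


def block_bit_offsets(blob: bytes) -> list[int]:
    """Return the bit offset of every bzip2 block-start marker in a compressed stream.

    Blocks are not byte-aligned, so a 48-bit window slides forward one bit at a time.
    """
    offsets = []
    mask = (1 << MAGIC_BITS) - 1
    window = 0
    bit_pos = 0
    for byte in blob:
        for shift in range(7, -1, -1):  # most significant bit first, as bzip2 writes them
            window = ((window << 1) | ((byte >> shift) & 1)) & mask
            bit_pos += 1
            if bit_pos >= MAGIC_BITS and window == BLOCK_MAGIC:
                offsets.append(bit_pos - MAGIC_BITS)
    return offsets


if __name__ == "__main__":
    # Random bytes are incompressible, so level 1 (about 100k per block) needs two blocks.
    data = os.urandom(150_000)
    blob = bz2.compress(data, compresslevel=1)
    offsets = block_bit_offsets(blob)
    print(f"{len(blob)} compressed bytes, {len(offsets)} blocks")
    for off in offsets:
        print(f"block marker at bit {off} (byte {off // 8}, bit {off % 8})")
```

The first marker should sit at bit 32, right after the 4-byte `BZh1` stream header; later markers generally land mid-byte, which is why a split reader has to search at bit granularity.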
