Re: [Drill-Questions] Speed difference between GZ and BZ2

Khurram Faraaz Mon, 01 Aug 2016 02:58:14 -0700

What is the data format inside those .gz and .bz2 files? Is it Parquet, JSON,
or plain text (CSV)?
Also, what was the config parameter `store.parquet.compression` set to when
you ran your test?
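
To check the current value, a query along these lines should work from
sqlline. This is only a sketch, assuming the `sys.options` system table is
available in your Drill version; its column layout varies between releases,
hence `SELECT *`:

```sql
-- Show the current setting of the Parquet compression option.
SELECT * FROM sys.options WHERE name = 'store.parquet.compression';
```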


- Khurram

On Sun, Jul 31, 2016 at 11:17 PM, Shankar Mane <[email protected]>
wrote:

> Awaiting a response.
>
> On 30-Jul-2016 3:20 PM, "Shankar Mane" <[email protected]> wrote:
>
> >
> > I am comparing query speed between GZ and BZ2.
> >
> > Below are the 2 files and their sizes (both files contain the same data):
> > kafka_3_25-Jul-2016-12a.json.gz  = 1.8G
> > kafka_3_25-Jul-2016-12a.json.bz2 = 1.1G
> >
> >
> >
> > Results:
> >
> > 0: jdbc:drill:> select channelid, count(serverTime) from dfs.`/tmp/stest-gz/kafka_3_25-Jul-2016-12a.json.gz` group by channelid;
> > +------------+----------+
> > | channelid  |  EXPR$1  |
> > +------------+----------+
> > | 3          | 977134   |
> > | 0          | 836850   |
> > | 2          | 3202854  |
> > +------------+----------+
> > 3 rows selected (86.034 seconds)
> >
> >
> >
> > 0: jdbc:drill:> select channelid, count(serverTime) from dfs.`/tmp/stest-bz2/kafka_3_25-Jul-2016-12a.json.bz2` group by channelid;
> > +------------+----------+
> > | channelid  |  EXPR$1  |
> > +------------+----------+
> > | 3          | 977134   |
> > | 0          | 836850   |
> > | 2          | 3202854  |
> > +------------+----------+
> > 3 rows selected (459.079 seconds)
> >
> >
> >
> > Questions:
> > 1. In the test above, GZ is roughly 5x faster than BZ2 (86 s vs. 459 s). Why is that?
> > 2. How can we speed up BZ2? Is there any configuration to tune?
> > 3. Since bz2 is a splittable format, how does Drill make use of that?
> >
> >
> > regards,
> > shankar
>
