Hi Vishal,

You are right; that is why we can use no-dictionary only for the String
datatype. Please look at my first point: for sort_columns we can always use
direct dictionary for the data types where it is possible, such as short,
int, long, double and float.
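
To illustrate the ordering concern, here is a small Python sketch. It is not CarbonData code, and the function names are hypothetical. It compares sorting integers by the bytes of their string form with sorting by a fixed-width, order-preserving encoding, which is one way a numeric sort column could keep byte order equal to value order.

```python
import struct

def as_text_bytes(value: int) -> bytes:
    # How a numeric value looks if it is stored like a String.
    return str(value).encode("utf-8")

def as_sortable_bytes(value: int) -> bytes:
    # Assumes a signed 64-bit value. Adding 2**63 shifts the range to
    # 0 .. 2**64 - 1, so unsigned big-endian byte order matches numeric order.
    return struct.pack(">Q", value + (1 << 63))

values = [9, 10, -3, 200]

# Byte-wise comparison of the text form ignores the data type.
text_order = sorted(values, key=as_text_bytes)

# Byte-wise comparison of the typed encoding follows the numeric order.
typed_order = sorted(values, key=as_sortable_bytes)

print(text_order)   # [-3, 10, 200, 9]
print(typed_order)  # [-3, 9, 10, 200]
```

The same mismatch would affect a range filter that compares raw bytes, which is why the sort and the filter need to agree on the data type.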

Regards,
Ravindra.

On 1 March 2017 at 18:18, Kumar Vishal <kumarvishal1...@gmail.com> wrote:

> Hi Ravi,
> Sorting of data for no-dictionary columns should be based on the data type,
> and the same applies to filters. Please add this point.
>
> -Regards
> Kumar Vishal
>
> On Wed, Mar 1, 2017 at 8:34 PM, Ravindra Pesala <ravi.pes...@gmail.com>
> wrote:
>
> > Hi,
> >
> > In order to make the storage and performance of no-dictionary columns
> > more efficient, I am suggesting the following improvements.
> >
> > 1. Always make SHORT, INT, BIGINT, DOUBLE & FLOAT direct dictionary.
> >    Right now only date and timestamp are direct dictionary columns. We
> > can make SHORT, INT, BIGINT, DOUBLE & FLOAT direct dictionary if these
> > columns are included in SORT_COLUMNS.
> >
> > 2. Consider delta/value compression while storing direct dictionary
> > values.
> > Right now it always uses the INT datatype to store direct dictionary
> > values, so we can consider value/delta compression to compact the storage.
> >
> > 3. Use a separator instead of the LV format to store String values in
> > no-dictionary format.
> > Currently String datatypes for no-dictionary columns are stored in
> > LV (length-value) format, where we always use a Short (2 bytes) as the
> > length. In order to keep the storage compact we can use a separator
> > (a 0 byte), which takes just a single byte. While reading we can traverse
> > through the data and get the offsets like we are doing now.
> >
> > 4. Add range filters for no-dictionary columns.
> > Currently range filters such as greater-than/less-than filters are not
> > implemented for no-dictionary columns, so we should implement them to
> > avoid row-level filtering and improve the performance.
> >
> > Regards,
> > Ravindra.
> >
>
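
For point 3 in the quoted proposal, the two layouts can be compared with a rough Python sketch. This is only an illustration under stated assumptions, not the CarbonData writer: the function names are hypothetical, values are assumed to be UTF-8 text, and the separator layout assumes no value contains the 0 byte (in UTF-8 that byte only appears for the character U+0000).

```python
import struct

def encode_lv(values):
    # LV layout: a 2-byte big-endian length followed by the value bytes.
    out = bytearray()
    for v in values:
        data = v.encode("utf-8")
        out += struct.pack(">H", len(data))
        out += data
    return bytes(out)

def encode_separated(values):
    # Separator layout: value bytes followed by a single 0 byte.
    out = bytearray()
    for v in values:
        data = v.encode("utf-8")
        if b"\x00" in data:
            raise ValueError("value contains the separator byte")
        out += data + b"\x00"
    return bytes(out)

def decode_separated(buf):
    # Traverse the data once, cutting a value at every separator.
    values, start = [], 0
    for i, b in enumerate(buf):
        if b == 0:
            values.append(buf[start:i].decode("utf-8"))
            start = i + 1
    return values

names = ["abc", "", "carbondata"]
lv_bytes = encode_lv(names)
sep_bytes = encode_separated(names)

print(len(lv_bytes), len(sep_bytes))  # 19 16
print(decode_separated(sep_bytes) == names)  # True
```

The saving is one byte per value, so it matters most for columns with many short strings.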

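For point 2, one possible form of delta compression is sketched below in Python. The scheme shown (subtract the minimum, then store each delta in the smallest of 1, 2, 4 or 8 bytes) is an assumption chosen for illustration, the helper names are hypothetical, and the input is assumed to be a non-empty list of integers.

```python
def smallest_width(span: int) -> int:
    # Smallest byte width (1, 2, 4 or 8) that can hold an unsigned span.
    for width in (1, 2, 4, 8):
        if span < (1 << (8 * width)):
            return width
    raise ValueError("span does not fit in 8 bytes")

def delta_encode(values):
    # Store each value as an unsigned offset from the minimum value.
    base = min(values)
    width = smallest_width(max(values) - base)
    payload = b"".join((v - base).to_bytes(width, "big") for v in values)
    return base, width, payload

def delta_decode(base, width, payload):
    # Rebuild the original values by adding the base back to each offset.
    return [
        base + int.from_bytes(payload[i:i + width], "big")
        for i in range(0, len(payload), width)
    ]

surrogates = [100000, 100003, 100200, 100001]
base, width, payload = delta_encode(surrogates)

print(base, width, len(payload))  # 100000 1 4
print(delta_decode(base, width, payload) == surrogates)  # True
```

Here four values that would take 16 bytes as fixed 4-byte INTs fit in 4 bytes plus the base and width, because their spread is under 256.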


-- 
Thanks & Regards,
Ravi
