Hi Vishal,

You are right; that's why we can use no-dictionary only for the String datatype. Please look at my first point: we can always use direct dictionary for the data types where it is possible, like short, int, long, double & float, for sort_columns.
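A minimal Java sketch of why the ordering has to follow the data type. The class and method names are hypothetical, and the encoding is a common order-preserving byte layout, not necessarily what CarbonData's direct dictionary generator does. It assumes keys are compared as unsigned byte arrays: numbers stored as string bytes then sort "10" before "9", while a type-aware encoding keeps numeric order for longs and doubles.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not part of CarbonData): order-preserving byte keys
// for numeric sort columns.
final class OrderPreservingKeys {

    // Unsigned lexicographic comparison of two byte arrays.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Writes a long as 8 big-endian bytes.
    private static byte[] toBigEndian(long v) {
        byte[] out = new byte[8];
        for (int i = 7; i >= 0; i--) {
            out[i] = (byte) v;
            v >>>= 8;
        }
        return out;
    }

    // Flipping the sign bit maps signed order onto unsigned byte order.
    static byte[] encodeLong(long v) {
        return toBigEndian(v ^ Long.MIN_VALUE);
    }

    // IEEE 754 doubles: flip only the sign bit of non-negative values and all
    // bits of negative values (NaN ordering is not considered here).
    static byte[] encodeDouble(double d) {
        long bits = Double.doubleToLongBits(d);
        return toBigEndian(bits ^ ((bits >> 63) | Long.MIN_VALUE));
    }

    public static void main(String[] args) {
        byte[] ten = "10".getBytes(StandardCharsets.UTF_8);
        byte[] nine = "9".getBytes(StandardCharsets.UTF_8);
        // Stored as strings, "10" sorts before "9".
        System.out.println(compareUnsigned(ten, nine) < 0); // prints true
        // With the type-aware encoding, 10 sorts after 9 and -5 before 3.
        System.out.println(compareUnsigned(encodeLong(10), encodeLong(9)) > 0); // prints true
        System.out.println(compareUnsigned(encodeLong(-5), encodeLong(3)) < 0); // prints true
        System.out.println(compareUnsigned(encodeDouble(-1.5), encodeDouble(0.25)) < 0); // prints true
    }
}
```

Flipping the sign bit is what lets a plain byte comparator agree with signed numeric order, so the same comparator can serve both sorting and filtering.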
Regards,
Ravindra.

On 1 March 2017 at 18:18, Kumar Vishal <kumarvishal1...@gmail.com> wrote:

> Hi Ravi,
>
> Sorting of no-dictionary data should be based on the data type, and the same
> goes for filters. Please add this point.
>
> -Regards
> Kumar Vishal
>
> On Wed, Mar 1, 2017 at 8:34 PM, Ravindra Pesala <ravi.pes...@gmail.com>
> wrote:
>
> > Hi,
> >
> > In order to make the storage and performance of non-dictionary columns
> > more efficient, I am suggesting the following improvements.
> >
> > 1. Always make SHORT, INT, BIGINT, DOUBLE & FLOAT direct dictionary.
> > Right now only date and timestamp are direct dictionary columns. We can
> > make SHORT, INT, BIGINT, DOUBLE & FLOAT direct dictionary if these columns
> > are included in SORT_COLUMNS.
> >
> > 2. Consider delta/value compression while storing direct dictionary values.
> > Right now the INT datatype is always used to store direct dictionary
> > values, so we can consider value/delta compression to compact the storage.
> >
> > 3. Use a separator instead of the LV format to store String values in
> > no-dictionary format.
> > Currently String values of non-dictionary columns are stored in LV
> > (length-value) format, always using a Short (2 bytes) as the length. To
> > keep storage compact we can use a separator (a 0 byte), which takes just a
> > single byte. While reading, we can traverse the data and get the offsets
> > as we do now.
> >
> > 4. Add range filters for no-dictionary columns.
> > Currently range filters such as greater-than/less-than filters are not
> > implemented for no-dictionary columns. We should implement them to avoid
> > row-level filtering and improve performance.
> >
> > Regards,
> > Ravindra.
>

--
Thanks & Regards,
Ravi
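For point 2, a sketch of value/delta compression, assuming the direct dictionary surrogates of a page arrive as an int[]: each value is stored as (value - min) in the narrowest of 1, 2 or 4 bytes. DeltaCompressedColumn and its methods are hypothetical, not CarbonData's actual codec.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: stores each value as (value - min) using the narrowest
// width (1, 2 or 4 bytes) that fits the page's max - min range.
final class DeltaCompressedColumn {
    final int min;
    final int width; // bytes per stored delta
    final byte[] data;

    private DeltaCompressedColumn(int min, int width, byte[] data) {
        this.min = min;
        this.width = width;
        this.data = data;
    }

    static DeltaCompressedColumn encode(int[] values) {
        if (values.length == 0) {
            return new DeltaCompressedColumn(0, 1, new byte[0]);
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        long range = (long) max - min; // long: max - min can overflow int
        int width = range <= 0xFFL ? 1 : range <= 0xFFFFL ? 2 : 4;
        ByteBuffer buf = ByteBuffer.allocate(values.length * width);
        for (int v : values) {
            long delta = (long) v - min; // always within [0, range]
            if (width == 1) {
                buf.put((byte) delta);
            } else if (width == 2) {
                buf.putShort((short) delta);
            } else {
                buf.putInt((int) delta);
            }
        }
        return new DeltaCompressedColumn(min, width, buf.array());
    }

    // Reads the delta back as an unsigned number and adds min.
    int get(int row) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        long delta;
        if (width == 1) {
            delta = buf.get(row) & 0xFFL;
        } else if (width == 2) {
            delta = buf.getShort(row * 2) & 0xFFFFL;
        } else {
            delta = buf.getInt(row * 4) & 0xFFFFFFFFL;
        }
        return (int) (min + delta);
    }

    int size() {
        return data.length / width;
    }
}
```

For surrogates {1000, 1003, 1001, 1010} the range is 10, so each value fits in one byte: 4 bytes of data instead of 16 with plain INT, plus the stored min.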
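For point 3, a sketch comparing the two page layouts. StringPageEncoding is hypothetical; it assumes UTF-8 values that never contain a 0x00 byte (otherwise the separator is ambiguous), and its LV side uses a signed 2-byte length to mirror the Short described in the proposal.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of LV vs separator layouts for no-dictionary strings.
final class StringPageEncoding {

    // LV layout: 2-byte big-endian length, then the UTF-8 bytes.
    static byte[] encodeLV(List<String> values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String s : values) {
            byte[] b = s.getBytes(StandardCharsets.UTF_8);
            if (b.length > Short.MAX_VALUE) {
                throw new IllegalArgumentException("value too long for a 2-byte length");
            }
            out.write(b.length >>> 8);
            out.write(b.length);
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    // Separator layout: the UTF-8 bytes followed by a single 0x00 byte.
    static byte[] encodeSeparated(List<String> values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String s : values) {
            byte[] b = s.getBytes(StandardCharsets.UTF_8);
            for (byte x : b) {
                if (x == 0) {
                    throw new IllegalArgumentException("value contains the 0x00 separator");
                }
            }
            out.write(b, 0, b.length);
            out.write(0);
        }
        return out.toByteArray();
    }

    // One pass over the page recovers the start offset of every value.
    static int[] valueOffsets(byte[] page) {
        int count = 0;
        for (byte x : page) {
            if (x == 0) {
                count++;
            }
        }
        int[] starts = new int[count];
        int idx = 0;
        int start = 0;
        for (int i = 0; i < page.length; i++) {
            if (page[i] == 0) {
                starts[idx++] = start;
                start = i + 1;
            }
        }
        return starts;
    }

    // Value i ends just before the next value's start (or the page end).
    static String valueAt(byte[] page, int[] starts, int i) {
        int separator = (i + 1 < starts.length ? starts[i + 1] : page.length) - 1;
        return new String(page, starts[i], separator - starts[i], StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("ab", "", "c");
        byte[] sep = encodeSeparated(values);
        int[] starts = valueOffsets(sep);
        System.out.println(encodeLV(values).length + " vs " + sep.length); // prints 9 vs 6
        System.out.println(valueAt(sep, starts, 0) + "|" + valueAt(sep, starts, 1) + "|" + valueAt(sep, starts, 2)); // prints ab||c
    }
}
```

The separator layout saves exactly one byte per value; the trade-off is that locating the i-th value needs the offset scan, which this sketch does once per page.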
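For point 4, a sketch of a range filter on a no-dictionary column, assuming the page is already sorted by that column in unsigned byte order (for example, when it is the leading sort column). A greater-than/less-than predicate then becomes two binary searches that return a contiguous row range instead of a row-by-row check. NoDictionaryRangeFilter and its methods are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: range filtering over a page sorted by unsigned byte order.
final class NoDictionaryRangeFilter {

    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // First row whose value is >= key.
    static int lowerBound(byte[][] sorted, byte[] key) {
        int lo = 0;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareUnsigned(sorted[mid], key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // First row whose value is > key.
    static int upperBound(byte[][] sorted, byte[] key) {
        int lo = 0;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareUnsigned(sorted[mid], key) <= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Rows [from, to) satisfying lower < value < upper; empty when from == to.
    static int[] between(byte[][] sorted, byte[] lower, byte[] upper) {
        int from = upperBound(sorted, lower);
        int to = Math.max(from, lowerBound(sorted, upper));
        return new int[] {from, to};
    }

    static byte[][] utf8(String... values) {
        byte[][] out = new byte[values.length][];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i].getBytes(StandardCharsets.UTF_8);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[][] page = utf8("apple", "banana", "cherry", "date", "fig");
        // value > 'banana' AND value < 'date'
        int[] rows = between(page, utf8("banana")[0], utf8("date")[0]);
        System.out.println("rows [" + rows[0] + ", " + rows[1] + ")"); // prints rows [2, 3)
    }
}
```

Only row 2 ("cherry") matches, and no other row is compared against the predicate beyond the binary-search probes.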