[jira] [Created] (CARBONDATA-3006) Carbon Store Size Optimization and Scan Query Performance Improvement

kumar vishal (JIRA) Mon, 15 Oct 2018 01:36:53 -0700

kumar vishal created CARBONDATA-3006:
----------------------------------------


             Summary: Carbon Store Size Optimization and Scan Query Performance 
Improvement
                 Key: CARBONDATA-3006
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-3006
             Project: CarbonData
          Issue Type: Improvement
            Reporter: kumar vishal


*String/Varchar Datatype Store Size Optimization:*
Currently the length of each String/Varchar value is stored as a Short/Int,
which increases the store size. To reduce the store size, adaptive encoding is
applied to the length part for both String and Varchar, so processing no
longer needs separate handling for the two datatypes.
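As an illustration of the idea only (not the code in the PR), a minimal Java
sketch of picking the narrowest width for the length part of a page. The class
name, method names and the candidate widths of 1, 2 or 4 bytes are assumptions
made for this example.

```java
import java.nio.charset.StandardCharsets;

public class LengthWidthSketch {

    // Smallest width in bytes (1, 2 or 4) that can hold maxLength as an
    // unsigned value. The set of candidate widths is an assumption.
    public static int bytesForLength(int maxLength) {
        if (maxLength < 0) {
            throw new IllegalArgumentException("length must be non-negative");
        }
        if (maxLength <= 0xFF) {
            return 1;
        }
        if (maxLength <= 0xFFFF) {
            return 2;
        }
        return 4;
    }

    // Width needed for the length part of one page of values: driven by the
    // longest value in the page, not by the declared String/Varchar type.
    public static int lengthWidthForPage(String[] values) {
        int max = 0;
        for (String v : values) {
            max = Math.max(max, v.getBytes(StandardCharsets.UTF_8).length);
        }
        return bytesForLength(max);
    }

    public static void main(String[] args) {
        String[] page = {"carbon", "data", "store"};
        int width = lengthWidthForPage(page);
        System.out.println("fixed 2-byte lengths: " + (2 * page.length) + " bytes");
        System.out.println("adaptive lengths:     " + (width * page.length) + " bytes");
    }
}
```

Because the width is derived from the data, the same path serves String and
Varchar columns.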

*String/Varchar datatype query processing optimization:*
Currently, when a String/Varchar column is processed during a query, the
offset (position) of each value is calculated first and the data is then
fetched based on that position. This causes many cache line misses and
degrades query performance.
To handle this, for full scan queries with no inverted index the data is
fetched in a linear way to avoid cache line misses.
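A minimal Java sketch of what a linear fetch can look like, assuming a page
laid out as repeated [2-byte big-endian length][value bytes] records. The
layout, class name and method names are assumptions for illustration and are
not taken from the PR.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class LinearScanSketch {

    // Builds a page as [2-byte length][bytes][2-byte length][bytes]...
    // Value lengths are assumed to be below 65536 bytes.
    public static byte[] encode(String[] values) {
        byte[][] raw = new byte[values.length][];
        int size = 0;
        for (int i = 0; i < values.length; i++) {
            raw[i] = values[i].getBytes(StandardCharsets.UTF_8);
            size += 2 + raw[i].length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size); // big-endian by default
        for (byte[] r : raw) {
            buf.putShort((short) r.length);
            buf.put(r);
        }
        return buf.array();
    }

    // Full scan: one forward pass over the page. No offsets array is built
    // or consulted, so memory is touched strictly in increasing order.
    public static List<String> scanLinear(byte[] page, int rowCount) {
        List<String> out = new ArrayList<>(rowCount);
        int pos = 0;
        for (int row = 0; row < rowCount; row++) {
            int len = ((page[pos] & 0xFF) << 8) | (page[pos + 1] & 0xFF);
            pos += 2;
            out.add(new String(page, pos, len, StandardCharsets.UTF_8));
            pos += len;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] page = encode(new String[] {"a", "bc", "def"});
        System.out.println(scanLinear(page, 3));
    }
}
```

This only works when rows are consumed in stored order, which is why it is
limited to full scans without an inverted index.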

*Adaptive encoding for Global/Direct/Local dictionary columns*
Currently Global/Direct/Local dictionary values are stored in binary format
and only snappy is applied for compression. As Global/Direct/Local dictionary
values are of Integer data type, they can be adaptively stored with a data
type smaller than Integer.
Adaptive encoding is added for global/direct dictionary columns to reduce the
store size.
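A minimal Java sketch of storing integer dictionary values in a narrower type,
assuming non-negative values and a one-byte header that records the chosen
width. The header, class name and method names are assumptions for this
example, not CarbonData's actual page format.

```java
public class AdaptiveDictionarySketch {

    // Narrowest width in bytes (1, 2 or 4) for a non-negative maximum key.
    static int widthFor(int maxKey) {
        if (maxKey <= 0xFF) {
            return 1;
        }
        if (maxKey <= 0xFFFF) {
            return 2;
        }
        return 4;
    }

    // Encodes keys as [width][key0][key1]..., each key big-endian in 'width' bytes.
    public static byte[] encode(int[] keys) {
        int max = 0;
        for (int k : keys) {
            if (k < 0) {
                throw new IllegalArgumentException("keys must be non-negative");
            }
            max = Math.max(max, k);
        }
        int width = widthFor(max);
        byte[] out = new byte[1 + keys.length * width];
        out[0] = (byte) width;
        int pos = 1;
        for (int k : keys) {
            for (int shift = (width - 1) * 8; shift >= 0; shift -= 8) {
                out[pos++] = (byte) (k >>> shift);
            }
        }
        return out;
    }

    // Reverses encode(): reads the width header, then widens each key to int.
    public static int[] decode(byte[] data) {
        int width = data[0];
        int[] keys = new int[(data.length - 1) / width];
        int pos = 1;
        for (int i = 0; i < keys.length; i++) {
            int v = 0;
            for (int b = 0; b < width; b++) {
                v = (v << 8) | (data[pos++] & 0xFF);
            }
            keys[i] = v;
        }
        return keys;
    }
}
```

In this sketch a general-purpose compressor such as snappy would still run on
the narrowed bytes afterwards; the adaptive step only shrinks its input.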

*Method In-lining Optimization*
The JIT will inline a method if its size is less than 325 bytes of bytecode
and it has been called more than 10K times (default values). If the method is
private or static it is easier for the JIT to inline, as no type check is
required; for a protected/public method a type check adds overhead, and
because of this the call does not behave as a truly inlined one.
Because of this, some refactoring is done for primitive no-dictionary data
type columns. Earlier, ColumnPageWrapper.java handled query processing for all
primitive no-dictionary data type columns. In this PR, separate classes are
created to handle each data type: all the hot methods are kept private, the
protected methods are overridden, and the other methods are added in the
super classes.
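A minimal Java sketch of the class shape described above. All class and method
names here are hypothetical and do not match the classes added in the PR: the
shared logic sits in a super class, one protected method is overridden per
data type and called once per page, and the per-row hot method is private.

```java
public class InlineFriendlySketch {

    // Super class: shared, non-hot logic common to every data type.
    public abstract static class PageReader {
        public final long sumPage() {
            return sumRows(rowCount());
        }

        protected abstract int rowCount();

        // Overridden per data type; invoked once per page, not once per row.
        protected abstract long sumRows(int rowCount);
    }

    // One concrete class per primitive data type.
    public static final class IntPageReader extends PageReader {
        private final int[] data;

        public IntPageReader(int[] data) {
            this.data = data;
        }

        @Override
        protected int rowCount() {
            return data.length;
        }

        @Override
        protected long sumRows(int rowCount) {
            long total = 0;
            for (int row = 0; row < rowCount; row++) {
                total += valueAt(row); // private call inside the hot loop
            }
            return total;
        }

        // Hot per-row method: private, so the call target is fixed.
        private int valueAt(int row) {
            return data[row];
        }
    }

    public static final class ShortPageReader extends PageReader {
        private final short[] data;

        public ShortPageReader(short[] data) {
            this.data = data;
        }

        @Override
        protected int rowCount() {
            return data.length;
        }

        @Override
        protected long sumRows(int rowCount) {
            long total = 0;
            for (int row = 0; row < rowCount; row++) {
                total += valueAt(row);
            }
            return total;
        }

        private short valueAt(int row) {
            return data[row];
        }
    }
}
```

The virtual dispatch happens once per page in sumPage(), while the per-row
loop only calls a small private method.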



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
