[jira] [Resolved] (CARBONDATA-542) Parsing values for measures and dimensions during data load should adopt a strict check

Ravindra Pesala (JIRA) Tue, 17 Jan 2017 11:55:53 -0800

     [ https://issues.apache.org/jira/browse/CARBONDATA-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Ravindra Pesala resolved CARBONDATA-542.
----------------------------------------

> Parsing values for measures and dimensions during data load should adopt a 
> strict check
> ---------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-542
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-542
>             Project: CarbonData
>          Issue Type: Improvement
>            Reporter: Manish Gupta
>            Priority: Minor
>             Fix For: 1.0.0-incubating
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently CarbonData treats Short and Int columns as Long. When the values are 
> stored in carbon data files, delta compression is applied, which compresses 
> the data based on the min and max values of the column.
> While parsing values of these data types, we use the Double parser and 
> extract the long value from the result, as in the snippet below:
> Double.valueOf(msrValue).longValue()
> This has the following problems.
> 1. Measure values beyond the range of Int and Short are parsed successfully. 
> This conflicts with the behavior when the same column is listed in 
> dictionary_include and becomes a dimension. At query time each dimension 
> value is parsed according to its data type for result conversion; an 
> out-of-range value throws NumberFormatException and null is displayed in the 
> result, whereas for a measure the loaded value is displayed. This also 
> affects aggregate queries. That is why a strict check mechanism is adopted 
> for parsing dimension values.
> 2. Data inconsistency for measures: for decimal input, only the part before 
> the decimal point is kept for Int and Short data types.
> 3. For measures, allowing values beyond the data type range widens the 
> column's min/max range, which makes the compression less effective.
> Therefore we need to adopt strict parsing for both dimensions and measures.
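
To make the three problems concrete, here is a minimal Java sketch that contrasts the lenient parse quoted above with a strict, type-specific parse. Only the expression Double.valueOf(msrValue).longValue() comes from the issue; the class and method names (StrictMeasureParsing, strictParseInt, strictParseShort, bitsForDeltaRange) are hypothetical, and returning null for a rejected value is an assumption made for illustration, not necessarily how the actual fix treats bad records.

```java
class StrictMeasureParsing {

    // Lenient parse quoted in the issue: parse as double, then truncate to long.
    // Accepts fractions and values outside the Int/Short range.
    static long lenientParse(String msrValue) {
        return Double.valueOf(msrValue).longValue();
    }

    // Hypothetical strict parse for an Int column.
    // Assumption: a value that is not a valid int is reported as null.
    static Integer strictParseInt(String value) {
        try {
            return Integer.valueOf(value.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Hypothetical strict parse for a Short column, same convention as above.
    static Short strictParseShort(String value) {
        try {
            return Short.valueOf(value.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Rough illustration of problem 3: bits needed to store each value as an
    // offset from the column minimum. Assumes max >= min and no overflow in
    // (max - min); this is not CarbonData's actual encoder.
    static int bitsForDeltaRange(long min, long max) {
        long range = max - min;
        return range == 0 ? 1 : 64 - Long.numberOfLeadingZeros(range);
    }

    public static void main(String[] args) {
        // Problem 2: the fraction is silently dropped.
        System.out.println(lenientParse("12.75"));          // 12
        // Problem 1: a value beyond the Int range is accepted.
        System.out.println(lenientParse("3000000000"));     // 3000000000

        // The strict parse rejects both inputs and accepts a valid one.
        System.out.println(strictParseInt("12.75"));        // null
        System.out.println(strictParseInt("3000000000"));   // null
        System.out.println(strictParseInt("42"));           // 42
        System.out.println(strictParseShort("40000"));      // null

        // Problem 3: one out-of-range value widens the per-value bit width.
        System.out.println(bitsForDeltaRange(0, 1000));         // 10
        System.out.println(bitsForDeltaRange(0, 3000000000L));  // 32
    }
}
```

With the strict variant, a measure and a dictionary_include dimension built from the same input would reject the same values, so the two query paths should no longer disagree.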



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
