[
https://issues.apache.org/jira/browse/CARBONDATA-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ravindra Pesala resolved CARBONDATA-542.
----------------------------------------
> Parsing values for measures and dimensions during data load should adopt a
> strict check
> ---------------------------------------------------------------------------------------
>
> Key: CARBONDATA-542
> URL: https://issues.apache.org/jira/browse/CARBONDATA-542
> Project: CarbonData
> Issue Type: Improvement
> Reporter: Manish Gupta
> Priority: Minor
> Fix For: 1.0.0-incubating
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Currently CarbonData treats Short and Int values as long. When they are
> stored in carbon data files, delta compression is applied, which compresses
> the data based on the min and max values of the column.
> While parsing values of these data types, we use the Double parser and
> extract the long value from the result. The code snippet is as follows:
> Double.valueOf(msrValue).longValue()
> This has the following problems.
> 1. Measure values beyond the range of Int and Short are parsed successfully.
> This conflicts with the behavior when the same column is listed under
> dictionary_include and becomes a dimension. At query time, each dimension
> value is parsed according to its datatype for result conversion; an
> out-of-range value throws NumberFormatException and null is displayed in
> the result, whereas for a measure the loaded value is displayed. This also
> affects aggregate queries. That is why a strict check is already applied
> when parsing dimension values.
> 2. Data inconsistency for measures: for a decimal value, only the part
> before the decimal point is kept for Int and Short datatypes.
> 3. For measures, if values beyond the datatype range are allowed, the
> min-max span of the column widens and delta compression becomes less
> effective.
> Therefore we should adopt strict parsing for both dimensions and
> measures.
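
The difference between the lenient parse quoted above and a strict, type-specific parse can be sketched as follows. This is only an illustration: the class and method names are hypothetical and not taken from the CarbonData code base, and returning null for a rejected value is an assumption about how a bad value might be reported, not a description of the actual fix.

```java
// Hypothetical helper for illustration only; not CarbonData's implementation.
final class StrictMeasureParser {

    // Lenient parse as quoted in the issue description: goes through Double,
    // so it accepts out-of-range values and drops the fractional part.
    static long parseLenient(String value) {
        return Double.valueOf(value).longValue();
    }

    // Strict parse for an Int column. Returns null (an assumed convention for
    // "bad value") when the text is not a valid int within the int range.
    static Integer parseStrictInt(String value) {
        if (value == null) {
            return null;
        }
        try {
            return Integer.valueOf(value.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // Strict parse for a Short column, same convention as above.
    static Short parseStrictShort(String value) {
        if (value == null) {
            return null;
        }
        try {
            return Short.valueOf(value.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Problem 1: 40000 is beyond the Short range but is still accepted.
        System.out.println(parseLenient("40000"));
        // Problem 2: the fractional part is silently discarded.
        System.out.println(parseLenient("12.9"));
        // The strict parsers reject both inputs.
        System.out.println(parseStrictShort("40000"));
        System.out.println(parseStrictInt("12.9"));
        // A valid in-range value is parsed normally.
        System.out.println(parseStrictInt("2147483647"));
    }
}
```

With a type-specific parser, a value is accepted or rejected the same way whether the column is loaded as a measure or as a dimension, which is the consistency the issue asks for.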