+1 for a strict check mechanism. Users should see consistent behavior for both dimension and measure columns.

Best Regards,
Aniket

On Tue, Dec 6, 2016 at 6:03 PM, Liang Chen <[email protected]> wrote:

> Hi
>
> Thank you for starting a good discussion.
>
> I propose a strict check mechanism to avoid the problems you mention
> below. The behavior should be the same for both dimensions and measures.
> In a word, we need to process the actual data type as per the user's input.
>
> Regards
> Liang
>
>
> manishgupta88 wrote
> > Hi All,
> >
> > Currently in carbon we treat Short and Int as long, and at the time of
> > storing in carbon data files delta compression is used, which compresses
> > the data based on the min and max values of the column.
> >
> > While parsing the values for these datatypes, we use the Double data type
> > parser and extract the long value from that. Code snippet as below.
> > Double.valueOf(msrValue).longValue()
> >
> > This has the following problems.
> >
> > 1. Measure values beyond the range of Int and Short are parsed
> > successfully. This behavior conflicts when the same measure is included
> > as dictionary_include and becomes a dimension. When we query, each
> > dimension value is parsed for its datatype for result conversion, and at
> > that time NumberFormatException is thrown and null is displayed in the
> > result, while for the measure the loaded values are displayed. This also
> > impacts aggregate queries. That is why a strict check mechanism is
> > adopted for parsing dimension values.
> >
> > 2. Data inconsistency in the case of measures: for decimal values, only
> > the part before the decimal point is considered for Int and Short
> > datatypes.
> >
> > 3. For measures, if values beyond the datatype range are allowed, the
> > compression becomes less effective.
> >
> > Please comment on what the parsing behavior should be. Should Carbon
> > adopt a strict check mechanism or a lenient check mechanism, considering
> > that the behavior should be the same for both dimensions and measures,
> > as both are finally table columns?
> >
> > Regards
> > Manish Gupta
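
For reference, here is a minimal Java sketch of the difference between the lenient parsing quoted above and a strict check. Only the expression Double.valueOf(msrValue).longValue() comes from the thread. The class name ParseCheckSketch and the methods strictParseInt and bitsForDelta are hypothetical names made up for illustration, not CarbonData APIs, and the bit-width calculation is a simplified model of min/max based delta storage, not the actual CarbonData codec.

```java
import java.util.OptionalInt;

// Illustration only: hypothetical helper class, not part of CarbonData.
final class ParseCheckSketch {

    // Lenient path, as quoted in the thread: parse as double, then truncate to long.
    static long lenientParse(String msrValue) {
        return Double.valueOf(msrValue).longValue();
    }

    // Hypothetical strict path: accept only text that is a valid int in range.
    // Anything else (out of range, decimal, garbage) yields an empty result,
    // which a loader could treat as null or as a bad record.
    static OptionalInt strictParseInt(String value) {
        try {
            return OptionalInt.of(Integer.parseInt(value.trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    // Simplified model (assumption): bits needed to store (value - min)
    // for any value in [min, max]. Assumes max - min does not overflow.
    static int bitsForDelta(long min, long max) {
        long range = max - min;
        return range == 0 ? 0 : 64 - Long.numberOfLeadingZeros(range);
    }

    public static void main(String[] args) {
        // Problem 1: a value beyond the Int range is accepted by the lenient path.
        System.out.println(lenientParse("3000000000"));               // 3000000000
        System.out.println(strictParseInt("3000000000").isPresent()); // false

        // Problem 2: the fractional part is silently dropped by the lenient path.
        System.out.println(lenientParse("12.9"));                     // 12
        System.out.println(strictParseInt("12.9").isPresent());       // false

        // Problem 3: one out-of-range value widens the min/max range of the column.
        System.out.println(bitsForDelta(0, 32767));                   // 15
        System.out.println(bitsForDelta(0, 3000000000L));             // 32
    }
}
```

With a strict check, a measure and a dictionary_include dimension built from the same input would reject the same values, so the query result no longer depends on which kind of column the data was loaded into.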
