Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3579: Strict handling of numeric overflow in text parsing ......................................................................
Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/3150/1//COMMIT_MSG Commit Message: Line 13: invalid data, i.e. NULL is returned. > this doesn't seem right. when the option is set, error is returned (not NU In the case that abort_on_error=false but strict_mode=true this is what we see, no? Line 21: However, integer and floating point parsing never returns : an underflow today. > what cases would be integer and floating point underflow? For integer, we Yeah for floating point we don't handle underflows. For integers I'm talking about truncation so I guess I shouldn't refer to that as underflow since it's not really what it means (though it is similar, i.e. not enough precision). http://gerrit.cloudera.org:8080/#/c/3150/1/be/src/exec/text-converter.cc File be/src/exec/text-converter.cc: Line 249: (though the latter are only possible for : // DECIMAL which isn't codegen'd) > is underflow only possible given the way the code works today or by definit I thought we wanted to handle underflows in floats/doubles. Right now we don't really handle them, i.e. 9.9e-99999 is 0 w/ PARSE_SUCCESS. I also think we may want to consider handling truncating for integers (though I dunno if we'd want to call that underflow necessarily, perhaps like PARSE_TRUNCATE). Anyway, that's not happening here so I'll take this out for now. http://gerrit.cloudera.org:8080/#/c/3150/1/be/src/exec/text-converter.inline.h File be/src/exec/text-converter.inline.h: Line 171: // TODO: add warnings > what is this TODO about at this point? shouldn't it be the caller would do Was thinking in case we wanted to log something more specific, but I guess that's not really useful (at least not yet). I'll remove. http://gerrit.cloudera.org:8080/#/c/3150/1/common/thrift/ImpalaInternalService.thrift File common/thrift/ImpalaInternalService.thrift: Line 200: // Additional strict handling of invalid values in parsing. > I think we'll want this option to have broader meaning soon enough, so mayb Done http://gerrit.cloudera.org:8080/#/c/3150/1/testdata/workloads/functional-query/queries/QueryTest/strict-mode-abort.test File testdata/workloads/functional-query/queries/QueryTest/strict-mode-abort.test: Line 32 > how about testing negative values for all these cases as well Hm yeah, I think the best way to do that is introducing a new "bad data" file. However, I'm not as worried about handling both cases here because strict-mode.test (the other test file) tests that all values (both positive and negative) return parse errors, so this is more of a test that the abort_on_error code works on this path. I'm not sure it's worth adding a new bad data file (testdata/data/...) and then a new functional table to wrap it. What do you think? -- To view, visit http://gerrit.cloudera.org:8080/3150 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7409c31ec0cb6fe0b2d9842b9f58fe1670914836 Gerrit-PatchSet: 1 Gerrit-Project: Impala Gerrit-Branch: cdh5-2.6.0_5.8.0 Gerrit-Owner: Matthew Jacobs <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Matthew Jacobs <[email protected]> Gerrit-HasComments: Yes
