Tim Armstrong has posted comments on this change. Change subject: IMPALA-3764,3914: fuzz test HDFS scanners and fix parquet bugs found ......................................................................
Patch Set 10: (5 comments) http://gerrit.cloudera.org:8080/#/c/3448/9/be/src/exec/hdfs-parquet-scanner.cc File be/src/exec/hdfs-parquet-scanner.cc: Line 649: FILE_CHECK_GE(col_reader->def_level(), > Doesn't have to be done now, but FILE_CHECK places would be nice if they ca I'm not sure I exactly understand. FILE_CHECKs are DCHECKs now so we should crash if the scanner actually exercises them. http://gerrit.cloudera.org:8080/#/c/3448/9/be/src/runtime/disk-io-mgr.cc File be/src/runtime/disk-io-mgr.cc: Line 497: if (range->offset_ < 0) { > this seems weird. makes it unclear if it's an invariant or not. I think we Yeah that is weird. Fixed it. http://gerrit.cloudera.org:8080/#/c/3448/9/be/src/runtime/scoped-buffer.h File be/src/runtime/scoped-buffer.h: Line 13: // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY > mind putting in the ASF license text instead? Done http://gerrit.cloudera.org:8080/#/c/3448/9/be/src/util/dict-encoding.h File be/src/util/dict-encoding.h: Line 171: if (UNLIKELY(buffer_len == 0)) return Status("Dictionary cannot be 0 bytes"); > who can call this with buffer_len == 0? why isn't != 0 an invariant while < My thought was that negative lengths are clearly nonsensical, but the fact that zero-length buffers aren't valid dictionaries seems closer to an implementation detail. Or at least it isn't obvious to someone working on the scanner code; calling SetData() with length = 0 doesn't obviously look wrong. So it seemed reasonable that we should validate it since we're validating the input here anyway. I can remove the check if you prefer since the callsite in parquet checks it anyway. http://gerrit.cloudera.org:8080/#/c/3448/9/tests/query_test/test_scanners_fuzz.py File tests/query_test/test_scanners_fuzz.py: Line 1: # Licensed to the Apache Software Foundation (ASF) under one > please put ASF license here rather than copyright. Done -- To view, visit http://gerrit.cloudera.org:8080/3448 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I50cf43195a7c582caa02c85ae400ea2256fa3a3b Gerrit-PatchSet: 10 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Tim Armstrong <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: David Knupp <[email protected]> Gerrit-Reviewer: Taras Bobrovytsky <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
