Skye Wanderman-Milne has posted comments on this change. Change subject: IMPALA-3441, IMPALA-3659: check for malformed Avro data ......................................................................
Patch Set 10: (22 comments) http://gerrit.cloudera.org:8080/#/c/3072/10/be/src/exec/hdfs-avro-scanner-ir.cc File be/src/exec/hdfs-avro-scanner-ir.cc: Line 117: if (!parse_status_.ok()) return false; > why is this line needed? Oops it's not, removed. PS10, Line 134: ( > nit: remove extraneous parens Done Line 142: } else if (type == TYPE_DOUBLE) { > could save this cmp/branch by rewriting: Done (here and elsewhere) PS10, Line 154: ( > nit: remove parens Done PS10, Line 178: ( > remove parens Done http://gerrit.cloudera.org:8080/#/c/3072/10/be/src/exec/hdfs-avro-scanner.cc File be/src/exec/hdfs-avro-scanner.cc: Line 75: } > let's move this adjacent to the other constructor to make it easier to noti Done http://gerrit.cloudera.org:8080/#/c/3072/10/be/src/exec/hdfs-avro-scanner.h File be/src/exec/hdfs-avro-scanner.h: PS10, Line 214: : > add data_end_val Done Line 240: /// is exceeded when allocating the new char buffer. It returns true otherwise. > document return value / parse_status_ side effects Done PS10, Line 260: . > mention it updates parse_status_ on error Done PS10, Line 278: outside of xcompiled functions > why is this important? is it to just limit the amount of IR or is there ano Done http://gerrit.cloudera.org:8080/#/c/3072/10/be/src/exec/read-write-util.cc File be/src/exec/read-write-util.cc: PS10, Line 41: max_len > it's a little strange that the slow path and fast path will give different Done, addressed via template. PS10, Line 60: __restrict__ > I don't think the restrict helps with buf_end, since we don't dereference i Done PS10, Line 61: MAX_ZLONG_LEN > put a brief comment explaining why this is MAX_ZLONG_LEN rather than max_le Done Line 65: > Extra blank line. Done http://gerrit.cloudera.org:8080/#/c/3072/10/be/src/exec/read-write-util.h File be/src/exec/read-write-util.h: Line 97: ZResult() : error(true) { } > Maybe make a static constructor like: I changed the 'error' flag to 'ok' (I think this is polarity is more in line with our usual idioms, e.g. RETURN_IF_FALSE), and made a static error() constructor for brevity. Line 99: }; > maybe wrap this in "namespace internal" to help ensure the typedefs are use I think you're not allowed to have a namespace inside of a class PS10, Line 106: 'buf_len' is decremented > update comment Done PS10, Line 109: max_len = MAX_ZLONG_LEN > does any other callsites beside ReadZInt() want to set this parameter to so I went with the two template params, I think it's slightly to return int32_t for ReadZInt(). Line 239: if (r.error) return ZIntResult(); > rather than adding this cmp/br, we could do the template described above, o Done http://gerrit.cloudera.org:8080/#/c/3072/10/be/src/exec/scanner-context.inline.h File be/src/exec/scanner-context.inline.h: Line 116: inline bool ScannerContext::Stream::ReadVLong(int64_t* value, Status* status) { > are these "V" routines used by avro? do they need any fixing? No, only in hdfs-sequence-scanner (which likely has its own bugs) Line 156: if (r.error) return false; > don't we need to set *status on the r.error path? would be good to add a t Set *status. I still need to make test files but I'll post what I have for now. Line 156: if (r.error) return false; > UNLIKELY? Done -- To view, visit http://gerrit.cloudera.org:8080/3072 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I801a11c496a128e02c564c2a9c44baa5a97be132 Gerrit-PatchSet: 10 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Skye Wanderman-Milne <[email protected]> Gerrit-Reviewer: Alex Behm <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Skye Wanderman-Milne <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
