Skye Wanderman-Milne has posted comments on this change.

Change subject: IMPALA-3441, IMPALA-3659: check for malformed Avro data
......................................................................


Patch Set 10:

(22 comments)

http://gerrit.cloudera.org:8080/#/c/3072/10/be/src/exec/hdfs-avro-scanner-ir.cc
File be/src/exec/hdfs-avro-scanner-ir.cc:

Line 117:   if (!parse_status_.ok()) return false;
> why is this line needed?
Oops it's not, removed.


PS10, Line 134: (
> nit: remove extraneous parens
Done


Line 142:     } else if (type == TYPE_DOUBLE) {
> could save this cmp/branch by rewriting:
Done (here and elsewhere)


PS10, Line 154: (
> nit: remove parens
Done


PS10, Line 178: (
> remove parens
Done


http://gerrit.cloudera.org:8080/#/c/3072/10/be/src/exec/hdfs-avro-scanner.cc
File be/src/exec/hdfs-avro-scanner.cc:

Line 75: }
> let's move this adjacent to the other constructor to make it easier to noti
Done


http://gerrit.cloudera.org:8080/#/c/3072/10/be/src/exec/hdfs-avro-scanner.h
File be/src/exec/hdfs-avro-scanner.h:

PS10, Line 214: :
> add data_end_val
Done


Line 240:   /// is exceeded when allocating the new char buffer. It returns 
true otherwise.
> document return value / parse_status_ side effects
Done


PS10, Line 260: .
> mention it updates parse_status_ on error
Done


PS10, Line 278: outside of xcompiled functions
> why is this important? is it to just limit the amount of IR or is there ano
Done


http://gerrit.cloudera.org:8080/#/c/3072/10/be/src/exec/read-write-util.cc
File be/src/exec/read-write-util.cc:

PS10, Line 41: max_len
> it's a little strange that the slow path and fast path will give different 
Done, addressed via template.


PS10, Line 60: __restrict__ 
> I don't think the restrict helps with buf_end, since we don't dereference i
Done


PS10, Line 61: MAX_ZLONG_LEN
> put a brief comment explaining why this is MAX_ZLONG_LEN rather than max_le
Done


Line 65: 
> Extra blank line.
Done


http://gerrit.cloudera.org:8080/#/c/3072/10/be/src/exec/read-write-util.h
File be/src/exec/read-write-util.h:

Line 97:     ZResult() : error(true) { }
> Maybe make a static constructor like:
I changed the 'error' flag to 'ok' (I think this is polarity is more in line 
with our usual idioms, e.g. RETURN_IF_FALSE), and made a static error() 
constructor for brevity.


Line 99:   };
> maybe wrap this in "namespace internal" to help ensure the typedefs are use
I think you're not allowed to have a namespace inside of a class


PS10, Line 106: 'buf_len' is decremented
> update comment
Done


PS10, Line 109: max_len = MAX_ZLONG_LEN
> does any other callsites beside ReadZInt() want to set this parameter to so
I went with the two template params, I think it's slightly to return int32_t 
for ReadZInt().


Line 239:   if (r.error) return ZIntResult();
> rather than adding this cmp/br, we could do the template described above, o
Done


http://gerrit.cloudera.org:8080/#/c/3072/10/be/src/exec/scanner-context.inline.h
File be/src/exec/scanner-context.inline.h:

Line 116: inline bool ScannerContext::Stream::ReadVLong(int64_t* value, Status* 
status) {
> are these "V" routines used by avro? do they need any fixing?
No, only in hdfs-sequence-scanner (which likely has its own bugs)


Line 156:   if (r.error) return false;
> don't we need to set *status on the r.error path?  would be good to add a t
Set *status. I still need to make test files but I'll post what I have for now.


Line 156:   if (r.error) return false;
> UNLIKELY?
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3072
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I801a11c496a128e02c564c2a9c44baa5a97be132
Gerrit-PatchSet: 10
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Skye Wanderman-Milne <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Skye Wanderman-Milne <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to