Dan Hecht has posted comments on this change.

Change subject: IMPALA-3441, IMPALA-3659: check for malformed Avro data
......................................................................


Patch Set 10:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/3072/10/be/src/exec/hdfs-avro-scanner-ir.cc
File be/src/exec/hdfs-avro-scanner-ir.cc:

Line 117:   if (!parse_status_.ok()) return false;
why is this line needed?


PS10, Line 134: (
nit: remove extraneous parens


Line 142:     } else if (type == TYPE_DOUBLE) {
could save this cmp/branch by rewriting:

} else {
  DCHECK_EQ(type, TYPE_DOUBLE);
 ...
}

but okay to ignore in this patch


PS10, Line 154: (
nit: remove parens


PS10, Line 178: (
remove parens


http://gerrit.cloudera.org:8080/#/c/3072/10/be/src/exec/hdfs-avro-scanner.cc
File be/src/exec/hdfs-avro-scanner.cc:

Line 75: }
let's move this adjacent to the other constructor to make it easier to notice 
if one needs updating the other should to.


http://gerrit.cloudera.org:8080/#/c/3072/10/be/src/exec/hdfs-avro-scanner.h
File be/src/exec/hdfs-avro-scanner.h:

PS10, Line 214: :
add data_end_val


Line 240:   /// is exceeded when allocating the new char buffer. It returns 
true otherwise.
document return value / parse_status_ side effects


PS10, Line 260: .
mention it updates parse_status_ on error


PS10, Line 278: outside of xcompiled functions
why is this important? is it to just limit the amount of IR or is there another 
reason it has to be done in native code?  (I'm fine with this factoring, just 
want to clarify the comment).


http://gerrit.cloudera.org:8080/#/c/3072/10/be/src/exec/read-write-util.cc
File be/src/exec/read-write-util.cc:

Line 28:   }
this could be unrolled, but if we're getting good perf as is, okay to leave 
alone for now.


Line 31:   if ((y & 0x8000) == 0) return 10;
i.e. like you've unrolled this already.


PS10, Line 61: MAX_ZLONG_LEN
put a brief comment explaining why this is MAX_ZLONG_LEN rather than max_len 
(it's to protect FindZLongLength() which we use even if it's a ZInt).
also remove extra parens.


http://gerrit.cloudera.org:8080/#/c/3072/10/be/src/exec/read-write-util.h
File be/src/exec/read-write-util.h:

Line 99:   };
maybe wrap this in "namespace internal" to help ensure the typedefs are used 
instead.


PS10, Line 106: 'buf_len' is decremented
update comment


PS10, Line 109: max_len = MAX_ZLONG_LEN
does any other callsites beside ReadZInt() want to set this parameter to 
something different?  If not, we could save passing this parameter at every 
callsite (and also propagate the constant) by defining:

template<int MAX_LEN, typename ZResult> ZResult ReadZInteger(buf, buf_end) 
which is then wrapped by ReadZLong, ReadZInt. 

(or get rid of the second template arg by having a single result type ZResult 
which is <bool, int64_t> see below also).


Line 239:   if (r.error) return ZIntResult();
rather than adding this cmp/br, we could do the template described above, or 
have a constructor for ZIntResult that takes a ZLongResult.  Or we could just 
have a single result type: ZResult which is <bool, int64_t> which can work okay 
for both functions.


http://gerrit.cloudera.org:8080/#/c/3072/10/be/src/exec/scanner-context.inline.h
File be/src/exec/scanner-context.inline.h:

Line 116: inline bool ScannerContext::Stream::ReadVLong(int64_t* value, Status* 
status) {
are these "V" routines used by avro? do they need any fixing?


Line 156:   if (r.error) return false;
> UNLIKELY?
don't we need to set *status on the r.error path?  would be good to add a test 
case for this path.


-- 
To view, visit http://gerrit.cloudera.org:8080/3072
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I801a11c496a128e02c564c2a9c44baa5a97be132
Gerrit-PatchSet: 10
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Skye Wanderman-Milne <[email protected]>
Gerrit-Reviewer: Alex Behm <[email protected]>
Gerrit-Reviewer: Dan Hecht <[email protected]>
Gerrit-Reviewer: Skye Wanderman-Milne <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes

Reply via email to