Tim Armstrong has posted comments on this change. Change subject: IMPALA-1619: Support 64-bit allocations. ......................................................................
Patch Set 5: (11 comments) Can we also add a test to udf-test that exercises the corner cases: zero-length and negative allocations? http://gerrit.cloudera.org:8080/#/c/2781/5/be/src/exec/delimited-text-parser.h File be/src/exec/delimited-text-parser.h: Line 120: Status FillColumns(int64_t len, char** last_column, int* num_fields, There are a couple of places in DelimitedTextParser::ParseFieldLocations() where we're not checking the return status of this function. http://gerrit.cloudera.org:8080/#/c/2781/5/be/src/exec/hdfs-text-scanner.cc File be/src/exec/hdfs-text-scanner.cc: Line 757: if (state_->abort_on_error()) { We should probably implement this abort_on_error=false behaviour where we just skip the row for the long string case. The multi-GB string is kind of a corner case and we're handling it much better now, but it would be good to be consistent in error handling. http://gerrit.cloudera.org:8080/#/c/2781/5/be/src/exec/scanner-context.cc File be/src/exec/scanner-context.cc: Line 234: stream Stream? Buffer, no? http://gerrit.cloudera.org:8080/#/c/2781/5/be/src/runtime/buffered-block-mgr.cc File be/src/runtime/buffered-block-mgr.cc: Line 306: int64_t buffers_needed = BitUtil::Ceil(size, max_block_size()); We should just check here that buffers_needed fits in a 32-bit int, rather than propagating the type change. http://gerrit.cloudera.org:8080/#/c/2781/5/be/src/runtime/buffered-block-mgr.h File be/src/runtime/buffered-block-mgr.h: Line 342: bool TryAcquireTmpReservation(Client* client, int64_t num_buffers); We shouldn't need to change this, this is denominated in 8MB buffers. I think the only change we really need is to add a check in ConsumeMemory that the number of buffers fits in a 32-bit int. http://gerrit.cloudera.org:8080/#/c/2781/5/be/src/runtime/free-pool.h File be/src/runtime/free-pool.h: Line 68: if (UNLIKELY(size == 0)) return mem_pool_->EmptyAllocPtr(); Should we be checking for < 0 like in the MemPool? I couldn't see any handling of that case if a rogue UDF called Allocate(-1). http://gerrit.cloudera.org:8080/#/c/2781/5/be/src/runtime/string-buffer.h File be/src/runtime/string-buffer.h: Line 77: /// the buffer will exceed memory limit. Mention that this doesn't free the previous buffer? http://gerrit.cloudera.org:8080/#/c/2781/5/be/src/util/codec.cc File be/src/util/codec.cc: Line 188: (output_len64 & ~((1LL << 31) - 1)) != 0) What the benefit of this compared to the straightforward comparison? Because it handles the negative case too? Can we factor this check out into something in BitUtil? http://gerrit.cloudera.org:8080/#/c/2781/5/common/thrift/generate_error_codes.py File common/thrift/generate_error_codes.py: Line 242: length supported should also include the maximum length in the error message. E.g. Length of column $0 exceeds maximum supported length of $1 http://gerrit.cloudera.org:8080/#/c/2781/5/infra/python/deps/requirements.txt File infra/python/deps/requirements.txt: Line 45: python-snappy == 0.5.0 We currently add the egg/tarball/whatever for the dependencies here (rather than having download_requirements hammer third-party servers all the time) http://gerrit.cloudera.org:8080/#/c/2781/5/tests/query_test/test_compressed_formats.py File tests/query_test/test_compressed_formats.py: Line 1: # Copyright (c) 2012 Cloudera, Inc. All rights reserved. Can we also extend the test_large_strings test to exercise some of the cases that now work? -- To view, visit http://gerrit.cloudera.org:8080/2781 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7ed28083d809a86d801a9c063a0aa32c50d32b20 Gerrit-PatchSet: 5 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Michael Ho <[email protected]> Gerrit-Reviewer: Dan Hecht <[email protected]> Gerrit-Reviewer: Michael Ho <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-HasComments: Yes
