Michael Brown has posted comments on this change. Change subject: IMPALA-2904: Support INSERT and LOAD DATA on S3 and between filesystems ......................................................................
Patch Set 5: (28 comments) http://gerrit.cloudera.org:8080/#/c/2574/5//COMMIT_MSG Commit Message: Line 19: 'generic_client'. test_load.py is refactored to use this generic Could we name this something that better indicates what it does? I'm up for alternatives, but one option is filesystem_client. An example for a better name: when using an ImpalaTestSuite, it'll be tough for someone to distinguish "self.client" from "self.generic_client": neither really say what they do. http://gerrit.cloudera.org:8080/#/c/2574/5/infra/python/deps/requirements.txt File infra/python/deps/requirements.txt: Line 48: boto3 == 1.2.3 Please continue sort top-level packages (sans ipython) alphabetically. http://gerrit.cloudera.org:8080/#/c/2574/5/tests/common/impala_test_suite.py File tests/common/impala_test_suite.py: Line 117: cls.s3_client = cls.create_s3_client() Do we need to create both clients in L117-118? Line 156: def create_s3_client(cls): It seems like indirection to have this method. Could this just be inlined into L117? http://gerrit.cloudera.org:8080/#/c/2574/5/tests/metadata/test_ddl.py File tests/metadata/test_ddl.py: Line 149: "location '{1}/{0}/t3/'".format(DDL_TEST_DB, WAREHOUSE)) Nit: could you change the ordering of the format params (and format string) so the ordering corresponds? http://gerrit.cloudera.org:8080/#/c/2574/5/tests/metadata/test_explain.py File tests/metadata/test_explain.py: Line 106: self.client.execute("CREATE TABLE %s.empty_partition (col int) " \ : "partitioned by (p int)" % self.TEST_DB_NAME); 1. backslash is not needed 2. semicolon is not needed 3. for white spacing and indenting, either of these are acceptable: a. "partitioned by" is aligned with "CREATE TABLE" self.client.execute("CREATE TABLE %s.empty_partition (col int) " "partitioned by (p int)" % self.TEST_DB_NAME) b. argument starts new line after ( and is given two indents, not 1; subsequent lines are still vertically aligned self.client.execute( "CREATE TABLE %s.empty_partition (col int) partitioned by " "(p int)" % self.TEST_DB_NAME) http://gerrit.cloudera.org:8080/#/c/2574/5/tests/metadata/test_load.py File tests/metadata/test_load.py: Line 62: "{0}/{1}/100101.txt".format(STAGING_PATH, i)) This is another wrong alignment. See my comment in tests/metadata/test_explain.py for the options. http://gerrit.cloudera.org:8080/#/c/2574/5/tests/metadata/test_recover_partitions.py File tests/metadata/test_recover_partitions.py: Line 117: self.BASE_DIR + null_dir + file_path, null_inserted_value) needs another indent (2 spaces) http://gerrit.cloudera.org:8080/#/c/2574/5/tests/query_test/test_insert_behaviour.py File tests/query_test/test_insert_behaviour.py: Line 48: self.client.execute(("INSERT OVERWRITE functional.%s" : " SELECT int_col FROM functional.tinyinttable" % TBL_NAME)) L50-51 from before were correctly formatted. This adds extraneous parentheses, and L49 is not correctly aligned. Line 70: for file_ in hidden_file_locations: : if not self.generic_client.exists(table_dir + file_): : err_msg = "Hidden file '%s' was unexpectedly deleted by INSERT OVERWRITE" : pytest.fail(err_msg % (table_dir + file_)) : : for dir_ in dir_locations: : if not self.generic_client.exists(table_dir + file_): : err_msg = "Directory '%s' was unexpectedly deleted by INSERT OVERWRITE" : pytest.fail(err_msg % (table_dir + dir_)) Instead of if blocks and explicit pytest.fail() calls, why not just asserts? Line 89: self.generic_client.delete_file_dir(PART_DIR, recursive=True) Is PART_DIR even in scope? It looks to be part_dir in L84. Line 360: does_exist = self.generic_client.exists(path) : if does_exist and not should_exist: : pytest.fail("file/dir '%s' unexpectedly exists" % path) : if should_exist and not does_exist: : pytest.fail("file/dir '%s' does not exist" % path) if -> fail vs. assert question applies here, too. http://gerrit.cloudera.org:8080/#/c/2574/5/tests/util/filesystem_base.py File tests/util/filesystem_base.py: Line 20: class BaseFilesystem(object): Please add some vertical white space to make this look a little less dense. Also, could you document the success/failure conditions? Are there return values to check? Exceptions to catch? Line 25: Create a file in 'path' and populate with 'file_data'. If overwrite is true, s/true/True/ ? Line 31: """Create a directory in 'path' with permissions 'permission'""" What's permission? A umask string, or umask octal, or something else? Line 38: def create_file(self, src, dst): There are two create_file methods. Line 47: """Check if a particular path exists""" ...and does what? http://gerrit.cloudera.org:8080/#/c/2574/5/tests/util/s3_util.py File tests/util/s3_util.py: Line 30: if overwrite is False: if not overwrite Line 40: return pass? Line 43: self.s3client.copy_object(Bucket=self.bucketname, : CopySource={'Bucket':self.bucketname, 'Key':src}, Key=dst) L44 is misaligned Line 49: response = self.s3client.list_objects(Bucket=self.bucketname,Prefix=path) space after "bucketname," Line 58: if not path[len(path) - 1] == '/': : path = path + '/' See str.endswith() . I've also seen patterns like this, which I can never decide if I like: path = path.strip('/') + '/' Line 61: Bucket=self.bucketname,Prefix=path,Delimiter='/') space after "bucketname,". Also, the line needs another indent level Line 63: if response.get('CommonPrefixes'): Some inline comments explaining what the relevant intermediate data structures look like would really help in this method. Line 82: if contents is None: : return False : return True return contents is not None ? Line 90: if recursive is True: if recursive: Line 93: objects += [{'Key':k}] objects.append({'Key': k}) Line 96: self.s3client.delete_objects(Bucket=self.bucketname,Delete={'Objects':objects}) space after "bucketname," -- To view, visit http://gerrit.cloudera.org:8080/2574 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I94e15ad67752dce21c9b7c1dced6e114905a942d Gerrit-PatchSet: 5 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Michael Brown <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-HasComments: Yes
