Sailesh Mukil has posted comments on this change. Change subject: IMPALA-2904: Support INSERT and LOAD DATA on S3 and between filesystems ......................................................................
Patch Set 6: (19 comments) http://gerrit.cloudera.org:8080/#/c/2574/6/tests/common/impala_test_suite.py File tests/common/impala_test_suite.py: Line 118: if IS_S3: > Much shorter and clearer: Done http://gerrit.cloudera.org:8080/#/c/2574/6/tests/query_test/test_insert_behaviour.py File tests/query_test/test_insert_behaviour.py: Line 110: ls = self.filesystem_client.ls(part_dir) > I don't think we need to define the ls variable here, just assert directly Done Line 358: does_exist = self.filesystem_client.exists(path) > how about: Done Line 398: assert new_ls == ls > assert ls == new_ls == 1 But we assert the len(new_ls) above. So, assert ls == new_ls == 1 would fail. http://gerrit.cloudera.org:8080/#/c/2574/6/tests/util/filesystem_base.py File tests/util/filesystem_base.py: Line 25: file_data > is file_data a string? Yes, I've updated to explicitly call it a string. http://gerrit.cloudera.org:8080/#/c/2574/6/tests/util/hdfs_util.py File tests/util/hdfs_util.py: Line 59: def make_dir(self, path, permission=755): > Do we need to override this? Is the only purpose to set the default permiss I meant to take this out. 755 is default for the parent too. So, this doesn't change anything. Done. Line 102: def ls(self, path): > Can you add a more detailed comment? What is the difference between this me We want ls() to return only the file names instead of a dictionary of file statuses. list_dir() returns the latter, which we don't need anywhere in our tests save for one or 2 cases. Done. http://gerrit.cloudera.org:8080/#/c/2574/6/tests/util/s3_util.py File tests/util/s3_util.py: Line 15: # S3 access utilities > This comment does not really add anything. Done Line 22: @classmethod > Add an empty line above this line. Done Line 30: if not overwrite: > if not override and self.exists(filename): return False Done Line 33: response > What's the point of this variable, it just disappears? I think we should re Done Line 37: othav > add space Done Line 57: = > path += '/' Done Line 69: both > This should be called files_and_dirs Done Line 70: for d in dirs: > How about something like: Done Line 75: key = tmp[-1] > how about simply: Done Line 83: return contents is not None > how about return response.get('Contents') is not None Done Line 90: keys = self.list_keys(path) > objects = [{'Key': k} for k in self.list_keys(path)] if recursive else path Done Line 92: objects.append({'Key':k}) > Do we have to add path to the list as well in the recursive case? No, the 'path' itself will be one of the keys returned by list_keys(). -- To view, visit http://gerrit.cloudera.org:8080/2574 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I94e15ad67752dce21c9b7c1dced6e114905a942d Gerrit-PatchSet: 6 Gerrit-Project: Impala Gerrit-Branch: cdh5-trunk Gerrit-Owner: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Henry Robinson <[email protected]> Gerrit-Reviewer: Michael Brown <[email protected]> Gerrit-Reviewer: Mostafa Mokhtar <[email protected]> Gerrit-Reviewer: Sailesh Mukil <[email protected]> Gerrit-Reviewer: Taras Bobrovytsky <[email protected]> Gerrit-HasComments: Yes
