Sailesh Mukil has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/2574

Change subject: [NOT-FOR-REVIEW]IMPALA-2904: Support INSERT and LOAD DATA on S3 and between filesystems
......................................................................

[NOT-FOR-REVIEW]IMPALA-2904: Support INSERT and LOAD DATA on S3 and between filesystems

Previously Impala disallowed LOAD DATA on S3. This patch enables LOAD
DATA on S3, but does not make any major changes to improve performance
on S3. This patch also enables LOAD DATA between filesystems.

Added a Python S3 client library, 'boto3', to access S3 from the Python
tests. A new class called S3Client is introduced, which wraps the boto3
functions and has the same function signatures as PyWebHdfsClient by
deriving from an abstract base class, BaseFileSystem, so that the two
can be used interchangeably through a 'generic_client'. test_load.py is
refactored to use this generic client. The ImpalaTestSuite setup creates
a client according to the TARGET_FILESYSTEM environment variable and
assigns it to the 'generic_client'.

The other test files have not been changed yet. They will be changed in
a future patch when INSERT support for S3 is introduced. Also, more
abstract methods need to be introduced to give both filesystem classes
a common interface.

The QueryTest/load-s3.test file is temporary: it exists because the
load.test file has an INSERT query, which is not currently supported on
S3. It will be removed once INSERT support for S3 is introduced.

P.S.: Currently, test_load.py runs 15x slower on S3 than on HDFS (even
after removing one query for S3). Performance needs to be improved in
future patches.

TODO: Add commit message for INSERT!!

Change-Id: I94e15ad67752dce21c9b7c1dced6e114905a942d
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/hdfs-fs-cache.h
M be/src/util/hdfs-bulk-ops.cc
M be/src/util/hdfs-bulk-ops.h
M be/src/util/hdfs-util.cc
M be/src/util/hdfs-util.h
M common/thrift/ImpalaInternalService.thrift
M fe/src/main/java/com/cloudera/impala/analysis/InsertStmt.java
M fe/src/main/java/com/cloudera/impala/analysis/LoadDataStmt.java
M fe/src/main/java/com/cloudera/impala/common/FileSystemUtil.java
M fe/src/main/java/com/cloudera/impala/service/Frontend.java
M infra/python/deps/requirements.txt
M tests/common/impala_test_suite.py
M tests/common/skip.py
M tests/metadata/test_ddl.py
M tests/metadata/test_load.py
M tests/query_test/test_insert_behaviour.py
M tests/query_test/test_insert_parquet.py
A tests/util/filesystem_base.py
M tests/util/hdfs_util.py
A tests/util/s3_util.py
23 files changed, 473 insertions(+), 187 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala refs/changes/74/2574/1
--
To view, visit http://gerrit.cloudera.org:8080/2574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I94e15ad67752dce21c9b7c1dced6e114905a942d
Gerrit-PatchSet: 1
Gerrit-Project: Impala
Gerrit-Branch: cdh5-trunk
Gerrit-Owner: Sailesh Mukil <[email protected]>
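
The test-client design described in the commit message (an abstract BaseFileSystem, interchangeable clients behind a 'generic_client', and selection via TARGET_FILESYSTEM) could look roughly like the sketch below. This is not the code from the patch. The method names (create_file, exists, delete), the in-memory stand-in classes, the create_generic_client helper, and the "s3"/"hdfs" values are all assumptions made for illustration. A real S3Client would wrap boto3 calls, and the HDFS side would go through PyWebHdfsClient.

```python
import os
from abc import ABC, abstractmethod


class BaseFileSystem(ABC):
    """Hypothetical common interface; the patch's real method set may differ."""

    @abstractmethod
    def create_file(self, path, data):
        """Write `data` to `path`."""

    @abstractmethod
    def exists(self, path):
        """Return True if `path` exists."""

    @abstractmethod
    def delete(self, path):
        """Delete `path`; return True if something was removed."""


class _InMemoryFileSystem(BaseFileSystem):
    """Stand-in backend so the sketch runs without HDFS or S3 access."""

    scheme = None

    def __init__(self):
        self._files = {}

    def create_file(self, path, data):
        self._files[path] = data

    def exists(self, path):
        return path in self._files

    def delete(self, path):
        return self._files.pop(path, None) is not None


class FakeHdfsClient(_InMemoryFileSystem):
    # Assumption: stands in for the PyWebHdfsClient-based client.
    scheme = "hdfs"


class FakeS3Client(_InMemoryFileSystem):
    # Assumption: stands in for the boto3-based S3Client.
    scheme = "s3"


def create_generic_client(target_filesystem=None):
    """Pick a client from TARGET_FILESYSTEM (hypothetical helper)."""
    if target_filesystem is None:
        # Assumption: an unset variable falls back to HDFS.
        target_filesystem = os.environ.get("TARGET_FILESYSTEM", "hdfs")
    if target_filesystem == "s3":
        return FakeS3Client()
    return FakeHdfsClient()


# A test then only talks to the shared interface.
generic_client = create_generic_client()
generic_client.create_file("/test-warehouse/example/data.txt", b"1,2,3\n")
```

Because the test code depends only on BaseFileSystem, the same test body can run against either backend, which is the property the refactored test_load.py relies on.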
