GitHub user helifu opened a pull request:
https://github.com/apache/incubator-impala/pull/6
Branch 2.10.0
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/helifu/incubator-impala branch-2.10.0
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-impala/pull/6.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #6
----
commit 8049f811379c6f316520934fa7c495a4fc54d45d
Author: Taras Bobrovytsky <[email protected]>
Date: 2017-06-01T00:53:01Z
Update VERSION to 2.9.0 to begin release candidate testing
Change-Id: I88b03479ae1d73afc9e3f5883ee09ae2f9bcfe09
commit 4086f2c84de754d0a4a0ea87c0ee49b7e6eb469f
Author: Sailesh Mukil <[email protected]>
Date: 2017-04-11T00:08:01Z
IMPALA-5333: Add support for Impala to work with ADLS
This patch leverages the AdlFileSystem in Hadoop to allow
Impala to talk to the Azure Data Lake Store. This patch has
functional changes as well as adds test infrastructure for
testing Impala over ADLS.
We do not support ACLs on ADLS since the Hadoop ADLS
connector does not integrate ADLS ACLs with Hadoop users/groups.
For testing, we use the azure-data-lake-store-python client
from Microsoft. This client seems to have some consistency
issues. For example, a drop table through Impala will delete
the files in ADLS, but listing that directory through the
Python client immediately after the drop will still show
the files. This behavior is unexpected, since ADLS claims to be
strongly consistent. Some tests have been skipped due to this
limitation with the tag SkipIfADLS.slow_client. Tracked by
IMPALA-5335.
The azure-data-lake-store-python client also only works on CentOS 6.6
and later, so the Python dependencies for Azure will not be downloaded
when TARGET_FILESYSTEM is not "adls". ADLS tests are expected to run
on a machine running at least CentOS 6.6.
Note: This is only a test limitation, not a functional one. Clusters
with older OSes like CentOS 6.4 will still work with ADLS.
Added another dependency to bootstrap_build.sh for the ADLS Python
client.
Testing: Ran core tests with and without TARGET_FILESYSTEM as
'adls' to make sure that all tests pass and that nothing breaks.
Change-Id: Ic56b9988b32a330443f24c44f9cb2c80842f7542
Reviewed-on: http://gerrit.cloudera.org:8080/6910
Tested-by: Impala Public Jenkins
Reviewed-by: Sailesh Mukil <[email protected]>
commit 2ffc86a5b218035cc42fa220f4d33a92b29d3fa6
Author: Sailesh Mukil <[email protected]>
Date: 2017-05-26T00:58:33Z
IMPALA-5375: Builds on CentOS 6.4 failing with broken python dependencies
Builds on CentOS 6.4 fail due to dependencies not met for the new
'cryptography' python package.
The ADLS commit states that the new packages are only required for ADLS
and that ADLS on a dev environment is only supported from CentOS 6.7.
This patch moves the compiled requirements for ADLS from
compiled-requirements.txt to adls-requirements.txt and passes a
compiler to the pip environment while installing the ADLS
requirements.
Testing: Tested on a machine with TARGET_FILESYSTEM='adls'
and also on a CentOS 6.4 machine with the default
configuration.
Change-Id: I7d456a861a85edfcad55236aa8b0dbac2ff6fc78
Reviewed-on: http://gerrit.cloudera.org:8080/6998
Reviewed-by: Tim Armstrong <[email protected]>
Tested-by: Impala Public Jenkins
commit 117fc388bff2a754be081eae7667627f84f1b33c
Author: Sailesh Mukil <[email protected]>
Date: 2017-05-30T18:56:43Z
IMPALA-5383: Fix PARQUET_FILE_SIZE option for ADLS
The PARQUET_FILE_SIZE query option doesn't work with ADLS because the
AdlFileSystem has no notion of block sizes, and Impala depends
on the filesystem remembering the block size, which is then used as the
target parquet file size (this is done for HDFS so that the parquet file
size and block size match even if PARQUET_FILE_SIZE isn't a valid
block size).
We special-case ADLS, just as we do for S3, to bypass the
FileSystem block size and instead use the requested
PARQUET_FILE_SIZE as the output partition's block size (and
consequently the target parquet file size).
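The special-casing described above can be sketched as follows. This is a minimal illustration with hypothetical names (FsType, OutputPartitionBlockSize), not Impala's actual code: filesystems without a native block-size concept honor PARQUET_FILE_SIZE directly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch of the block-size decision. S3 and ADLS have no
// real blocks, so the requested parquet file size is used directly as
// the output partition's block size; on HDFS, the filesystem's block
// size wins so that file size and block size match.
enum class FsType { HDFS, S3, ADLS };

int64_t OutputPartitionBlockSize(FsType fs, int64_t fs_default_block_size,
                                 int64_t requested_parquet_file_size) {
  if (fs == FsType::S3 || fs == FsType::ADLS) {
    // Bypass the FileSystem block size entirely.
    return requested_parquet_file_size;
  }
  // HDFS: the (possibly adjusted) filesystem block size is the target.
  return fs_default_block_size;
}
```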
Testing: Re-enabled test_insert_parquet_verify_size() for ADLS.
Also fixed a miscellaneous bug with the ADLS client listing helper function.
Change-Id: I474a913b0ff9b2709f397702b58cb1c74251c25b
Reviewed-on: http://gerrit.cloudera.org:8080/7018
Reviewed-by: Sailesh Mukil <[email protected]>
Tested-by: Impala Public Jenkins
commit b8558506957dbf44b8ceb29c8b7382bfd8180e05
Author: Sailesh Mukil <[email protected]>
Date: 2017-05-30T19:50:13Z
IMPALA-5378: Disk IO manager needs to understand ADLS
The Disk IO Manager had customized support for S3 and remote HDFS,
allowing these to use a separate queue with a customized number
of IO threads. ADLS did not have this support.
Based on the code in DiskIoMgr::Init and DiskIoMgr::AssignQueue, IOs
for ADLS were previously put in local disk queues. Since local disks
are considered rotational unless we can confirm otherwise by looking at
the /sys filesystem, this means that THREADS_PER_ROTATIONAL_DISK=1 was
being applied as the thread count.
This patch adds customized support for ADLS, similar to how it was done
for S3. We set 16 threads as the default number of IO threads for ADLS.
For smaller clusters, setting a higher number like 64 would work better.
We keep the thread count at a lower default of 16 because there is an
undocumented per-cluster concurrency limit of roughly 500-700
connections; on larger clusters, higher thread-level parallelism would
therefore hurt node-level parallelism.
We also set the default maximum chunk size for ADLS to 128k, because
direct reads aren't supported for ADLS, which means the JNI array
allocation and the memcpy add significant overhead for larger buffers.
128k was chosen empirically for S3 for the same reason; since that
reasoning also holds for ADLS, we keep the same value. A new flag,
FLAGS_adls_read_chunk_size, controls this value.
TODO: Empirically settle on the most optimal buffer size.
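The queue-assignment idea in this commit can be sketched as below. This is a hypothetical illustration, not Impala's actual DiskIoMgr code: the names, queue IDs, and the assumption that remote queues sit just past the local-disk queues are all mine; the 16-thread ADLS default and the 1-thread rotational default mirror the commit message.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch: remote filesystems (S3 and, with this patch, ADLS)
// get dedicated disk queues with their own IO-thread counts, instead of
// landing on a local disk queue that is assumed rotational.
constexpr int kNumLocalDisks = 8;
constexpr int kS3QueueId = kNumLocalDisks;        // first queue past local disks
constexpr int kAdlsQueueId = kNumLocalDisks + 1;  // dedicated ADLS queue

constexpr int kThreadsPerRotationalDisk = 1;  // local default from the commit
constexpr int kAdlsNumIoThreads = 16;         // ADLS default from the commit

// Route an IO request to a queue based on the path's filesystem scheme.
int AssignQueue(const std::string& path, int local_disk_id) {
  if (path.rfind("s3a://", 0) == 0) return kS3QueueId;
  if (path.rfind("adl://", 0) == 0) return kAdlsQueueId;
  return local_disk_id;  // local reads stay on their own disk's queue
}

// Thread count per queue: remote ADLS queue gets many threads, local
// disks are assumed rotational (1 thread) unless proven otherwise.
int NumThreadsForQueue(int queue_id) {
  if (queue_id == kAdlsQueueId) return kAdlsNumIoThreads;
  return kThreadsPerRotationalDisk;
}
```

Before this patch, an "adl://" path would fall through to the local-disk branch and get the single rotational-disk thread, which is the behavior the commit fixes.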
Change-Id: I067f053fec941e3631610c5cc89a384f257ba906
Reviewed-on: http://gerrit.cloudera.org:8080/7033
Reviewed-by: Sailesh Mukil <[email protected]>
Tested-by: Impala Public Jenkins
commit 5141a10ee1f71945dd5d15000796b7cf717e7928
Author: hzhelifu <[email protected]>
Date: 2017-08-28T07:05:37Z
Support runtime filter for Kudu.
commit b966906a442c4da9b039c70d96aaee9e39ee37dd
Author: hzhelifu <[email protected]>
Date: 2017-09-20T08:24:29Z
Test passed.
commit d0f6041997e1cd91162a527dc5d4c1f1ca526bb6
Author: hzhelifu <[email protected]>
Date: 2017-09-25T06:19:19Z
Wait for runtime filter.
commit 2e8fe3b33081cdcea05cda1827360a4154e913a0
Author: hzhelifu <[email protected]>
Date: 2017-09-28T05:29:39Z
Modification complete, but the performance is not good.
----
---