[
https://issues.apache.org/jira/browse/IMPALA-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16816662#comment-16816662
]
ASF subversion and git services commented on IMPALA-6050:
---------------------------------------------------------
Commit a103cb8ee2357c220eaf912d9aefd522b09f3e04 in impala's branch
refs/heads/master from stakiar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=a103cb8 ]
IMPALA-6050: Query profiles should indicate storage layer(s) used
This patch updates Impala explain plans so that the Scan Node section clearly
displays which filesystems the Scan Node is reading data from (support
has been added for scans from HDFS, S3, ADLS, and the local filesystem).
Before this patch, if an Impala query scanned a table with partitions
across different storage layers, the explain plan would look like this:
PLAN-ROOT SINK
|
01:EXCHANGE [UNPARTITIONED]
|
00:SCAN HDFS [functional.alltypes]
partitions=24/24 files=24 size=478.45KB
Now the explain plan will look like this:
PLAN-ROOT SINK
|
01:EXCHANGE [UNPARTITIONED]
|
00:SCAN S3 [functional.alltypes]
ADLS partitions=4/24 files=4 size=478.45KB
HDFS partitions=10/24 files=10 size=478.45KB
S3 partitions=10/24 files=10 size=478.45KB
The explain plan differentiates "SCAN HDFS" vs "SCAN S3" by using the
root table path. This means that even scans of non-partitioned tables
will see their explain plans change from "SCAN HDFS" to "SCAN
[storage-layer-name]". This change also affects explain plans for tables
stored on a single storage layer: 'partitions=...' will become
'HDFS partitions=...'.
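As a rough illustration of the behavior described above (not the actual
Impala planner code, which is not shown in this message), the following
Python sketch derives the "SCAN ..." label from the root table path and
counts partitions per storage layer. The scheme-to-name mapping, the
"LOCAL" display name, and the function names are all assumptions made
for the example.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Hypothetical mapping from URI scheme to the name shown in the plan.
# The real mapping used by Impala may differ.
SCHEME_TO_LAYER = {
    "hdfs": "HDFS",
    "s3a": "S3",
    "adl": "ADLS",
    "file": "LOCAL",
}


def storage_layer(path):
    """Return the display name of the storage layer for a path URI."""
    scheme = urlparse(path).scheme
    # Fall back to the upper-cased scheme for anything not in the mapping.
    return SCHEME_TO_LAYER.get(scheme, scheme.upper())


def scan_label(root_table_path):
    """Build the scan node label from the table's root path only."""
    return "SCAN " + storage_layer(root_table_path)


def partitions_per_layer(partition_paths):
    """Count how many partitions live on each storage layer."""
    counts = defaultdict(int)
    for path in partition_paths:
        counts[storage_layer(path)] += 1
    return dict(counts)
```

Under these assumptions, a table rooted at an s3a:// path would be
labelled "SCAN S3" even if some of its partitions live on HDFS or ADLS;
the per-layer lines under the scan node carry the breakdown.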
This patch makes several changes to PlannerTest.java so that by default
test files do not validate the value of the storage layer displayed in
the explain plan. This is necessary to support classes such as
S3PlannerTest which run test files against S3. It makes several changes
to impala_test_suite.py as well in order to support validation of
explain plans in test files that run via Python. Specifically, it adds
support for a new substitution variable in test files called
$FILESYSTEM_NAME, which is the name of the storage layer the test is
being run against.
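A minimal sketch of how such a substitution could work is shown below.
It is an assumption for illustration only: the helper name is
hypothetical and the actual logic in impala_test_suite.py is not shown
in this message.

```python
def substitute_filesystem_name(expected_plan, filesystem_name):
    """Replace the $FILESYSTEM_NAME placeholder in an expected plan.

    Hypothetical helper: filesystem_name is the storage layer the test
    run targets, for example "HDFS" or "S3".
    """
    return expected_plan.replace("$FILESYSTEM_NAME", filesystem_name)


expected = (
    "00:SCAN $FILESYSTEM_NAME [functional.alltypes]\n"
    "$FILESYSTEM_NAME partitions=24/24 files=24 size=478.45KB"
)
resolved = substitute_filesystem_name(expected, "S3")
```

With a placeholder like this, one expected-plan file can be validated
against whichever storage layer the test suite is configured to use.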
Testing:
* Ran core tests
* Added new tests to PlannerTest
* Added ExplainTest to allow for more fine-grained testing of explain
plan logic
Change-Id: I4b1b4a1bc1a24e9614e3b4dc5a61dc96d075d1c3
Reviewed-on: http://gerrit.cloudera.org:8080/12282
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Query profiles should clearly indicate storage layer(s) used
> ------------------------------------------------------------
>
> Key: IMPALA-6050
> URL: https://issues.apache.org/jira/browse/IMPALA-6050
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Sailesh Mukil
> Assignee: Sahil Takiar
> Priority: Major
> Labels: adls, profile, s3, supportability
>
> Currently, the query profile doesn't have the location of tables and
> partitions, which makes it hard to figure out what storage layer a
> table/partition that was queried was on.
> As we're seeing more users run Impala workloads against cloud based storage
> like S3 and ADLS, we should have the query profiles show this information.