[
https://issues.apache.org/jira/browse/IMPALA-7367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16761493#comment-16761493
]
ASF subversion and git services commented on IMPALA-7367:
---------------------------------------------------------
Commit ae96a9fb19e0a2e0a5529f2f36d3b5ee0d336f69 in impala's branch
refs/heads/master from poojanilangekar
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=ae96a9f ]
IMPALA-8151: Use sizeof() in HiveUdfCall to specify non-primitive type's size
Previously, data type sizes were hardcoded in
HiveUdfCall::Evaluate(). Since IMPALA-7367 removed the padding
from STRING and VARCHAR types, it could read past the end of the
actual value and cause a crash. This change replaces the hardcoded
values with sizeof() calls to determine the size of non-primitive
types (STRING, VARCHAR and TIMESTAMP) to avoid similar issues in
the future.
Testing:
Ran test_udfs.py on an ASAN build.
Added logs to manually verify the size of bytes copied.
Change-Id: I919c330546fa86b474ab66245b20ceb1f5525b41
Reviewed-on: http://gerrit.cloudera.org:8080/12355
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Pack StringValue, CollectionValue and TimestampValue slots
> ----------------------------------------------------------
>
> Key: IMPALA-7367
> URL: https://issues.apache.org/jira/browse/IMPALA-7367
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Reporter: Tim Armstrong
> Assignee: Pooja Nilangekar
> Priority: Major
> Labels: perfomance
> Fix For: Impala 3.2.0
>
> Attachments: 0001-WIP.patch
>
>
> This is a follow-on to finish up the work from IMPALA-2789. IMPALA-2789
> didn't actually fully pack the memory layout because StringValue,
> TimestampValue and CollectionValue still occupy 16 bytes but only have 12
> bytes of actual data. This results in a higher memory footprint, which leads
> to higher memory requirements and worse performance. We don't get any benefit
> from the padding since the majority of tuples are not actually aligned in
> memory anyway.
> I did a quick version of the change for StringValue only which improves TPC-H
> performance.
> {noformat}
> Report Generated on 2018-07-30
> Run Description: "b5608264b4552e44eb73ded1e232a8775c3dba6b vs
> f1e401505ac20c0400eec819b9196f7f506fb927"
> Cluster Name: UNKNOWN
> Lab Run Info: UNKNOWN
> Impala Version: impalad version 3.1.0-SNAPSHOT RELEASE ()
> Baseline Impala Version: impalad version 3.1.0-SNAPSHOT RELEASE (2018-07-27)
> +----------+-----------------------+---------+------------+------------+----------------+
> | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) |
> Delta(GeoMean) |
> +----------+-----------------------+---------+------------+------------+----------------+
> | TPCH(10) | parquet / none / none | 2.69 | -4.78% | 2.09 |
> -3.11% |
> +----------+-----------------------+---------+------------+------------+----------------+
> +----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
> | Workload | Query | File Format | Avg(s) | Base Avg(s) |
> Delta(Avg) | StdDev(%) | Base StdDev(%) | Num Clients | Iters |
> +----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
> | TPCH(10) | TPCH-Q22 | parquet / none / none | 0.94 | 0.93 |
> +0.75% | 3.37% | 2.84% | 1 | 30 |
> | TPCH(10) | TPCH-Q13 | parquet / none / none | 3.32 | 3.32 |
> +0.13% | 1.74% | 2.09% | 1 | 30 |
> | TPCH(10) | TPCH-Q11 | parquet / none / none | 0.99 | 0.99 |
> -0.02% | 3.74% | 3.16% | 1 | 30 |
> | TPCH(10) | TPCH-Q5 | parquet / none / none | 2.30 | 2.33 |
> -0.96% | 2.15% | 2.45% | 1 | 30 |
> | TPCH(10) | TPCH-Q2 | parquet / none / none | 1.55 | 1.57 |
> -1.45% | 1.65% | 1.49% | 1 | 30 |
> | TPCH(10) | TPCH-Q8 | parquet / none / none | 2.89 | 2.93 |
> -1.51% | 2.69% | 1.34% | 1 | 30 |
> | TPCH(10) | TPCH-Q9 | parquet / none / none | 5.96 | 6.06 |
> -1.63% | 1.34% | 1.82% | 1 | 30 |
> | TPCH(10) | TPCH-Q20 | parquet / none / none | 1.58 | 1.61 |
> -1.85% | 2.28% | 2.16% | 1 | 30 |
> | TPCH(10) | TPCH-Q16 | parquet / none / none | 1.18 | 1.21 |
> -2.11% | 3.68% | 4.72% | 1 | 30 |
> | TPCH(10) | TPCH-Q3 | parquet / none / none | 2.13 | 2.18 |
> -2.31% | 2.09% | 1.92% | 1 | 30 |
> | TPCH(10) | TPCH-Q15 | parquet / none / none | 1.86 | 1.90 |
> -2.52% | 2.06% | 2.22% | 1 | 30 |
> | TPCH(10) | TPCH-Q17 | parquet / none / none | 1.85 | 1.90 |
> -2.86% | 10.00% | 8.02% | 1 | 30 |
> | TPCH(10) | TPCH-Q10 | parquet / none / none | 2.58 | 2.66 |
> -2.93% | 1.68% | 6.49% | 1 | 30 |
> | TPCH(10) | TPCH-Q14 | parquet / none / none | 1.37 | 1.42 |
> -3.22% | 3.35% | 6.24% | 1 | 30 |
> | TPCH(10) | TPCH-Q18 | parquet / none / none | 4.99 | 5.17 |
> -3.38% | 1.75% | 3.82% | 1 | 30 |
> | TPCH(10) | TPCH-Q6 | parquet / none / none | 0.66 | 0.69 |
> -3.73% | 5.04% | 4.12% | 1 | 30 |
> | TPCH(10) | TPCH-Q4 | parquet / none / none | 1.07 | 1.12 |
> -3.97% | 1.79% | 2.85% | 1 | 30 |
> | TPCH(10) | TPCH-Q1 | parquet / none / none | 2.29 | 2.39 |
> -4.34% | 2.70% | 3.43% | 1 | 30 |
> | TPCH(10) | TPCH-Q7 | parquet / none / none | 6.17 | 6.45 |
> -4.42% | 2.15% | 1.69% | 1 | 30 |
> | TPCH(10) | TPCH-Q19 | parquet / none / none | 1.85 | 1.93 |
> -4.42% | 2.76% | 2.26% | 1 | 30 |
> | TPCH(10) | TPCH-Q12 | parquet / none / none | 1.32 | 1.42 |
> -6.75% | 2.55% | 3.56% | 1 | 30 |
> | TPCH(10) | TPCH-Q21 | parquet / none / none | 10.43 | 12.08 |
> -13.69% | * 14.21% * | * 10.76% * | 1 | 30 |
> +----------+----------+-----------------------+--------+-------------+------------+------------+----------------+-------------+-------+
> {noformat}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]