[
https://issues.apache.org/jira/browse/IMPALA-8341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16833218#comment-16833218
]
ASF subversion and git services commented on IMPALA-8341:
---------------------------------------------------------
Commit 25db9ea8f3f4de3086610ccb6040a101691e6340 in impala's branch
refs/heads/master from Michael Ho
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=25db9ea ]
IMPALA-8496: Fix flakiness of test_data_cache.py
test_data_cache.py was added as part of IMPALA-8341 to verify
that the DataCache hit / miss counts and the DataCache metrics
are working as expected. The test seems to fail intermittently
due to unexpected cache misses.
Part of the test creates a temp table from tpch_parquet.lineitem
and joins it against tpch_parquet.lineitem itself on the
l_orderkey column. The test expects a complete cache hit for
tpch_parquet.lineitem when joining against the temp table, as the
table should have been cached in its entirety by the CTAS
statement. However, this doesn't always hold. In particular, the
data cache internally divides the key space into multiple shards,
and a key is hashed to determine the shard it belongs to. By
default, the number of shards equals the number of CPU cores
(e.g. 16 on an AWS m5.4xlarge instance). Since the cache size is
set to 500MB, each shard has a capacity of only about 31MB. In
some cases, cached rows of lineitem can be evicted if the shard
they hash to grows beyond 31MB. The problem is not deterministic
because part of the cache key is the modification time of the
file, which changes from run to run as it's essentially determined
by when the data load job ran. This leads to the test's flakiness
(see the sketch below).
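To illustrate the failure mode, here is a minimal Python sketch (not
Impala's actual implementation; names and sizes are illustrative) of a
hash-sharded cache whose per-shard capacity can evict entries even
though the total working set fits in the aggregate capacity:

    import hashlib
    from collections import OrderedDict

    class ShardedCache:
        def __init__(self, total_capacity_bytes, num_shards):
            self.num_shards = num_shards
            # The aggregate capacity is split evenly across shards.
            self.shard_capacity = total_capacity_bytes // num_shards
            # One LRU map per shard: key -> size of the cached range.
            self.shards = [OrderedDict() for _ in range(num_shards)]

        def _shard_of(self, key):
            # Hash the key to pick the shard it belongs to.
            digest = hashlib.md5(key.encode()).digest()
            return int.from_bytes(digest[:8], 'little') % self.num_shards

        def insert(self, filename, mtime, offset, size):
            # mtime is part of the key, so reloading the data can move
            # every key to a different shard from the previous run.
            key = '%s:%d:%d' % (filename, mtime, offset)
            shard = self.shards[self._shard_of(key)]
            shard[key] = size
            shard.move_to_end(key)
            # Evict in LRU order once this shard alone exceeds its
            # slice of the capacity, even if other shards are empty.
            while sum(shard.values()) > self.shard_capacity:
                shard.popitem(last=False)

With 16 shards and a 500MB cache, a shard that happens to receive a
disproportionate share of the lineitem keys evicts some of them even
though the table as a whole fits comfortably in 500MB.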
To fix this problem, this patch forces the data cache to use a single
shard only, for determinism. In addition, the test is now skipped for
non-HDFS and HDFS erasure coding builds, as it depends on the scan
range assignment. To exercise the cache more extensively, the plan is
to enable it by default for S3 builds instead of relying on BE and E2E
tests only.
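As a usage example of the sketch above (numbers hypothetical), forcing
a single shard makes the outcome independent of how keys hash, so a
table that fits in the total capacity stays fully cached:

    # One shard: the full 500MB capacity backs a single LRU.
    cache = ShardedCache(total_capacity_bytes=500 * 1024 * 1024,
                         num_shards=1)
    # Insert 25 ranges of 8MB each (~200MB total) under some mtime.
    for offset in range(0, 200 * 1024 * 1024, 8 * 1024 * 1024):
        cache.insert('lineitem.parquet', mtime=1556719200,
                     offset=offset, size=8 * 1024 * 1024)
    # Nothing was evicted: 200MB < 500MB, regardless of the mtime.
    assert sum(len(s) for s in cache.shards) == 25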
Testing done:
- Ran test_data_cache.py 10+ times, each with a different mtime
for tpch_parquet.lineitem; the test previously failed in about
2 out of 3 runs.
Change-Id: I98d5b8fa1d3fb25682a64bffaf56d751a140e4c9
Reviewed-on: http://gerrit.cloudera.org:8080/13242
Reviewed-by: Impala Public Jenkins <[email protected]>
Tested-by: Impala Public Jenkins <[email protected]>
> Data cache for remote reads
> ---------------------------
>
> Key: IMPALA-8341
> URL: https://issues.apache.org/jira/browse/IMPALA-8341
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 3.2.0
> Reporter: Michael Ho
> Assignee: Michael Ho
> Priority: Critical
> Fix For: Impala 3.3.0
>
>
> When running in a public cloud (e.g. AWS with S3) or in certain private cloud
> settings (e.g. data stored in an object store), computation and storage are
> no longer co-located. This breaks the typical pattern in which Impala query
> fragment instances are scheduled where the data is located. In this
> setting, the network bandwidth requirements of both the NICs and the
> top-of-rack switches go up substantially, as the network traffic includes
> the data fetches in addition to the shuffle exchange traffic of
> intermediate results.
> To mitigate the pressure on the network, one can build a storage-backed cache
> at the compute nodes to cache the working set. With deterministic scan range
> scheduling, each compute node should hold non-overlapping partitions of the
> data set.
> An initial prototype of the cache was posted here:
> [https://gerrit.cloudera.org/#/c/12683/], but it could probably benefit from a
> better eviction algorithm (e.g. LRU instead of FIFO) and better locking (e.g.
> not holding the lock while doing IO).
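For context on the quoted description, here is a minimal sketch of the
idea behind deterministic scan range scheduling (a simplified stand-in
for Impala's actual scheduler; the executor names are hypothetical):
hashing each scan range's (file, offset) over a fixed executor list
keeps assignments stable across queries, so each node's cache ends up
holding a non-overlapping slice of the data set.

    import hashlib

    EXECUTORS = ['executor-1', 'executor-2', 'executor-3']

    def assign_scan_range(filename, offset):
        # A deterministic function of the scan range alone: the same
        # range is always read (and cached) by the same executor.
        key = '%s:%d' % (filename, offset)
        digest = hashlib.md5(key.encode()).digest()
        return EXECUTORS[int.from_bytes(digest[:8], 'little')
                         % len(EXECUTORS)]

    # Repeated scans of the same range hit the same executor's cache.
    assert (assign_scan_range('part-0.parquet', 0) ==
            assign_scan_range('part-0.parquet', 0))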