Joe McDonnell created IMPALA-12139:
--------------------------------------
Summary: Add end-to-end tests for HDFS caching with Parquet page
indexes, etc.
Key: IMPALA-12139
URL: https://issues.apache.org/jira/browse/IMPALA-12139
Project: IMPALA
Issue Type: Improvement
Components: Backend, Infrastructure
Affects Versions: Impala 4.3.0
Reporter: Joe McDonnell
In a recent bug, we found issues with how HDFS caching interacts with Parquet
page indexes (IMPALA-12123). This was diagnosed by creating a table with a
Parquet file with page indexes and enabling HDFS caching. This is a very useful
test scenario, and it would be just as useful for the other file formats and
the scanner fuzzing tests.
The limiting factor is that HDFS caching requires the ability to lock memory, and
the amount of locked memory is limited on Linux for security reasons. By
default, the limit is 64KB:
{noformat}
# -l the maximum size a process may lock into memory
$ ulimit -l
65536{noformat}
HDFS configuration specifies the max size of locked memory in
hdfs-site.xml.tmpl:
{noformat}
<!-- Set the max cached memory to ~64kb. This must be less than ulimit -l -->
<property>
<name>dfs.datanode.max.locked.memory</name>
<value>64000</value>
</property>{noformat}
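The relationship between the two limits can be checked with a small script. This is a minimal sketch, not part of Impala's tooling: the function name memlock_fits is hypothetical. It relies on bash reporting "ulimit -l" in 1024-byte units (or the string "unlimited"), while dfs.datanode.max.locked.memory is given in bytes, so the values are converted before comparing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a dfs.datanode.max.locked.memory value
# (bytes) fits under a memlock limit as printed by `ulimit -l`
# (1024-byte units, or the literal string "unlimited").
memlock_fits() {
  local wanted_bytes="$1" limit="$2"
  if [ "$limit" = "unlimited" ]; then
    return 0
  fi
  [ "$wanted_bytes" -le $((limit * 1024)) ]
}

# Example: check the 64000-byte setting against the current shell's limit.
if memlock_fits 64000 "$(ulimit -l)"; then
  echo "64000 bytes fits under the current memlock limit"
else
  echo "64000 bytes exceeds the current memlock limit"
fi
```

The same check could be run with a larger candidate value before changing hdfs-site.xml.tmpl.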
A 64KB limit means that HDFS caching is unreliable or outright impossible with
normal-sized Parquet files. We can run "alter table foo set cached in
'testPool'", but the data may not actually get cached.
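One way to see whether the data actually got cached is "hdfs cacheadmin -listDirectives -stats", which reports the bytes needed and bytes cached for each cache directive. The sketch below is an illustration only: the helper name all_directives_cached is hypothetical, and the column positions it parses are an assumption that should be verified against the Hadoop version in use.

```shell
# Hypothetical helper: read `hdfs cacheadmin -listDirectives -stats` output on
# stdin and succeed only if at least one directive is listed and every
# directive has BYTES_CACHED >= BYTES_NEEDED.
# ASSUMPTION: data rows start with a numeric ID, and columns 6 and 7 are
# BYTES_NEEDED and BYTES_CACHED. Check this against your Hadoop version.
all_directives_cached() {
  awk '
    $1 ~ /^[0-9]+$/ { rows++; if ($7 + 0 < $6 + 0) short++ }
    END { if (rows > 0 && short == 0) exit 0; exit 1 }
  '
}

# Usage against a running minicluster (left commented out because it needs hdfs):
# hdfs cacheadmin -listDirectives -stats -pool testPool | all_directives_cached \
#   && echo "fully cached" || echo "not (yet) fully cached"
```

An end-to-end test could poll a check like this after the "alter table" statement and fail if the data never becomes fully cached.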
To set these to a higher value, we can set the following in
/etc/security/limits.conf and get a new user session:
{noformat}
* hard memlock unlimited
* soft memlock unlimited{noformat}
Then, we can raise dfs.datanode.max.locked.memory to a much larger size.
With this larger size, the caching operations are more reliable and we could
create end-to-end tests.
To get true HDFS caching end-to-end tests, we will need to configure this,
possibly in bin/bootstrap_system.sh, possibly as preexisting configuration for
Jenkins workers.
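If this were done from a setup script, the limits.conf change could be made idempotent. The following is a rough sketch under stated assumptions, not a proposed patch: the function name ensure_memlock_unlimited is hypothetical, and writing the real /etc/security/limits.conf would additionally need root privileges and a new user session to take effect.

```shell
# Hypothetical helper: append the memlock entries to a limits file unless
# equivalent lines are already present. It takes the file path as an argument
# so it can be tried on a scratch copy before touching the real file.
ensure_memlock_unlimited() {
  local limits_file="$1" kind
  for kind in hard soft; do
    if ! grep -Eq "^\*[[:space:]]+${kind}[[:space:]]+memlock[[:space:]]+unlimited" \
        "$limits_file" 2>/dev/null; then
      echo "* ${kind} memlock unlimited" >> "$limits_file"
    fi
  done
}

# Try it on a temporary file; running it twice should not duplicate lines.
scratch="$(mktemp)"
ensure_memlock_unlimited "$scratch"
ensure_memlock_unlimited "$scratch"
cat "$scratch"
rm -f "$scratch"
```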