[
https://issues.apache.org/jira/browse/IMPALA-8121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837430#comment-16837430
]
Michael Ho commented on IMPALA-8121:
------------------------------------
{quote} Michael Ho can you suggest what a good default config for the I/O cache
would be? {quote}
Sorry for the late reply. When I tested with the data cache enabled in a
3-node mini-cluster using the default workload scale, I used a 500 MB cache
with 1 partition by running
{noformat}
start-impala-cluster.py --data_cache_dir=/tmp --data_cache_size=500MB
{noformat}
You can also specify a pre-existing directory in an Impala startup flag, like
{noformat}
--data_cache=/tmp/data-cache-0:500MB
{noformat}
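To make the directory-and-size form of the flag value concrete, here is a minimal Python sketch that splits a value such as /tmp/data-cache-0:500MB into its two parts. The function name parse_data_cache is hypothetical, and the sketch assumes a single directory and binary units (1 MB = 1024 * 1024 bytes); neither is confirmed in this thread, and this is not how Impala itself parses the flag.

```python
from typing import Tuple

# Assumption: binary multipliers; the thread does not say how Impala
# interprets the "MB" suffix.
_UNITS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3, "TB": 1024 ** 4}


def parse_data_cache(spec: str) -> Tuple[str, int]:
    """Split a 'directory:size' string into (directory, size_in_bytes).

    Hypothetical helper for illustration only.
    """
    # Split on the last colon so the size is always the trailing field.
    directory, sep, size = spec.rpartition(":")
    if not sep or not directory or not size:
        raise ValueError("expected '<directory>:<size>', got %r" % spec)

    size = size.strip().upper()
    for suffix, factor in _UNITS.items():
        if size.endswith(suffix):
            return directory, int(size[: -len(suffix)]) * factor

    # No recognised suffix: treat the value as a plain byte count.
    return directory, int(size)


if __name__ == "__main__":
    print(parse_data_cache("/tmp/data-cache-0:500MB"))
```

Splitting on the last colon keeps the size as the trailing field even if the directory path itself contains a colon.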
> Pick better default flags in containers
> ---------------------------------------
>
> Key: IMPALA-8121
> URL: https://issues.apache.org/jira/browse/IMPALA-8121
> Project: IMPALA
> Issue Type: Sub-task
> Components: Infrastructure
> Reporter: Tim Armstrong
> Assignee: Tim Armstrong
> Priority: Major
> Labels: docker
>
> There are some new features of Impala that are done but disabled by default
> because they are not strictly better than the previous versions, e.g. the
> various metadata improvements. Containerised Impala clusters are likely to be
> new deployments, so it is easier to make potentially disruptive changes to
> defaults now.
> h2. Metadata V2 Flags
> Catalogd:
> --catalog_topic_mode=minimal
> Impalad:
> --use_local_catalog=true
> We want to invalidate based on HMS notifications
> (https://issues.apache.org/jira/browse/IMPALA-7970) and memory pressure. It's
> less clear if invalidating tables based on time is really useful - for large
> fact tables it would add a lot of unpredictability because reloading the
> tables is expensive.
> Catalogd:
> --invalidate_tables_timeout_s=???
> --invalidate_tables_on_memory_pressure=true
> Once IMPALA-7970 goes in, we probably also want automatic invalidation by
> default (TBD - how to handle older HMS that doesn't support those APIs).
> Catalogd:
> --hms_event_polling_interval_s=???
> We probably want to enable HDFS preads for remote reads (-use_hdfs_pread),
> but I think this is going to be done automatically.
> We may want to have an I/O cache enabled - tracked by IMPALA-8121