[jira] [Commented] (IMPALA-8121) Pick better default flags in containers

Michael Ho (JIRA) Fri, 10 May 2019 09:38:46 -0700


    [ 
https://issues.apache.org/jira/browse/IMPALA-8121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16837430#comment-16837430
 ]


Michael Ho commented on IMPALA-8121:
------------------------------------

{quote} Michael Ho can you suggest what a good default config for the I/O cache 
would be? {quote}

Sorry for the late reply. When I tested with the data cache enabled in a 
3-node mini-cluster at the default workload scale, I ran with a 500 MB cache 
and 1 partition by running {noformat} start-impala-cluster.py 
--data_cache_dir=/tmp --data_cache_size=500MB{noformat} You can also specify a 
pre-existing directory in Impala's startup flag, like 
{noformat}--data_cache=/tmp/data-cache-0:500MB{noformat} 
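
For the second form, here is a minimal sketch, assuming the cache directory has to exist before the daemon starts (as "pre-existing" suggests). The helper name data_cache_flag is hypothetical and not part of Impala; it only creates the directory and prints the flag in the dir:size form quoted above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Impala): make sure the cache directory
# exists, then print the --data_cache startup flag in the dir:size form.
data_cache_flag() {
    dir="$1"    # cache directory, e.g. /tmp/data-cache-0
    size="$2"   # cache size as written in the flag, e.g. 500MB
    mkdir -p "$dir" || return 1
    printf '%s\n' "--data_cache=${dir}:${size}"
}

# Example using the values from the comment above.
data_cache_flag /tmp/data-cache-0 500MB
```

The printed string can then be appended to the impalad startup flags; how those flags are passed depends on the deployment.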

> Pick better default flags in containers
> ---------------------------------------
>
>                 Key: IMPALA-8121
>                 URL: https://issues.apache.org/jira/browse/IMPALA-8121
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Infrastructure
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Major
>              Labels: docker
>
> There are some new features of Impala that are done but disabled by default 
> because they are not strictly better than the previous versions. E.g. the 
> various metadata improvements. Containerised Impala is likely to be new 
> deployments, so it is easier to make potentially disruptive changes to 
> defaults now.
> h2. Metadata V2 Flags
> Catalogd:
> --catalog_topic_mode=minimal
> Impalad:
> --use_local_catalog=true
> We want to invalidate based on HMS notifications 
> (https://issues.apache.org/jira/browse/IMPALA-7970) and memory pressure. It's 
> less clear if invalidating tables based on time is really useful - for large 
> fact tables it would add a lot of unpredictability because reloading the 
> tables is expensive.
> Catalogd:
> --invalidate_tables_timeout_s=???
> --invalidate_tables_on_memory_pressure=true
> Once IMPALA-7970 goes in, we probably also want automatic invalidation by 
> default (TBD - how to handle older HMS that doesn't support those APIs).
> Catalogd:
> --hms_event_polling_interval_s=???
> We probably want to enable HDFS preads for remote reads: -use_hdfs_pread - 
> but I think this is going to be done automatically.
> We may want to have an I/O cache enabled - tracked by IMPALA-8121



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
