Saulius Valatka created IMPALA-12350:
----------------------------------------

             Summary: Daemon fails to initialize large catalog
                 Key: IMPALA-12350
                 URL: https://issues.apache.org/jira/browse/IMPALA-12350
             Project: IMPALA
          Issue Type: Bug
    Affects Versions: Impala 4.2.0
            Reporter: Saulius Valatka


When the statestored catalog topic is large enough (>2 GB), daemons fail to 
restart and get stuck in a loop:

{{I0808 13:07:17.702653 3633556 Frontend.java:1618] Waiting for local catalog 
to be initialized, attempt: 2068}}

The statestored reports errors as follows:

{{I0808 13:07:05.587296 2134270 thrift-util.cc:196] TSocket::write_partial() 
send() <Host: gs1-hdp-data70 Port: 23000>: Broken pipe}}
{{I0808 13:07:05.587356 2134270 client-cache.h:362] RPC Error: Client for 
gs1-hdp-data70:23000 hit an unexpected exception: write() send(): Broken pipe, 
type: N6apache6thrift9transport19TTransportExceptionE, rpc: 
N6impala20TUpdateStateResponseE, send: not done}}
{{I0808 13:07:05.587365 2134270 client-cache.cc:174] Broken Connection, destroy 
client for gs1-hdp-data70:23000}}

If this happens, we are forced to restart the statestore and thus the whole 
cluster, meaning that we can't tolerate the failure of even a single daemon.

Interestingly, the catalog topic grew significantly after upgrading from 
3.4.0 to 4.2.0: from ~800 MB to ~3.4 GB. Invalidate/refresh operations also 
became significantly slower (~10 ms -> ~5 s).

This is probably related to thrift_rpc_max_message_size, but I see that its 
maximum value is 2 GB.
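
As a rough sanity check of the sizes involved, here is a minimal sketch. It assumes the 2 GB ceiling corresponds to the signed 32-bit maximum (2**31 - 1 bytes), which is not confirmed in this report; the constant and function names are hypothetical and are not part of Impala or Thrift.

```python
# Assumption (not confirmed by the report): the 2 GB ceiling on
# thrift_rpc_max_message_size is the signed 32-bit maximum.
ASSUMED_MAX_MESSAGE_BYTES = 2**31 - 1


def fits_in_one_message(size_bytes: int, limit: int = ASSUMED_MAX_MESSAGE_BYTES) -> bool:
    """Return True if a payload of size_bytes is within the assumed limit."""
    return size_bytes <= limit


# Approximate catalog topic sizes from the report, in decimal units.
topic_3_4_0 = 800 * 10**6    # ~800 MB before the upgrade
topic_4_2_0 = 3400 * 10**6   # ~3.4 GB after the upgrade

for label, size in (("3.4.0", topic_3_4_0), ("4.2.0", topic_4_2_0)):
    print(label, size, fits_in_one_message(size))
```

Under that assumption, the ~800 MB topic fits within a single message while the ~3.4 GB topic does not, which would be consistent with the failure appearing only after the upgrade.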

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
