Hi Tianyi -

The idea behind the metadata cache is to try to ensure that no RPCs to the metastore are on the critical path of query planning. However, the way we populate and invalidate the cache does cause us some problems; the statestore was never designed to carry this kind of payload and this can cause difficulties in large clusters.

We are aware of the problems, and hope to do something about them, but don't have concrete plans just yet. If you have thoughts, please bring them up on this dev@ mailing list and we can discuss them!

Thanks,
Henry

On 12 July 2016 at 13:34, Huaisi Xu <[email protected]> wrote:

> Hi Tianyi, thanks for contacting us!
>
> Could you elaborate the biggest problems you are facing with this design?
>
> As we are moving to ASF, you can ask questions here
> [email protected].
>
> I think for questions regarding design decisions and future improvement,
> +Dimitris and +Henry knows better.
>
> Huaisi
>
> *From: *何天一 <[email protected]>
> *Date: *Monday, July 11, 2016 at 10:42 PM
> *To: *Huaisi Xu <[email protected]>
> *Subject: *Looking for OLAP suggestion
>
> Hi, Huaisi.
>
> We communicated before in cloudera JIRA (IMPALA-3499
> <https://issues.cloudera.org/browse/IMPALA-3499>). I am currently working
> on distributed storage and computing, including OLAP engines, for 今日头条.
>
> I am looking for technical suggestions and hope you could help.
>
> I see that Impala Catalogd caches metadata from Hive Metastore and HDFS
> (or other storage).
>
> IMHO This can be considered as a good optimization for performance.
>
> However, in our production environment, this mechanism tend to cause
> problem.
>
> Could you help to explain the design choice behind this? Why did Impala
> cache meta in the first place? And, is there any optimization in progress
> to make the mechanism better?
>
> Thanks.
>
> --
> Cheers,
> Tianyi HE
> (+86) 185 0042 4096

--
Henry Robinson
Software Engineer
Cloudera
415-994-6679
