There is also this page, which has another paper published by the Impala team, as well as other related materials: https://cwiki.apache.org/confluence/display/IMPALA/Impala+Reading+List
On Wed, Apr 5, 2017 at 7:02 PM, Dimitris Tsirogiannis <[email protected]> wrote:

> Hi Antoni,
>
> Regarding question 2: the catalog server collects file metadata, including
> block locations, from the HDFS NameNode and caches it in memory. Over time,
> the file metadata is broadcast through the statestore to all the Impala
> servers and stored in their local metadata caches.
>
> Dimitris
>
> On Tue, Apr 4, 2017 at 9:24 PM, Antoni Ivanov <[email protected]> wrote:
>
> > Hi,
> >
> > I've been reading about the design of the catalog service/statestore,
> > mostly from the white paper about Impala:
> > http://cidrdb.org/cidr2015/Papers/CIDR15_Paper28.pdf
> > I found it on the Impala Confluence wiki:
> > https://cwiki.apache.org/confluence/display/IMPALA/Impala+Presentations%2C+Papers+and+Blog+Posts
> > It's rather interesting: it has a fairly detailed (but clear) design of
> > the different components.
> >
> > Are there other sources (besides the source code)?
> >
> > Question 2: I've been wondering whether impalad caches file locations
> > itself. They don't seem to be stored in the Hive metastore; only the
> > partition location is there, right?
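To make the flow Dimitris describes concrete, here is a toy Python model, assuming nothing beyond what the thread says: the metastore holds only the partition directory, the catalog server asks the "NameNode" for file and block metadata and caches it, then publishes it through a "statestore" topic that each impalad pulls into its own local cache. Every class, dict, host, and path below (FileMeta, Statestore, CatalogServer, Impalad, METASTORE, NAMENODE, dn1, hdfs://nn/...) is hypothetical and only illustrates the idea; it is not Impala's actual code or API.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for illustration only; not Impala's real classes.

@dataclass
class FileMeta:
    path: str
    length: int
    block_hosts: list  # one list of hosts per HDFS block

# Hive metastore (toy): stores only the partition *directory* location.
METASTORE = {("sales", "dt=2017-04-05"): "hdfs://nn/warehouse/sales/dt=2017-04-05"}

# NameNode (toy): knows the files under a directory and their block locations.
NAMENODE = {
    "hdfs://nn/warehouse/sales/dt=2017-04-05": [
        FileMeta("part-0.parq", 256, [["dn1", "dn2"], ["dn2", "dn3"]]),
        FileMeta("part-1.parq", 128, [["dn3", "dn1"]]),
    ]
}

class Statestore:
    """Keeps versioned topic entries; subscribers fetch what they have not seen yet."""
    def __init__(self):
        self.version = 0
        self.entries = {}  # key -> (version, value)

    def publish(self, key, value):
        self.version += 1
        self.entries[key] = (self.version, value)

    def delta_since(self, seen_version):
        delta = {k: v for k, (ver, v) in self.entries.items() if ver > seen_version}
        return delta, self.version

class CatalogServer:
    def __init__(self, statestore):
        self.cache = {}
        self.statestore = statestore

    def load_partition(self, table, partition):
        location = METASTORE[(table, partition)]  # partition location from the metastore
        files = NAMENODE[location]                # file + block metadata from the NameNode
        self.cache[(table, partition)] = files    # cached in catalog server memory
        self.statestore.publish((table, partition), files)

class Impalad:
    def __init__(self, name):
        self.name = name
        self.local_cache = {}
        self.seen_version = 0

    def on_topic_update(self, statestore):
        # Apply only the entries published since this daemon's last update.
        delta, self.seen_version = statestore.delta_since(self.seen_version)
        self.local_cache.update(delta)

ss = Statestore()
catalog = CatalogServer(ss)
daemons = [Impalad("impalad-1"), Impalad("impalad-2")]
catalog.load_partition("sales", "dt=2017-04-05")
for d in daemons:  # the thread says this happens "over time"; here it is triggered by hand
    d.on_topic_update(ss)
print(daemons[1].local_cache[("sales", "dt=2017-04-05")][0].block_hosts)
# prints [['dn1', 'dn2'], ['dn2', 'dn3']]
```

The versioned delta is just one simple way to model "broadcast to all servers"; how Impala actually batches and delivers catalog updates is not covered in this thread.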
