Awesome!!

On Wed, May 27, 2015 at 10:55 AM, Ashutosh Chauhan <hashut...@apache.org> wrote:

> Siva / Scott,
>
> Such a framework exists in some form:
> https://issues.apache.org/jira/browse/HIVE-2038
> To make it even more generic there was a proposal:
> https://issues.apache.org/jira/browse/HIVE-2147
> But there was resistance from the community to it. Maybe the community
> is ready for it now : )
>
> Ashutosh
>
> On Tue, May 26, 2015 at 10:12 PM, Sivaramakrishnan Narayanan <tarb...@gmail.com> wrote:
>
> > Thanks for the replies.
> >
> > @Ashutosh - thanks for the pointer! Yes, I was running the 0.11
> > metastore. Let me try with the 0.13 metastore! Maybe my woes will be
> > gone. If they aren't, then I'll continue working along these lines.
> >
> > @Alan - agreed. Caching MTables seems like a better approach if 0.13
> > metastore perf is not as good as I'd like.
> >
> > @Scott - a pluggable hook for metastore calls would be super useful. If
> > you want to generate events for client-side actions, I suppose you could
> > just implement a dynamic proxy class over the metastore client class
> > which does whatever you need it to. A similar technique could work on
> > the server side - I believe there is already a RetryingMetaStoreClient
> > proxy class in place.
> >
> > On Wed, May 27, 2015 at 7:32 AM, Ashutosh Chauhan <hashut...@apache.org> wrote:
> >
> > > Are you running pre-0.12, or with hive.metastore.try.direct.sql = false?
> > >
> > > Work done on https://issues.apache.org/jira/browse/HIVE-4051 should
> > > alleviate some of your problems.
> > >
> > > On Mon, May 25, 2015 at 8:19 PM, Sivaramakrishnan Narayanan <tarb...@gmail.com> wrote:
> > >
> > > > Apologies if this has been discussed in the past - my searches did
> > > > not pull up any relevant threads. If there are better solutions
> > > > available out of the box, please let me know!
> > > >
> > > > Problem statement
> > > > --------------------------
> > > >
> > > > We have a setup where a single metastoredb is used by Hive, Presto
> > > > and SparkSQL. In addition, there are 1000s of hive queries submitted
> > > > in batch form from multiple machines. Oftentimes, the metastoredb
> > > > ends up being remote (in a different region in AWS etc) and
> > > > round-trip latency is high. We've seen single thrift calls getting
> > > > translated into lots of small SQL calls by datanucleus, and the
> > > > round-trip latency ends up killing performance. Furthermore, any of
> > > > these systems may create / modify a hive table and this should be
> > > > reflected in the other systems. For example, I may create a table in
> > > > hive and query it using Presto or vice versa. In our setup, there
> > > > may be multiple thrift metastore servers pointing to the same
> > > > metastore db.
> > > >
> > > > Investigation
> > > > -------------------
> > > >
> > > > Basically, we've been looking at caching to solve this problem (will
> > > > come to invalidation in a bit). I looked briefly at DN's support for
> > > > caching - these two parameters seem to be switched off by default.
> > > >
> > > > METASTORE_CACHE_LEVEL2("datanucleus.cache.level2", false),
> > > > METASTORE_CACHE_LEVEL2_TYPE("datanucleus.cache.level2.type", "none"),
> > > >
> > > > Furthermore, my reading of
> > > > http://www.datanucleus.org/products/datanucleus/jdo/cache.html
> > > > suggests that there is no sophistication in invalidation - it seems
> > > > like only time-based invalidation is supported and it can't work
> > > > across multiple PMFs (and therefore across multiple thrift metastore
> > > > servers).
> > > >
> > > > Solution Outline
> > > > -----------------------
> > > >
> > > > - Every table / partition will have an additional property called
> > > >   'version'
> > > > - Any call that modifies a table or partition will bump up the
> > > >   version of the table / partition
> > > > - Guava based cache of thrift objects that come from metastore calls
> > > > - We fire a single SQL query matching versions before returning
> > > >   from cache
> > > > - It is conceivable to have a mode wherein invalidation based on
> > > >   version happens in a background thread (for higher performance,
> > > >   lower fidelity)
> > > > - Not proposing any locking (not shooting for world peace here :) )
> > > > - We could extend the HiveMetaStore class or create a new server
> > > >   altogether
> > > >
> > > > Is this something that would be interesting to the community? Is
> > > > this problem already solved and should I spend my time watching GoT
> > > > instead?
> > > >
> > > > Thanks
> > > > Siva
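
The Solution Outline in the original message could be sketched roughly as below. This is only an illustration under stated assumptions, not Hive code: `VersionedCache`, `versionLookup` and `loader` are hypothetical names, a plain `ConcurrentHashMap` stands in for the Guava cache mentioned in the outline, and the two functions stand in for the single cheap version query and the expensive metastore fetch respectively.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/**
 * Hypothetical sketch of a version-checked cache. Each cached object is
 * stored together with the version it was loaded at, and is served only
 * while the current version reported by the backing store still matches.
 */
final class VersionedCache<V> {

    /** A cached value plus the version observed just before it was loaded. */
    private static final class Entry<V> {
        final long version;
        final V value;

        Entry(long version, V value) {
            this.version = version;
            this.value = value;
        }
    }

    // Stand-in for the Guava cache from the outline (no eviction here).
    private final Map<String, Entry<V>> cache = new ConcurrentHashMap<>();

    // Assumption: a cheap lookup of the current version (the "single SQL").
    private final Function<String, Long> versionLookup;

    // Assumption: the expensive fetch of the full object on a miss.
    private final Function<String, V> loader;

    VersionedCache(Function<String, Long> versionLookup, Function<String, V> loader) {
        this.versionLookup = versionLookup;
        this.loader = loader;
    }

    V get(String name) {
        // Read the version first. If a writer bumps it while we load, the
        // entry is stored under the older version and is simply reloaded
        // on the next call, so a stale hit is never served for long.
        long current = versionLookup.apply(name);
        Entry<V> cached = cache.get(name);
        if (cached != null && cached.version == current) {
            return cached.value; // versions match: serve from cache
        }
        V fresh = loader.apply(name);
        cache.put(name, new Entry<>(current, fresh));
        return fresh;
    }
}
```

As in the outline, there is no locking: every read still pays one round trip for the version check, and the background-thread mode would trade that round trip for weaker freshness.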