Hi!

I'm a grad student at Georgia Tech and I'm currently working with Hive for a university project on query optimization techniques and possibilities in Hive. I know that there have been a lot of additions to the ql and metastore components since the latest release, and I was hoping to help advance those components even further. My main research interest is the storage and use of metadata to drive a cost-based optimizer. This includes basic optimizations, for example using the table size for cost estimation, as well as more advanced approaches using histograms. I know that table and partition information is already collected in Hive, but from what I could gather, column metadata and histograms are still open. Would it be possible for me to contribute to the project in that area?

Anja
