Hello Long Zhou,

Thanks for reaching out. I'm a developer on Lens and will try to answer your questions with respect to Lens.

On Thu, Feb 26, 2015 at 9:09 PM, Long Zhou <[email protected]> wrote:

> [delivery to user@kylin failed, resend to dev@kylin]
>
> Hi Kylin and Lens communities,
>
> I am working on a big data analysis project and consider using Kylin
> or Lens. Do you have some guidelines/recommendations on how to choose the
> right solution? We are particularly interested in the performance
> characteristics of these two solutions on terabytes of sparse data.
>

We don't have guidelines, recommendations, or performance characteristics documented anywhere as of now, but the user documentation should help you with some details of the system. Lens itself does not add any overhead to query execution: the query is handed to the underlying engine, so the performance numbers published for the underlying systems should be sufficient.

> I just started learning the two projects. It seems Kylin is more like
> MOLAP while Lens is more like ROLAP, is that correct? Does the differences
> between MOLAP and ROLAP apply here?
>

I agree that Lens is a ROLAP-like system. We can say Lens can become HOLAP (http://en.wikipedia.org/wiki/ROLAP, http://en.wikipedia.org/wiki/HOLAP, http://www.1keydata.com/datawarehousing/molap-rolap.html). As with ROLAP in general, the performance of Lens depends on the underlying execution engines. If the data is not aggregated, Lens picks detailed tables to answer the query, but if aggregated data is available through an ETL process, it makes use of that.

> When using Hive as storage, it seems Kylin might perform better since
> data is pre-aggregated and cached. How does Kylin handle sparse tables and
> avoid empty cells in cache? Does Lens have cache on top of Hive?
>

No, Lens does not have any cache on top of Hive.

> Lens supports columnar data warehouses like Redshift. How much
> performance could we gain by loading data to Redshift? Where can I find
> performance benchmark data for the two projects?
>

It would be the same as how fast Redshift can answer queries. Lens comes with JDBCDriver for reaching systems that understand JDBC. At InMobi, we use it in production with a columnar data warehouse, InfoBright (https://www.infobright.com/). It should work with Redshift as well, but it has not yet been tested with Redshift.

Thanks
Amareshwari
