From: Li Yang
Date: 2015-05-06 04:23
To: dev
Subject: Re: a few slides for Strata + Hadoop World London 2015

Hi Xu,

Let me answer the grid table questions. The slides use the inverted index as an example but don't explain how a cuboid is modeled by the grid table.

1. The grid table assumes records are ordered by some key. The key can be a timestamp, but it does not have to be. When modeling a cuboid, all dimensions of the cuboid together become a composite order key.

2. The row block is the unit of row-level filtering. Combined with the inverted index, it brings random access to storage. However, row blocks can also be disabled if a storage is designed for sequential scans. The cuboid is such a case.

3. The row block size is customizable. It works like a slider that lets you choose where to stand between random access and sequential reads. The smaller the row block, the better the term filtering and the more random access. The larger the row block, the weaker the term filtering and the more sequential reading. For storage like HBase, which is better at sequential reads, we tend to use a big block size, and the resulting rowkey will be very similar to your suggestion ("coarse timestamp + term + fine timestamp"). A row block holding a whole day of data is a good idea. We can benchmark HBase to decide on a more precise value.

4. About TopN: currently both II (inverted index) and cube can do TopN without problems. However, there is a lot of room for improvement. For II, row skipping is not implemented yet; it depends on the grid table being ready. For cube, there is no plan to make TopN a metric yet. Even if we did that one day, the metric would be huge, like HyperLogLog. Think about ascending/descending, by what order, how much to limit, etc.: the combinations are very large. Sort and limit push-down is still an open JIRA. We don't have a strong demand for super-fast TopN at the moment. Speeding up the cube build and enabling streaming are the highest priorities.

Cheers
Yang

On Mon, May 4, 2015 at 6:36 AM, Luke Han wrote:

> Great suggestion, thanks Ted.
>
> Regards!
> Luke Han
>
> _____________________________
> From: Ted Dunning
> Sent: Monday, May 4, 2015 01:07
> Subject: Re: a few slides for Strata + Hadoop World London 2015
>
> One important question that came up after I talked about Kylin at Hadoop Summit was about security. Having a few slides on that may be good in any presentation.
>
> On Sun, May 3, 2015 at 4:16 PM, Luke Han wrote:
>
> > Thanks Yang, this is really great and detailed about the ongoing lambda architecture.
> >
> > Will merge them into our deck; let's discuss the details on the plane tomorrow :-)
> >
> > Thanks.
> >
> > Best Regards!
> > ---------------------
> >
> > Luke Han
> >
> > 2015-05-02 8:20 GMT+08:00 Li Yang:
> >
> > > Hi Luke
> > >
> > > I created a few slides for Strata + Hadoop World London 2015 next week, see attached. Let's see how they merge with the previous deck.
> > >
> > > Some should be attached to the related JIRAs as design docs. I'll do it later.
> > >
> > > Cheers
> > > Yang
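Points 1 to 3 of Yang's reply can be illustrated with a toy model. The following is a hypothetical Python sketch, not Kylin's grid table code. It does three things:

- sorts cuboid rows using their dimensions as a composite key;
- cuts the sorted rows into fixed-size row blocks;
- builds an inverted index from each value of one column to the ids of the blocks that contain it.

The function names `build_row_blocks`, `build_inverted_index` and `filter_rows` are made up for this example. Treating "blocks touched" as random reads and "rows read" as sequential work is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical toy model of a grid table; not Kylin's actual implementation.


def build_row_blocks(rows, key_cols, block_size):
    """Sort rows by the composite key (the cuboid's dimensions) and cut them into row blocks."""
    ordered = sorted(rows, key=lambda r: tuple(r[c] for c in key_cols))
    return [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]


def build_inverted_index(blocks, col):
    """Map each value of `col` to the set of row block ids that contain it."""
    index = defaultdict(set)
    for block_id, block in enumerate(blocks):
        for row in block:
            index[row[col]].add(block_id)
    return index


def filter_rows(blocks, index, col, value):
    """Read only the candidate blocks named by the index, then filter rows exactly.

    Returns (matching rows, blocks touched, rows read). Blocks touched stands in
    for random accesses; rows read stands in for sequential work.
    """
    candidates = sorted(index.get(value, ()))
    hits, rows_read = [], 0
    for block_id in candidates:
        block = blocks[block_id]
        rows_read += len(block)
        hits.extend(r for r in block if r[col] == value)
    return hits, len(candidates), rows_read


if __name__ == "__main__":
    dims = ["day", "country", "category"]
    rows = [{"day": d, "country": c, "category": k}
            for d in range(4) for c in ("CN", "UK", "US") for k in ("a", "b")]
    for size in (2, 8, len(rows)):
        blocks = build_row_blocks(rows, dims, size)
        index = build_inverted_index(blocks, "country")
        hits, touched, read = filter_rows(blocks, index, "country", "UK")
        print(f"block_size={size}: {len(hits)} hits, {touched} blocks touched, {read} rows read")
```

On these 24 toy rows, a filter on `country` (not the leading key) behaves as follows:

- block size 2: touches 4 blocks and reads 8 rows;
- block size 8: touches 3 blocks and reads 24 rows;
- one block covering the whole table: reads everything in a single sequential pass, which is effectively the "row blocks disabled" case from point 2.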
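Point 3 says that with large row blocks on HBase the rowkey ends up close to Xu's suggested "coarse timestamp + term + fine timestamp" layout. The sketch below shows one way such a key could be laid out in bytes. It is a hypothetical Python example, not Kylin code, and it rests on these assumptions:

- timestamps are in milliseconds;
- the coarse bucket is one day, matching the "whole day" block size mentioned in point 3;
- terms are UTF-8 strings that never contain NUL bytes;
- a sorted Python list stands in for HBase's byte-ordered row keys.

`make_rowkey`, `decode_rowkey` and `scan_term_in_day` are illustrative names.

```python
import bisect
import struct

# Hypothetical rowkey layout for the "coarse timestamp + term + fine timestamp" idea.
# Assumptions: millisecond timestamps, a one-day coarse bucket, UTF-8 terms without NUL bytes.
DAY_MS = 24 * 60 * 60 * 1000


def make_rowkey(ts_ms: int, term: str) -> bytes:
    """Build a key whose byte order sorts by day, then term, then time within the day."""
    term_bytes = term.encode("utf-8")
    if b"\x00" in term_bytes:
        raise ValueError("terms must not contain NUL bytes")
    coarse, fine = divmod(ts_ms, DAY_MS)
    # Big-endian fixed-width integers compare bytewise in numeric order;
    # the NUL terminator ends the term field and keeps "ab" sorted before "abc".
    return struct.pack(">I", coarse) + term_bytes + b"\x00" + struct.pack(">I", fine)


def decode_rowkey(key: bytes):
    """Recover (timestamp_ms, term) from a key built by make_rowkey."""
    coarse = struct.unpack(">I", key[:4])[0]
    term, _, fine_bytes = key[4:].partition(b"\x00")
    fine = struct.unpack(">I", fine_bytes)[0]
    return coarse * DAY_MS + fine, term.decode("utf-8")


def scan_term_in_day(sorted_keys, day: int, term: str):
    """Range scan over keys kept in byte order, the way a store like HBase keeps row keys.

    All keys for one (day, term) share a prefix, so they form one contiguous range.
    """
    prefix = struct.pack(">I", day) + term.encode("utf-8") + b"\x00"
    lo = bisect.bisect_left(sorted_keys, prefix)
    hi = bisect.bisect_right(sorted_keys, prefix + b"\xff" * 4)  # fine part is 4 bytes
    return sorted_keys[lo:hi]


if __name__ == "__main__":
    events = [(5, "error"), (9, "warn"), (DAY_MS + 1, "error"), (7, "error"), (DAY_MS + 3, "warn")]
    keys = sorted(make_rowkey(ts, term) for ts, term in events)
    print([decode_rowkey(k) for k in keys])
    print([decode_rowkey(k) for k in scan_term_in_day(keys, 0, "error")])
```

Because the coarse bucket leads the key, one day's data stays contiguous. Within that day, each term's rows form a single range that can be read sequentially, and the fine timestamp orders rows inside that range. The cost of this choice is that a query for one term across many days needs one range scan per day.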
