Realtime support is on Kylin's roadmap. We can collaborate on this if you are interested.
The idea is simple. Say Kylin can currently do a 5-minute micro batch; then we only need a realtime storage that holds up to the latest 5 minutes of data (whatever arrives after the last batch). A query hits both the realtime storage and the cube storage. In your attempt the realtime storage is still HBase, but I would prefer it to be a new, separate table. The realtime storage needs to expose an interface for queries, which is straightforward given we've done it once.

On Thu, Mar 31, 2016 at 11:54 AM, zeLiu <[email protected]> wrote:

> thanks hongbin,
>
> It is true that, as you say, the datas must be pre aggregate , or the same
> key will cover each other.
>
> I just refer to the MapReduce code of kylin, add data to the HBase in real
> time, and not a very good idea
>
> The reason for this is that we are all doing two products, a real-time and
> an offline.
> Their front UI are the same, in the past we are to write real-time data
> into
> the mysql,storage offline data use kylin, and this will need to develop new
> interfaces for mysql.
> But if both real-time and off-line are written to the kylin, we can only
> develop an interface for UI.
>
>
> I did a test, the same data, the delay is much smaller than the mysql, The
> average delay is about 6 ms.
> I didn't use the dictionary, because I'm worried that if a new value is not
> found in the dictionary, it will affect the accuracy of the data.
>
> Whether there is a better solution?
>
> the plug-in code: https://github.com/zeliu/kylin-storm-plugin
>
> thanks
>
> --
> View this message in context:
> http://apache-kylin.74782.x6.nabble.com/update-hbase-data-realtime-and-query-it-tp3959p4019.html
> Sent from the Apache Kylin mailing list archive at Nabble.com.
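
To make the "query hits both storages" idea at the top of this message a bit more concrete, here is a rough Python sketch. It is only an illustration under my own assumptions, not Kylin code: the function name `query_hybrid`, the row layout, and the field names `ts` and `measure` are all hypothetical. It assumes the cube storage holds rows already aggregated up to the last batch boundary, while the realtime storage holds raw events, of which only those after the boundary should be counted.

```python
from collections import defaultdict


def query_hybrid(cube_rows, realtime_events, last_batch_end, dimension):
    """Sum `measure` per value of `dimension` across both storages.

    Hypothetical sketch: cube_rows are pre-aggregated rows covering data up
    to last_batch_end; realtime_events are raw events carrying a `ts` field.
    """
    totals = defaultdict(int)

    # Cube storage: already aggregated by the last micro batch.
    for row in cube_rows:
        totals[row[dimension]] += row["measure"]

    # Realtime storage: count only events newer than the last batch,
    # so data already built into the cube is not counted twice.
    for event in realtime_events:
        if event["ts"] > last_batch_end:
            totals[event[dimension]] += event["measure"]

    return dict(totals)


# Made-up sample data, for illustration only.
cube_rows = [
    {"city": "Beijing", "measure": 100},
    {"city": "Shanghai", "measure": 40},
]
realtime_events = [
    {"ts": 1005, "city": "Beijing", "measure": 3},
    {"ts": 990, "city": "Shanghai", "measure": 7},   # already in the cube
    {"ts": 1010, "city": "Shenzhen", "measure": 2},  # new key, realtime only
]

result = query_hybrid(cube_rows, realtime_events, last_batch_end=1000, dimension="city")
print(result)
```

The timestamp filter is the important part: once a batch has been built into the cube, the realtime side must stop serving that range, otherwise the same data is summed twice.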
