Since we're working on designing a new cluster management module for managing LB servers and streaming job slaves, I think it's a good opportunity for Kylin users to share their pain points and wish lists to help improve the Kylin user experience.
Here are mine:

1. Cluster configuration is troublesome. Currently we have to write the server list into kylin.properties and assign a role to each server, which is hard to maintain. The new cluster management should automate server discovery, leader election and failover.

2. Log analysis is not easy when multiple servers are running at the same time (see https://issues.apache.org/jira/browse/KYLIN-1124 for an example). On the query side, we should be able to answer questions like "I submitted query XXXXX at 10:00, why is it slow?" and "What are the most time-consuming queries recently, and which cubes do they hit?" On the streaming job dispatcher side, we should be able to identify failed batches more quickly (and resume them), and manage each batch's build log better: with tens of slaves, it's difficult to find where a batch's build log is. A related JIRA ticket is https://issues.apache.org/jira/browse/KYLIN-1079

3. Streaming batch jobs should be horizontally scalable. If a batch is found to be too big to fit into a single JVM, we should detect it and divide it into smaller pieces so that the job can be dispatched to multiple JVMs, and then let the subsequent auto-merge job merge the pieces. Related JIRA: https://issues.apache.org/jira/browse/KYLIN-1042

4. A failed auto-merge job leads to hundreds of accumulated segments, which greatly harms query performance. Related JIRA: https://issues.apache.org/jira/browse/KYLIN-1038

--
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone
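To make wish item 3 (split an oversized streaming batch, dispatch the pieces, merge later) a bit more concrete, here is a minimal sketch of the split step only. It assumes, hypothetically, that a batch is a contiguous range of record offsets and that its memory footprint can be estimated as record count times an average record size. `Batch`, `BatchSplitter`, `bytesPerRecord` and `maxBytesPerJvm` are illustrative names, not existing Kylin classes or settings.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: split an oversized streaming batch into contiguous
 * sub-batches that each fit a per-JVM memory budget. None of these names
 * are Kylin APIs; they only illustrate the idea from wish item 3.
 */
public class BatchSplitter {

    /** A batch covering the half-open record offset range [start, end). */
    public static final class Batch {
        public final long start;
        public final long end;

        public Batch(long start, long end) {
            if (end < start) throw new IllegalArgumentException("end < start");
            this.start = start;
            this.end = end;
        }

        public long records() {
            return end - start;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /**
     * Returns the batch unchanged if its estimated size fits the budget;
     * otherwise divides it into the smallest number of contiguous pieces
     * that each fit, with piece sizes differing by at most one record.
     * The pieces could then go to different JVMs and be combined later
     * by an auto-merge job.
     */
    public static List<Batch> split(Batch batch, long bytesPerRecord, long maxBytesPerJvm) {
        if (bytesPerRecord <= 0 || maxBytesPerJvm < bytesPerRecord) {
            throw new IllegalArgumentException("budget must hold at least one record");
        }
        long maxRecordsPerPiece = maxBytesPerJvm / bytesPerRecord;
        long total = batch.records();
        List<Batch> pieces = new ArrayList<>();
        if (total <= maxRecordsPerPiece) {
            pieces.add(batch); // small enough for a single JVM
            return pieces;
        }
        // n = ceil(total / maxRecordsPerPiece), the minimum number of pieces
        long n = (total + maxRecordsPerPiece - 1) / maxRecordsPerPiece;
        long base = total / n;
        long remainder = total % n;
        long cursor = batch.start;
        for (long i = 0; i < n; i++) {
            // the first `remainder` pieces take one extra record
            long size = base + (i < remainder ? 1 : 0);
            pieces.add(new Batch(cursor, cursor + size));
            cursor += size;
        }
        return pieces;
    }

    public static void main(String[] args) {
        // 1,000 records at ~1 KB each against a 300 KB budget
        Batch big = new Batch(0, 1000);
        System.out.println(split(big, 1024, 300 * 1024));
        // prints [[0, 250), [250, 500), [500, 750), [750, 1000)]
    }
}
```

Splitting into near-equal pieces, rather than greedily filling each JVM to the limit, keeps the sub-jobs similar in runtime, which may make the later merge less skewed. How the size estimate is obtained in practice is left open here.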
