[
https://issues.apache.org/jira/browse/KYLIN-4612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Xiaoxiang Yu resolved KYLIN-4612.
---------------------------------
Resolution: Fixed
> Support job status write to kafka
> ---------------------------------
>
> Key: KYLIN-4612
> URL: https://issues.apache.org/jira/browse/KYLIN-4612
> Project: Kylin
> Issue Type: Improvement
> Components: Job Engine
> Reporter: chuxiao
> Assignee: chuxiao
> Priority: Minor
> Fix For: v3.1.1
>
>
> because more than hundrad job running , so job status changed write to kafka
> instread of query job list.
> English:
> 1. The job status supports sending emails, but does not support other
> notification systems. Sending emails is notifying people, and also needs to
> support notifying other systems. For al scheduling system, before is to
> prepare hive table data. Then build the kylin job and wait. After the job is
> completed, it may be to notify the reporting system that the report query for
> the segment date can be released, or may directly query the cube data to send
> the daily report, or notify other systems in need.
> If there are hundreds or thousands of build tasks running at the same time,
> waiting for build notifications consumes less cost than the client's 10s
> polling.
> Producers and consumers who use kafka to build notifications can decouple
> kylin from third-party systems, and third-party system problems will not
> affect kyiln.
> Compared with the real-time system cube, the notification is built to notify
> kylin users, not to kylin administrators. The user does not care about the
> details of the system cube, only the job result. In addition, due to
> different consumers, the topic used to build status notifications and system
> cubes are different, and even a Kafka cluster will not use a set. Therefore,
> the Kafka write logic of the real-time system cube cannot be shared.
> Since the third-party system may have a requirement to build a progress bar,
> the status change of each subtask is also sent to kafka.
> 2. Configuration. kylin.propertis adds
> kylin.engine.job-status.write.kafka=true to enable this function.
> Configure kylin.engine.job-status.kafka.bootstrap.servers=xxx and specify the
> connection service address.
> kylin.engine.job-status.kafka.topic.name=xx, specify the topic name to send.
> Others require dekafka configuration, can be configured through
> kylin.engine.job-status.kafka.{name}={value}, Kafka Properties configuration
> will increase all the configuration of {name}={value}. default
> ```
> ACKS_CONFIG, "-1",
> COMPRESSION_TYPE_CONFIG=lz4,
> RETRIES_CONFIG=3,
> LINGER_MS_CONFIG=500,
> BATCH_SIZE_CONFIG, 10000,
> ```
> For parameters supported by other producers, see
> http://kafka.apache.org/documentation/#producerconfigs
> 3. How to receive. Parse the status from json.
> The structure of the message body is as follows:
> {"jobId":"63d52094-ca46-c4fa-7e77-242a6cf74f0f","jobName":"BUILD
> CUBE-sys_probe-20150701000000_20190701000000-GMT+08:00 2019-07-05
> 11:42:33","status": "DISCARDED","subTaskSize":
> "11","subTasks":[{"jobId":"63d52094-ca46-c4fa-7e77-242a6cf74f0f-00","jobName":"Create
> Intermediate Flat Hive Table","status
> ":"FINISHED"},{"jobId":"63d52094-ca46-c4fa-7e77-242a6cf74f0f-01","jobName":"Extract
> Fact Table Distinct Columns","status":"DISCARDED"},{"jobId
> ":"63d52094-ca46-c4fa-7e77-242a6cf74f0f-02","jobName":"Build Dimension
> Dictionary","status":"DISCARDED"},{"jobId":"63d52094-ca46-c4fa-7e77-242a6cf74f0f
> -03","jobName":"Save Cuboid
> Statistics","status":"DISCARDED"},{"jobId":"63d52094-ca46-c4fa-7e77-242a6cf74f0f-04","jobName":"Create
> HTable"
> ,"status":"DISCARDED"},{"jobId":"63d52094-ca46-c4fa-7e77-242a6cf74f0f-05","jobName":"Build
> Cube with Spark","status":"DISCARDED"},{
> "jobId":"63d52094-ca46-c4fa-7e77-242a6cf74f0f-06","jobName":"Convert Cuboid
> Data to HFile","status":"DISCARDED"},{"jobId":"63d52094-
> ca46-c4fa-7e77-242a6cf74f0f-07","jobName":"Load HFile to HBase
> Table","status":"DISCARDED"},{"jobId":"63d52094-ca46-c4fa-7e77-242a6cf74f0f-08"
> ,"jobName":"Update Cube
> Info","status":"DISCARDED"},{"jobId":"63d52094-ca46-c4fa-7e77-242a6cf74f0f-09","jobName":"Hive
> Cleanup","status
> ":"DISCARDED"},{"jobId":"63d52094-ca46-c4fa-7e77-242a6cf74f0f-10","jobName":"Garbage
> Collection on HDFS","status":"DISCARDED"}]}
> -----------------------------------------------------
> 中文:
> 1.
> 构建执行状态支持发邮件,不支持其他主动通知方式。发邮件是通知人,也需要支持通知其他系统。对于外围调度系统来说,前序是准备hive表数据。再构建kylin作业,等待构建完成。构建完成后,后续是通知报表系统可以放开相应日期的报表查询了,或者直接查询构建数据发送日报,或者通知其他有需要的系统。
> 如果有几百上千个构建任务再同时运行,等待构建通知比客户端10s定时轮训消耗低。
> 用kafka做构建通知的生产者消费者,可以将kylin和第三方系统解耦,第三方系统异常不会影响kyiln。
> 跟实时系统cube相比,构建通知是通知kylin用户的,不是通知给kylin管理员的。用户不关心系统cube的细节,只关心构建结果。另外由于消费者不同,构建状态通知和系统cube用的topic不一样,甚至kafka集群都不会用一套。所以不能共用实时系统cube的kafka写入逻辑。
> 由于第三方系统可能有构建进度条需求,所以每个子task的状态变更也发送给kafka了。
> 2. 配置。kylin.propertis 增加kylin.engine.job-status.write.kafka=true,启用该功能。
> 配置kylin.engine.job-status.kafka.bootstrap.servers=xxx,指定连接服务地址。
> kylin.engine.job-status.kafka.topic.name=xx,指定发送topic名。
> 其他需要dekafka配置,可以通过kylin.engine.job-status.kafka.{name}={value}配置,kafka的Properties配置会增加所有的{name}={value}的配置。默认
> ```
> ACKS_CONFIG, "-1",
> COMPRESSION_TYPE_CONFIG=lz4,
> RETRIES_CONFIG=3,
> LINGER_MS_CONFIG=500,
> BATCH_SIZE_CONFIG, 10000,
> ```
> 其他生产者支持的参数见 http://kafka.apache.org/documentation/#producerconfigs
> 3. 如何接收。解析消息体的状态字段。
> 消息体结构如下:
> {"jobId":"63d52094-ca46-c4fa-7e77-242a6cf74f0f","jobName":"BUILD CUBE -
> sys_probe - 20150701000000_20190701000000 - GMT+08:00 2019-07-05
> 11:42:33","status":"DISCARDED","subTaskSize":
> "11","subTasks":[{"jobId":"63d52094-ca46-c4fa-7e77-242a6cf74f0f-00","jobName":"Create
> Intermediate Flat Hive
> Table","status":"FINISHED"},{"jobId":"63d52094-ca46-c4fa-7e77-242a6cf74f0f-01","jobName":"Extract
> Fact Table Distinct
> Columns","status":"DISCARDED"},{"jobId":"63d52094-ca46-c4fa-7e77-242a6cf74f0f-02","jobName":"Build
> Dimension
> Dictionary","status":"DISCARDED"},{"jobId":"63d52094-ca46-c4fa-7e77-242a6cf74f0f-03","jobName":"Save
> Cuboid
> Statistics","status":"DISCARDED"},{"jobId":"63d52094-ca46-c4fa-7e77-242a6cf74f0f-04","jobName":"Create
>
> HTable","status":"DISCARDED"},{"jobId":"63d52094-ca46-c4fa-7e77-242a6cf74f0f-05","jobName":"Build
> Cube with
> Spark","status":"DISCARDED"},{"jobId":"63d52094-ca46-c4fa-7e77-242a6cf74f0f-06","jobName":"Convert
> Cuboid Data to
> HFile","status":"DISCARDED"},{"jobId":"63d52094-ca46-c4fa-7e77-242a6cf74f0f-07","jobName":"Load
> HFile to HBase
> Table","status":"DISCARDED"},{"jobId":"63d52094-ca46-c4fa-7e77-242a6cf74f0f-08","jobName":"Update
> Cube
> Info","status":"DISCARDED"},{"jobId":"63d52094-ca46-c4fa-7e77-242a6cf74f0f-09","jobName":"Hive
>
> Cleanup","status":"DISCARDED"},{"jobId":"63d52094-ca46-c4fa-7e77-242a6cf74f0f-10","jobName":"Garbage
> Collection on HDFS","status":"DISCARDED"}]}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)