jon-wei commented on issue #8276: KIS tasks in 0.15.1 RC2 sometimes duplicate rows with the same dimension values
URL: https://github.com/apache/incubator-druid/issues/8276#issuecomment-521474290

Messages like the following in the task logs:

```
2019-08-11T03:54:10,121 INFO [task-runner-0-priority-0] org.apache.druid.segment.realtime.appenderator.AppenderatorImpl - Submitting persist runnable for dataSource[kube-metrics-30m]
2019-08-11T03:54:10,121 INFO [kube-metrics-30m-incremental-persist] org.apache.druid.segment.realtime.appenderator.AppenderatorImpl - Segment[kube-metrics-30m_2019-08-11T00:00:00.000Z_2019-08-11T06:00:00.000Z_2019-08-11T00:00:00.177Z_1], persisting Hydrant[FireHydrant{, queryable=kube-metrics-30m_2019-08-11T00:00:00.000Z_2019-08-11T06:00:00.000Z_2019-08-11T00:00:00.177Z_1, count=6}]
2019-08-11T03:54:10,122 INFO [kube-metrics-30m-incremental-persist] org.apache.druid.segment.IndexMergerV9 - Starting persist for interval[2019-08-11T00:00:00.000Z/2019-08-11T06:00:00.000Z], rows[59]
```

indicate that the task is persisting in-memory segment data to disk. A `hydrant` is an intermediate persist; on disk it has the same format as any other Druid segment.
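To see how often a task persisted each segment, the `persisting Hydrant` lines can be tallied per segment. Below is a minimal Python sketch, assuming the message wording shown in the excerpt above (other Druid versions may log it differently). `persisted_hydrants` and `PERSIST_RE` are hypothetical names, not part of Druid; the helper just collects the `count` value from each matching line, which appears to number the hydrants within a segment.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Matches the AppenderatorImpl persist message as it appears in the excerpt, e.g.
# "Segment[<id>], persisting Hydrant[FireHydrant{, queryable=<id>, count=6}]".
# The exact wording is an assumption taken from these 0.15.x task logs.
PERSIST_RE = re.compile(
    r"Segment\[(?P<segment>[^\]]+)\], persisting Hydrant\[FireHydrant\{.*?count=(?P<count>\d+)\}\]"
)


def persisted_hydrants(log_lines: Iterable[str]) -> Dict[str, List[int]]:
    """Hypothetical helper: map each segment id to the hydrant counts persisted for it."""
    result: Dict[str, List[int]] = defaultdict(list)
    for line in log_lines:
        match = PERSIST_RE.search(line)
        if match:  # every other log message is ignored
            result[match.group("segment")].append(int(match.group("count")))
    return dict(result)
```

Passing an open task log file (e.g. `persisted_hydrants(open(path))`, with `path` being wherever the log was saved) yields one list of hydrant counts per segment, in log order.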
The intermediate persists are queryable. When the task decides to push to deep storage and publish, it merges the intermediate persists into larger segments and pushes the merged segments, e.g.:

```
2019-08-11T14:43:59,896 INFO [appenderator_merge_0] org.apache.druid.segment.realtime.appenderator.AppenderatorImpl - Pushed merged index for segment[kube-metrics-30m_2019-08-11T12:00:00.000Z_2019-08-11T18:00:00.000Z_2019-08-11T12:00:00.122Z], descriptor is: DataSegment{size=24988, shardSpec=NumberedShardSpec{partitionNum=0, partitions=0}, metrics=[count, req_cpu, req_mem, lim_cpu, lim_mem, cap_cpu, cap_mem, cap_pods, alloc_cpu, alloc_mem, alloc_pods, cpu, rxBytes, txBytes, mem_used, mem_limit], dimensions=[accountid, cluster, name, namespace, etype, labels, owner, owner_kind, node, image, pod, node_pool], version='2019-08-11T12:00:00.122Z', loadSpec={type=>s3_zip, bucket=>dsprod, key=>kube-metrics-30m/2019-08-11T12:00:00.000Z_2019-08-11T18:00:00.000Z/2019-08-11T12:00:00.122Z/0/4f1a6e43-6eaf-49a8-8d12-95253b4cb203/index.zip, S3Schema=>s3n}, interval=2019-08-11T12:00:00.000Z/2019-08-11T18:00:00.000Z, dataSource='kube-metrics-30m', binaryVersion='9'}
2019-08-11T14:43:59,896 INFO [appenderator_merge_0] org.apache.druid.segment.realtime.appenderator.AppenderatorImpl - Pushing merged index for segment[kube-metrics-30m_2019-08-11T06:00:00.000Z_2019-08-11T12:00:00.000Z_2019-08-11T06:00:00.111Z].
2019-08-11T14:43:59,896 INFO [appenderator_merge_0] org.apache.druid.segment.realtime.appenderator.AppenderatorImpl - Adding hydrant[FireHydrant{, queryable=kube-metrics-30m_2019-08-11T06:00:00.000Z_2019-08-11T12:00:00.000Z_2019-08-11T06:00:00.111Z, count=0}]
2019-08-11T14:43:59,896 INFO [appenderator_merge_0] org.apache.druid.segment.realtime.appenderator.AppenderatorImpl - Adding hydrant[FireHydrant{, queryable=kube-metrics-30m_2019-08-11T06:00:00.000Z_2019-08-11T12:00:00.000Z_2019-08-11T06:00:00.111Z, count=1}]
2019-08-11T14:43:59,896 INFO [appenderator_merge_0] org.apache.druid.segment.realtime.appenderator.AppenderatorImpl - Adding hydrant[FireHydrant{, queryable=kube-metrics-30m_2019-08-11T06:00:00.000Z_2019-08-11T12:00:00.000Z_2019-08-11T06:00:00.111Z, count=2}]
2019-08-11T14:43:59,896 INFO [appenderator_merge_0] ...
```
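The persist-then-merge flow described above can be illustrated with a toy Python model. This is a hypothetical sketch, not Druid code: `ToySegmentBuffer` and its methods are invented for illustration, rows are plain tuples, and the persist threshold only loosely echoes Druid's `maxRowsInMemory` setting. It shows buffered rows being frozen into hydrants, the hydrants staying queryable alongside in-memory rows, and all hydrants being combined into one segment at push time.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# A row is modelled as a plain tuple, e.g. (timestamp, pod); purely illustrative.
Row = Tuple[str, ...]


@dataclass
class ToySegmentBuffer:
    """Hypothetical model of one segment's lifecycle inside an ingestion task.

    Not Druid code: it only mirrors the persist-then-merge flow in the comment.
    """

    max_rows_in_memory: int  # loosely inspired by Druid's maxRowsInMemory setting
    in_memory: List[Row] = field(default_factory=list)
    hydrants: List[List[Row]] = field(default_factory=list)

    def add(self, row: Row) -> None:
        self.in_memory.append(row)
        if len(self.in_memory) >= self.max_rows_in_memory:
            self.persist()

    def persist(self) -> None:
        # Freeze the buffered rows into a new intermediate persist ("hydrant").
        if self.in_memory:
            self.hydrants.append(self.in_memory)
            self.in_memory = []

    def queryable_rows(self) -> List[Row]:
        # Persisted hydrants remain queryable next to rows still held in memory.
        return [row for hydrant in self.hydrants for row in hydrant] + self.in_memory

    def merge_for_push(self) -> List[Row]:
        # At push time, flush what is left, then merge all hydrants into one segment.
        # This toy merge is a plain concatenation: no rollup, indexing, or columns.
        self.persist()
        return [row for hydrant in self.hydrants for row in hydrant]
```

The merge is deliberately a plain concatenation so the sketch makes no claim about how Druid's real merge treats rows that share dimension values.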
