luchunliang commented on a change in pull request #2267:
URL: https://github.com/apache/incubator-inlong/pull/2267#discussion_r790131573
##########
File path:
inlong-sort/sort-core/src/main/java/org/apache/inlong/sort/flink/deserialization/DeserializationSchema.java
##########
@@ -122,7 +145,23 @@ public void processElement(
context.output(METRIC_DATA_OUTPUT_TAG, metricData);
}
-
collector.collect(recordTransformer.toSerializedRecord(sinkRecord));
+ SerializedRecord serializedSinkRecord =
recordTransformer.toSerializedRecord(sinkRecord);
+
+ if (auditImp != null) {
+ Pair<String, String> groupIdAndStreamId =
inLongGroupIdAndStreamIdMap.getOrDefault(
+ serializedRecord.getDataFlowId(),
+ Pair.of("", ""));
+
+ auditImp.add(
+ Constants.METRIC_AUDIT_ID_FOR_INPUT,
+ groupIdAndStreamId.getLeft(),
+ groupIdAndStreamId.getRight(),
+ sinkRecord.getTimestampMillis(),
Review comment:
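Please check which time sinkRecord.getTimestampMillis() actually returns:
the generation time of the Pulsar/TubeMQ message, the logged time of the
user data, or the current time. The two snippets below set the timestamp
differently: the first uses message.getEventTime(), the second uses
System.currentTimeMillis().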
    public DeserializationResult<SerializedRecord> deserialize(
            @SuppressWarnings("rawtypes") Message message) throws IOException {
        final byte[] data = message.getData();
        return DeserializationResult.of(
                new SerializedRecord(dataFlowId, message.getEventTime(), data), data.length);
    }

    deserializer.flatMap(mixedRow, new CallbackCollector<>((row -> {
        // each tid might be associated with multiple data flows
        for (long dataFlowId : dataFlowIds) {
            collector.collect(new Record(dataFlowId, System.currentTimeMillis(), row));
        }
    })));
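
To make the timestamp question concrete, here is a minimal sketch of one possible policy: prefer the message event time and fall back to the current time only when no event time was set. `AuditTimestamps` and `resolve` are hypothetical names, not part of InLong. The sketch assumes an unset event time is reported as 0, which is my understanding of Pulsar's `Message.getEventTime()`; this should be verified for Tube.

```java
/** Hypothetical helper, not part of InLong: picks the timestamp reported to audit. */
final class AuditTimestamps {

    private AuditTimestamps() {
    }

    /**
     * Returns the event time when the source provided one (assumed: a positive value),
     * otherwise the fallback, for example the processing time.
     */
    static long resolve(long eventTimeMillis, long fallbackMillis) {
        return eventTimeMillis > 0 ? eventTimeMillis : fallbackMillis;
    }
}
```

Under this assumption, both code paths could build their records with `AuditTimestamps.resolve(message.getEventTime(), System.currentTimeMillis())`, so the audit timestamp would mean the same thing regardless of which deserializer produced the record.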
##########
File path:
inlong-sort/sort-core/src/main/java/org/apache/inlong/sort/flink/deserialization/DeserializationSchema.java
##########
@@ -181,6 +228,8 @@ public void removeDataFlow(DataFlowInfo dataFlowInfo)
throws Exception {
multiTenancyDeserializer.removeDataFlow(dataFlowInfo);
fieldMappingTransformer.removeDataFlow(dataFlowInfo);
recordTransformer.removeDataFlow(dataFlowInfo);
+
+ inLongGroupIdAndStreamIdMap.remove(dataFlowInfo.getId());
Review comment:
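Removing the mapping from dataFlowId to inlongGroupId/streamId immediately
when the configuration removes the DataFlowInfo may cause some audit data to
be missed. Records of that data flow that are still in flight would fall back
to the default Pair.of("", "") and be audited under empty IDs.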
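
One way to address this, sketched under assumptions: keep a removed mapping readable for a grace period so that late records can still be attributed. `GracefulIdMap`, its methods, and the grace period are hypothetical and not part of InLong. The sketch is not thread-safe and uses a plain `String[]` in place of `Pair` to stay self-contained.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper: removed ID mappings stay readable for a grace period. */
final class GracefulIdMap {

    /** A removed mapping together with the time it was removed. */
    private static final class Retired {
        final String[] ids;
        final long removedAtMillis;

        Retired(String[] ids, long removedAtMillis) {
            this.ids = ids;
            this.removedAtMillis = removedAtMillis;
        }
    }

    private final Map<Long, String[]> live = new HashMap<>();
    private final Map<Long, Retired> retired = new HashMap<>();
    private final long graceMillis;

    GracefulIdMap(long graceMillis) {
        this.graceMillis = graceMillis;
    }

    void put(long dataFlowId, String groupId, String streamId) {
        live.put(dataFlowId, new String[] {groupId, streamId});
        retired.remove(dataFlowId);
    }

    /** Marks the mapping as removed but keeps it available for late records. */
    void remove(long dataFlowId, long nowMillis) {
        String[] ids = live.remove(dataFlowId);
        if (ids != null) {
            retired.put(dataFlowId, new Retired(ids, nowMillis));
        }
    }

    /** Returns {groupId, streamId}, or two empty strings when unknown or expired. */
    String[] lookup(long dataFlowId, long nowMillis) {
        String[] ids = live.get(dataFlowId);
        if (ids != null) {
            return ids;
        }
        Retired old = retired.get(dataFlowId);
        if (old != null && nowMillis - old.removedAtMillis <= graceMillis) {
            return old.ids;
        }
        retired.remove(dataFlowId); // expired or never present
        return new String[] {"", ""};
    }
}
```

In removeDataFlow this would replace the direct `remove(dataFlowInfo.getId())` with something like `remove(dataFlowInfo.getId(), System.currentTimeMillis())`. How long the grace period should be depends on how long records can remain in flight, which I have not measured.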
##########
File path:
inlong-sort/sort-core/src/main/java/org/apache/inlong/sort/flink/hive/HiveMultiTenantWriter.java
##########
@@ -126,6 +147,20 @@ public void processElement(SerializedRecord
serializedRecord, Context context,
hiveWriter.processElement(recordTransformer.toRecord(serializedRecord).getRow(),
proxyContext.setContext(context), collector);
+
+ if (auditImp != null) {
+ Pair<String, String> groupIdAndStreamId =
inLongGroupIdAndStreamIdMap.getOrDefault(
+ serializedRecord.getDataFlowId(),
+ Pair.of("", ""));
+
+ auditImp.add(
+ Constants.METRIC_AUDIT_ID_FOR_OUTPUT,
+ groupIdAndStreamId.getLeft(),
+ groupIdAndStreamId.getRight(),
+ serializedRecord.getTimestampMillis(),
Review comment:
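Ditto: please check which time serializedRecord.getTimestampMillis()
represents here, as in the comment on sinkRecord.getTimestampMillis() above.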