[
https://issues.apache.org/jira/browse/GOBBLIN-1869?focusedWorklogId=875479&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-875479
]
ASF GitHub Bot logged work on GOBBLIN-1869:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 09/Aug/23 22:31
Start Date: 09/Aug/23 22:31
Worklog Time Spent: 10m
Work Description: ZihanLi58 commented on code in PR #3732:
URL: https://github.com/apache/gobblin/pull/3732#discussion_r1289293613
##########
gobblin-modules/gobblin-orc/src/main/java/org/apache/gobblin/writer/InstrumentedGobblinOrcWriter.java:
##########
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.writer;
+
+import java.io.IOException;
+import java.util.Map;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+
+import com.google.common.collect.Maps;
+
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.gobblin.configuration.State;
+import org.apache.gobblin.instrumented.Instrumented;
+import org.apache.gobblin.metrics.GobblinTrackingEvent;
+import org.apache.gobblin.metrics.MetricContext;
+import org.apache.gobblin.metrics.event.GobblinEventBuilder;
+
+
+/***
+ * A class for an event emitting GobblinOrcWriter metrics, such as internal memory resizing and flushing
+ */
+@Slf4j
+public class InstrumentedGobblinOrcWriter extends GobblinOrcWriter {
+ MetricContext metricContext;
+ public static final String METRICS_SCHEMA_NAME = "schemaName";
+ public static final String METRICS_BYTES_WRITTEN = "bytesWritten";
+ public static final String METRICS_RECORDS_WRITTEN = "recordsWritten";
+ public static final String METRICS_BUFFER_RESIZES = "bufferResizes";
+ public static final String METRICS_BUFFER_SIZE = "bufferSize";
+ public static final String ORC_WRITER_METRICS_NAME = "OrcWriterMetrics";
+
+ public InstrumentedGobblinOrcWriter(FsDataWriterBuilder<Schema, GenericRecord> builder, State properties) throws IOException {
+ super(builder, properties);
+ metricContext = Instrumented.getMetricContext(new State(properties), this.getClass());
+ }
+
+ @Override
+ protected synchronized void closeInternal() throws IOException {
+ // close() can be called multiple times by super.commit() and super.close(), but we only want to emit metrics once
+ if (!this.closed) {
+ this.flush();
+ this.orcFileWriter.close();
+ this.closed = true;
+ log.info("Emitting ORC event metrics");
+ this.metricContext.submitEvent(this.createOrcWriterMetadataEvent());
+ this.recycleRowBatchPool();
+ } else {
+ // Throw fatal exception if there's outstanding buffered data since there's risk losing data if proceeds.
+ if (rowBatch.size > 0) {
+ throw new CloseBeforeFlushException(this.inputSchema.toString());
+ }
+ }
+ }
+
+ GobblinTrackingEvent createOrcWriterMetadataEvent() throws IOException {
+ GobblinEventBuilder builder = new GobblinEventBuilder(ORC_WRITER_METRICS_NAME);
Review Comment:
Do you need to include the namespace in this event? Also, is there any
specific reason we call metricContext directly to submit the event instead
of using eventSubmitter?
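
To make the two questions concrete, here is a minimal, self-contained sketch of
the pattern under discussion: a submitter that stamps one namespace on every
event, and a writer whose close() can be called repeatedly but emits its metrics
event only once. Every type below (Event, RecordingSink, NamespacedSubmitter,
Writer) and the namespace string are hypothetical stand-ins written for
illustration. They are not Gobblin classes and do not reproduce the
EventSubmitter or MetricContext APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; none of these are Gobblin types.
class EmitOnceSketch {

  /** A tracking event: a namespace, a name, and string metadata. */
  static final class Event {
    final String namespace;
    final String name;
    final Map<String, String> metadata;

    Event(String namespace, String name, Map<String, String> metadata) {
      this.namespace = namespace;
      this.name = name;
      this.metadata = metadata;
    }
  }

  /** Collects submitted events in memory so the behavior can be inspected. */
  static final class RecordingSink {
    final List<Event> events = new ArrayList<>();

    void submit(Event event) {
      events.add(event);
    }
  }

  /** Stamps every event with one namespace before handing it to the sink. */
  static final class NamespacedSubmitter {
    private final RecordingSink sink;
    private final String namespace;

    NamespacedSubmitter(RecordingSink sink, String namespace) {
      this.sink = sink;
      this.namespace = namespace;
    }

    void submit(String name, Map<String, String> metadata) {
      // Copy the metadata so later changes by the caller do not alter the event.
      sink.submit(new Event(namespace, name, new HashMap<>(metadata)));
    }
  }

  /** Writer whose close() may be called repeatedly but emits metrics once. */
  static final class Writer {
    private final NamespacedSubmitter submitter;
    private boolean closed = false;
    private long recordsWritten = 0;

    Writer(NamespacedSubmitter submitter) {
      this.submitter = submitter;
    }

    synchronized void write() {
      recordsWritten++;
    }

    synchronized void close() {
      if (closed) {
        return; // A second close() is a no-op, so no duplicate event is sent.
      }
      closed = true;
      Map<String, String> metadata = new HashMap<>();
      metadata.put("recordsWritten", Long.toString(recordsWritten));
      submitter.submit("OrcWriterMetrics", metadata);
    }
  }

  public static void main(String[] args) {
    RecordingSink sink = new RecordingSink();
    Writer writer = new Writer(new NamespacedSubmitter(sink, "example.orc.writer"));
    writer.write();
    writer.close();
    writer.close(); // Simulates commit() followed by close().
    System.out.println(sink.events.size() + " event(s), namespace=" + sink.events.get(0).namespace);
  }
}
```

Routing events through a single submitter keeps the namespace in one place
instead of relying on each call site to set it, which is the trade-off the
questions above are probing.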
Issue Time Tracking
-------------------
Worklog Id: (was: 875479)
Time Spent: 0.5h (was: 20m)
> ORCWriter should emit events to capture metrics on write side
> -------------------------------------------------------------
>
> Key: GOBBLIN-1869
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1869
> Project: Apache Gobblin
> Issue Type: Improvement
> Components: gobblin-core
> Reporter: William Lo
> Assignee: Abhishek Tiwari
> Priority: Major
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Currently the metrics for streaming mainly relate to the Kafka consumption
> side. We want to also capture metrics through events from the ORCWriter to
> capture the memory usage and internal performance of the writer.