[ 
https://issues.apache.org/jira/browse/GOBBLIN-1869?focusedWorklogId=875479&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-875479
 ]

ASF GitHub Bot logged work on GOBBLIN-1869:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 09/Aug/23 22:31
            Start Date: 09/Aug/23 22:31
    Worklog Time Spent: 10m 
      Work Description: ZihanLi58 commented on code in PR #3732:
URL: https://github.com/apache/gobblin/pull/3732#discussion_r1289293613


##########
gobblin-modules/gobblin-orc/src/main/java/org/apache/gobblin/writer/InstrumentedGobblinOrcWriter.java:
##########
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.gobblin.writer;
+
+import java.io.IOException;
+import java.util.Map;
+
+import org.apache.avro.Schema;
+import org.apache.avro.generic.GenericRecord;
+
+import com.google.common.collect.Maps;
+
+import lombok.extern.slf4j.Slf4j;
+
+import org.apache.gobblin.configuration.State;
+import org.apache.gobblin.instrumented.Instrumented;
+import org.apache.gobblin.metrics.GobblinTrackingEvent;
+import org.apache.gobblin.metrics.MetricContext;
+import org.apache.gobblin.metrics.event.GobblinEventBuilder;
+
+
+/***
+ * A class for an event emitting GobblinOrcWriter metrics, such as internal 
memory resizing and flushing
+ */
+@Slf4j
+public class InstrumentedGobblinOrcWriter extends GobblinOrcWriter {
+  MetricContext metricContext;
+  public static final String METRICS_SCHEMA_NAME = "schemaName";
+  public static final String METRICS_BYTES_WRITTEN = "bytesWritten";
+  public static final String METRICS_RECORDS_WRITTEN = "recordsWritten";
+  public static final String METRICS_BUFFER_RESIZES = "bufferResizes";
+  public static final String METRICS_BUFFER_SIZE = "bufferSize";
+  public static final String ORC_WRITER_METRICS_NAME = "OrcWriterMetrics";
+
+  public InstrumentedGobblinOrcWriter(FsDataWriterBuilder<Schema, 
GenericRecord> builder, State properties) throws IOException {
+    super(builder, properties);
+    metricContext = Instrumented.getMetricContext(new State(properties), 
this.getClass());
+  }
+
+  @Override
+  protected synchronized void closeInternal() throws IOException {
+    // close() can be called multiple times by super.commit() and 
super.close(), but we only want to emit metrics once
+    if (!this.closed) {
+      this.flush();
+      this.orcFileWriter.close();
+      this.closed = true;
+      log.info("Emitting ORC event metrics");
+      this.metricContext.submitEvent(this.createOrcWriterMetadataEvent());
+      this.recycleRowBatchPool();
+    } else {
+      // Throw fatal exception if there's outstanding buffered data since 
there's risk losing data if proceeds.
+      if (rowBatch.size > 0) {
+        throw new CloseBeforeFlushException(this.inputSchema.toString());
+      }
+    }
+  }
+
+  GobblinTrackingEvent createOrcWriterMetadataEvent() throws IOException {
+    GobblinEventBuilder builder = new 
GobblinEventBuilder(ORC_WRITER_METRICS_NAME);

Review Comment:
   Do you need to include the name space in this event? Also any specific 
reason that we don't use eventSubmitter but directly call metricContext to 
submit event?





Issue Time Tracking
-------------------

    Worklog Id:     (was: 875479)
    Time Spent: 0.5h  (was: 20m)

> ORCWriter should emit events to capture metrics on write side
> -------------------------------------------------------------
>
>                 Key: GOBBLIN-1869
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1869
>             Project: Apache Gobblin
>          Issue Type: Improvement
>          Components: gobblin-core
>            Reporter: William Lo
>            Assignee: Abhishek Tiwari
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Currently the metrics for streaming mainly relate to the Kafka consumption 
> side. We want to also capture metrics through events from the ORCWriter to 
> capture the memory usage and internal performance of the writer.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to