[ https://issues.apache.org/jira/browse/GOBBLIN-1668?focusedWorklogId=791085&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-791085 ]

ASF GitHub Bot logged work on GOBBLIN-1668:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 14/Jul/22 18:15
            Start Date: 14/Jul/22 18:15
    Worklog Time Spent: 10m 
      Work Description: vikrambohra commented on code in PR #3527:
URL: https://github.com/apache/gobblin/pull/3527#discussion_r921443301


##########
gobblin-iceberg/src/main/java/org/apache/gobblin/iceberg/writer/IcebergMetadataWriter.java:
##########
@@ -792,6 +800,9 @@ public void flush(String dbName, String tableName) throws IOException {
         String topic = props.get(TOPIC_NAME_KEY);
         if (tableMetadata.appendFiles.isPresent()) {
           tableMetadata.appendFiles.get().commit();
+          if (auditWhitelistBlacklist.acceptTable(dbName, tableName)) {

Review Comment:
   There is a chance for optimization here. We should check whether a table should be audited when collecting the counts rather than at publish time.
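A minimal Java sketch of the suggested restructuring, assuming a simplified model of the writer. `AuditCountCollectionSketch`, `TableAuditState`, `collect` and `flush` are hypothetical names, and the `BiPredicate` stands in for `auditWhitelistBlacklist.acceptTable(dbName, tableName)`. The filter runs once per incoming GMCE while counts are being collected, so `flush` can publish whatever was buffered without consulting the filter again. Skipping `null` maps is also an assumption, based on the `sendGMCE` overload that passes `null` when no audit counts are supplied.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical, simplified model of the audit-count flow in IcebergMetadataWriter.
class AuditCountCollectionSketch {

  // Hypothetical stand-in for TableMetadata.serializedAuditCountMaps.
  static final class TableAuditState {
    final List<String> serializedAuditCountMaps = new ArrayList<>();
  }

  // Stand-in for auditWhitelistBlacklist.acceptTable(dbName, tableName).
  private final BiPredicate<String, String> auditFilter;
  private final Map<String, TableAuditState> tables = new HashMap<>();

  AuditCountCollectionSketch(BiPredicate<String, String> auditFilter) {
    this.auditFilter = auditFilter;
  }

  // Called once per GMCE: the audit filter is applied here, when counts are collected.
  void collect(String dbName, String tableName, String serializedAuditCountMap) {
    // Assumption: a null map means the producer sent no audit counts, so skip it.
    if (serializedAuditCountMap == null || !auditFilter.test(dbName, tableName)) {
      return;
    }
    tables.computeIfAbsent(dbName + "." + tableName, key -> new TableAuditState())
        .serializedAuditCountMaps.add(serializedAuditCountMap);
  }

  // Called at flush: returns (and clears) what was collected, without re-checking the filter.
  List<String> flush(String dbName, String tableName) {
    TableAuditState state = tables.remove(dbName + "." + tableName);
    return state == null ? new ArrayList<String>() : state.serializedAuditCountMaps;
  }

  public static void main(String[] args) {
    AuditCountCollectionSketch writer =
        new AuditCountCollectionSketch((db, table) -> db.equals("audited_db"));
    writer.collect("audited_db", "events", "counts-1");
    writer.collect("other_db", "events", "counts-2");
    System.out.println(writer.flush("audited_db", "events")); // prints [counts-1]
    System.out.println(writer.flush("other_db", "events")); // prints []
  }
}
```

Filtering at collection time also means tables that are never audited do not accumulate serialized maps in memory between flushes.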



##########
gobblin-iceberg/src/main/java/org/apache/gobblin/iceberg/writer/IcebergMetadataWriter.java:
##########
@@ -637,6 +644,7 @@ protected void addFiles(GobblinMetadataChangeEvent gmce, Map<String, Collection<
   private Stream<DataFile> getIcebergDataFilesToBeAddedHelper(GobblinMetadataChangeEvent gmce, Table table,
       Map<String, Collection<HiveSpec>> newSpecsMap,
       TableMetadata tableMetadata) {
+    tableMetadata.serializedAuditCountMaps.add(gmce.getAuditCountMap());

Review Comment:
   See my other comment about collecting audit counts only if the table is whitelisted.



##########
gobblin-iceberg/src/main/java/org/apache/gobblin/iceberg/GobblinMCEProducer.java:
##########
@@ -94,8 +94,13 @@ public GobblinMCEProducer(State state) {
    */
   public void sendGMCE(Map<Path, Metrics> newFiles, List<String> oldFiles, List<String> oldFilePrefixes,
       Map<String, String> offsetRange, OperationType operationType, SchemaSource schemaSource) throws IOException {
+    sendGMCE(newFiles, oldFiles, oldFilePrefixes, offsetRange, operationType, schemaSource, null);
+  }
+
+  public void sendGMCE(Map<Path, Metrics> newFiles, List<String> oldFiles, List<String> oldFilePrefixes,
+      Map<String, String> offsetRange, OperationType operationType, SchemaSource schemaSource, String serializedAuditMap) throws IOException {

Review Comment:
   Rename to serializedAuditCountMap?



##########
gobblin-iceberg/src/main/java/org/apache/gobblin/iceberg/GobblinMCEProducer.java:
##########
@@ -166,7 +171,7 @@ private void setBasicInformationForGMCE(GobblinMetadataChangeEvent.Builder gmceB
 
   public GobblinMetadataChangeEvent getGobblinMetadataChangeEvent(Map<Path, Metrics> newFiles, List<String> oldFiles,
       List<String> oldFilePrefixes, Map<String, String> offsetRange, OperationType operationType,
-      SchemaSource schemaSource) {
+      SchemaSource schemaSource, String serializedAuditMap) {

Review Comment:
   Rename to serializedAuditCountMap?



##########
gobblin-iceberg/src/main/java/org/apache/gobblin/iceberg/GobblinMCEProducer.java:
##########
@@ -94,8 +94,13 @@ public GobblinMCEProducer(State state) {
    */
   public void sendGMCE(Map<Path, Metrics> newFiles, List<String> oldFiles, List<String> oldFilePrefixes,
       Map<String, String> offsetRange, OperationType operationType, SchemaSource schemaSource) throws IOException {
+    sendGMCE(newFiles, oldFiles, oldFilePrefixes, offsetRange, operationType, schemaSource, null);
+  }
+

Review Comment:
   Add Javadoc documenting the new parameter.
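One way the requested Javadoc might look, sketched on a hypothetical, cut-down `GmceOverloadSketch` class. The single `payload` argument is an illustrative stand-in for the existing `newFiles`, `oldFiles`, `oldFilePrefixes`, `offsetRange`, `operationType` and `schemaSource` parameters, and the methods return a description string instead of returning `void` and throwing `IOException`, so the sketch runs on its own. The new parameter is named `serializedAuditCountMap`, following the rename suggested in the review comments.

```java
// Hypothetical, cut-down stand-in for GobblinMCEProducer; only the overload pattern
// and its Javadoc are meant to carry over.
class GmceOverloadSketch {

  /**
   * Sends a GMCE without audit counts.
   *
   * <p>Kept so existing callers compile unchanged; delegates to
   * {@link #sendGMCE(String, String)} with a {@code null} audit count map.
   *
   * @param payload hypothetical stand-in for the existing GMCE arguments
   * @return a description of the event that would be sent
   */
  String sendGMCE(String payload) {
    return sendGMCE(payload, null);
  }

  /**
   * Sends a GMCE that may carry audit counts.
   *
   * @param payload hypothetical stand-in for the existing GMCE arguments
   * @param serializedAuditCountMap serialized audit counts to attach to the event,
   *     or {@code null} when the caller has none
   * @return a description of the event that would be sent
   */
  String sendGMCE(String payload, String serializedAuditCountMap) {
    // A real producer would build the GMCE and emit it here.
    return "payload=" + payload + ", auditCounts="
        + (serializedAuditCountMap == null ? "absent" : "present");
  }

  public static void main(String[] args) {
    GmceOverloadSketch producer = new GmceOverloadSketch();
    System.out.println(producer.sendGMCE("event-1")); // prints payload=event-1, auditCounts=absent
    System.out.println(producer.sendGMCE("event-2", "counts")); // prints payload=event-2, auditCounts=present
  }
}
```

Documenting the `null` case on both overloads makes it explicit that downstream consumers must tolerate GMCEs without audit counts.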





Issue Time Tracking
-------------------

    Worklog Id:     (was: 791085)
    Time Spent: 40m  (was: 0.5h)

> Add audit counts for iceberg registration
> -----------------------------------------
>
>                 Key: GOBBLIN-1668
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1668
>             Project: Apache Gobblin
>          Issue Type: Improvement
>            Reporter: Jack Moseley
>            Priority: Major
>          Time Spent: 40m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
