[ https://issues.apache.org/jira/browse/GOBBLIN-1684?focusedWorklogId=809239&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-809239 ]
ASF GitHub Bot logged work on GOBBLIN-1684:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 15/Sep/22 18:07
Start Date: 15/Sep/22 18:07
Worklog Time Spent: 10m
Work Description: homatthew commented on code in PR #3540:
URL: https://github.com/apache/gobblin/pull/3540#discussion_r972289262
##########
gobblin-runtime/src/main/java/org/apache/gobblin/runtime/messaging/hdfs/FileSystemMessageBuffer.java:
##########
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.gobblin.runtime.messaging.hdfs;
+
+import com.typesafe.config.Config;
+import java.io.IOException;
+import java.time.Duration;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+import java.util.function.Consumer;
+import lombok.Getter;
+import org.apache.commons.collections4.CollectionUtils;
+import org.apache.gobblin.runtime.messaging.MessageBuffer;
+import org.apache.gobblin.runtime.messaging.data.DynamicWorkUnitMessage;
+import org.apache.gobblin.runtime.messaging.data.DynamicWorkUnitSerde;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+
+
+/**
+ * Implements {@link FileSystem} based message buffer for sending and receiving {@link DynamicWorkUnitMessage}.
+ */
+public class FileSystemMessageBuffer implements MessageBuffer<DynamicWorkUnitMessage> {
+  @Getter
+  private String channelName;
+  private boolean isPollingForMessages = false;
+  private FileSystem fs;
+  private Path dir;
+  private Duration pollingRate;
+  private final List<Consumer<List<DynamicWorkUnitMessage>>> subscribers = new ArrayList<>();
+
+  public FileSystemMessageBuffer(FileSystem fs, String channelName, Duration pollingRate, Path parentWorkingDir) {
+    this.fs = fs;
+    this.channelName = channelName;
+    this.pollingRate = pollingRate;
+    this.dir = Path.mergePaths(parentWorkingDir, new Path(channelName));
+  }
+
+  @Override
+  public void publish(DynamicWorkUnitMessage item) throws IOException {
+    byte[] serializedMsg = DynamicWorkUnitSerde.serialize(item);
+    persistMessage(serializedMsg);
+  }
+
+  @Override
+  public void subscribe(Consumer<List<DynamicWorkUnitMessage>> subscriber) {
+    subscribers.add(subscriber);
+
+    if (!isPollingForMessages) {
+      startPollingForNewMessages(dir);
+      isPollingForMessages = true;
+    }
+  }
+
+  private void startPollingForNewMessages(Path dirToPoll) {
+    Factory.getExecutorInstance().scheduleAtFixedRate(() -> {
+          List<DynamicWorkUnitMessage> newMessages = this.getNewMessages();
+          if (!CollectionUtils.isEmpty(newMessages)) {
+            subscribers.forEach(s -> s.accept(newMessages));
+          }
+        },
+        0, pollingRate.toMillis(), TimeUnit.MILLISECONDS);
+  }
+
+  private List<DynamicWorkUnitMessage> getNewMessages() {
Review Comment:
Actually, I think passing the directory here as a parameter is more readable than
relying on the class variable. I'll address this in the next iteration.
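
To make the idea concrete, here is a rough sketch of a poll method that takes the directory as an argument. It is not the PR's implementation: the body of `getNewMessages` is cut off in the hunk above, so everything below is an assumption. `DirectoryPoller` and its `seen` set are hypothetical names, `java.nio.file` stands in for the Hadoop `FileSystem` so the snippet runs without extra dependencies, and messages are returned as raw bytes rather than deserialized `DynamicWorkUnitMessage` objects.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the polling half of the message buffer.
public class DirectoryPoller {
  // File names already handed to callers by an earlier poll.
  private final Set<String> seen = new HashSet<>();

  /**
   * Returns the contents of every regular file in dirToPoll that no
   * earlier call on this instance has returned. The directory is an
   * explicit parameter, so the call site shows what is being polled.
   */
  public List<byte[]> getNewMessages(Path dirToPoll) throws IOException {
    List<byte[]> fresh = new ArrayList<>();
    if (!Files.isDirectory(dirToPoll)) {
      return fresh; // nothing has been published yet
    }
    try (DirectoryStream<Path> files = Files.newDirectoryStream(dirToPoll)) {
      for (Path file : files) {
        // Set.add returns false for names that were already seen.
        if (Files.isRegularFile(file) && seen.add(file.getFileName().toString())) {
          fresh.add(Files.readAllBytes(file));
        }
      }
    }
    return fresh;
  }
}
```

With this shape, the scheduled task in `startPollingForNewMessages(Path dirToPoll)` could forward its own argument to the poll call, which is the readability gain mentioned above. Tracking seen file names in memory is only one option; deleting or moving consumed files would be another.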
Issue Time Tracking
-------------------
Worklog Id: (was: 809239)
Time Spent: 1h 10m (was: 1h)
> Interface / Stub for HDFS Writer for Dynamic Work Unit Allocation Communication
> -------------------------------------------------------------------------------
>
> Key: GOBBLIN-1684
> URL: https://issues.apache.org/jira/browse/GOBBLIN-1684
> Project: Apache Gobblin
> Issue Type: New Feature
> Reporter: Matthew Ho
> Priority: Major
> Time Spent: 1h 10m
> Remaining Estimate: 0h
>
> As part of the initial implementation for Helix Dynamic Workunit Allocation,
> the application master and task runners will communicate via HDFS: messages
> are written to a path, and the task runner polls that path for new files.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)