jnioche commented on code in PR #1956:
URL: https://github.com/apache/stormcrawler/pull/1956#discussion_r3435907006


##########
core/src/main/java/org/apache/stormcrawler/spout/FileSpout.java:
##########
@@ -61,14 +61,17 @@ public class FileSpout extends BaseRichSpout {
     public static final int BATCH_SIZE = 10000;
     public static final Logger LOG = LoggerFactory.getLogger(FileSpout.class);
     private final Queue<String> inputFiles;
+    private final String seedDir;
+    private final String fileFilter;
+    private final String[] seedFiles;
     protected SpoutOutputCollector collector;

Review Comment:
   Might as well mark `collector` as `transient`.



##########
core/src/main/java/org/apache/stormcrawler/spout/FileSpout.java:
##########
@@ -61,14 +61,17 @@ public class FileSpout extends BaseRichSpout {
     public static final int BATCH_SIZE = 10000;
     public static final Logger LOG = LoggerFactory.getLogger(FileSpout.class);
     private final Queue<String> inputFiles;
+    private final String seedDir;
+    private final String fileFilter;
+    private final String[] seedFiles;
     protected SpoutOutputCollector collector;

Review Comment:
   In fact, anything handled in `open()` could be marked as `transient`:
   `totalTasks`
   `taskIndex`
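
To illustrate the suggestion: Storm components are shipped to workers in serialized form, and fields marked `transient` are skipped by Java serialization, so state assigned in `open()` does not need to be serializable. The following is a minimal, self-contained sketch using plain Java serialization only. `TransientSketch`, `SpoutLike`, and `roundTrip` are hypothetical names for illustration, not part of StormCrawler, and the simplified `open(...)` signature is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientSketch {

    /** Hypothetical stand-in for a spout; this is not the real FileSpout. */
    static class SpoutLike implements Serializable {
        private static final long serialVersionUID = 1L;

        // Configuration set in the constructor: must survive serialization.
        final String seedDir;

        // Runtime state assigned in open(): excluded from serialization.
        transient Object collector;
        transient int totalTasks;
        transient int taskIndex;

        SpoutLike(String seedDir) {
            this.seedDir = seedDir;
        }

        // Simplified stand-in for open(); the real method takes Storm types.
        void open(Object collector, int totalTasks, int taskIndex) {
            this.collector = collector;
            this.totalTasks = totalTasks;
            this.taskIndex = taskIndex;
        }
    }

    /** Serializes and deserializes the object with plain Java serialization. */
    static SpoutLike roundTrip(SpoutLike in) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(in);
        }
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (SpoutLike) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        SpoutLike before = new SpoutLike("/seeds");
        // java.lang.Object is not Serializable; the transient modifier
        // keeps writeObject from failing on this field.
        before.open(new Object(), 4, 2);

        SpoutLike after = roundTrip(before);
        System.out.println(after.seedDir);    // configuration is preserved
        System.out.println(after.collector);  // transient reference resets to null
        System.out.println(after.totalTasks); // transient int resets to 0
    }
}
```

Without `transient` on `collector`, the `writeObject` call in this sketch would throw `NotSerializableException`, because the field holds a non-serializable object at that point.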



##########
core/src/main/java/org/apache/stormcrawler/spout/FileSpout.java:
##########
@@ -115,12 +110,14 @@ public FileSpout(String dir, String filter, boolean withDiscoveredStatus) {
      * @since 1.13
      */
     public FileSpout(boolean withDiscoveredStatus, String... files) {
-        this.withDiscoveredStatus = withDiscoveredStatus;
         if (files.length == 0) {
             throw new IllegalArgumentException("Must configure at least one inputFile");
         }
-        inputFiles = new LinkedList<>();
-        Collections.addAll(inputFiles, files);
+        this.withDiscoveredStatus = withDiscoveredStatus;
+        this.seedDir = null;
+        this.fileFilter = null;
+        this.seedFiles = files;
+        this.inputFiles = new LinkedList<>();

Review Comment:
   Same as for the other constructor: no need to instantiate `inputFiles` here.



##########
core/src/main/java/org/apache/stormcrawler/spout/FileSpout.java:
##########
@@ -94,18 +97,10 @@ public FileSpout(String... files) {
      */
     public FileSpout(String dir, String filter, boolean withDiscoveredStatus) {
         this.withDiscoveredStatus = withDiscoveredStatus;
-        Path pdir = Paths.get(dir);
-        inputFiles = new LinkedList<>();
-        LOG.info("Reading directory: {} (filter: {})", pdir, filter);
-        try (DirectoryStream<Path> stream = Files.newDirectoryStream(pdir, filter)) {
-            for (Path entry : stream) {
-                String inputFile = entry.toAbsolutePath().toString();
-                inputFiles.add(inputFile);
-                LOG.info("Input : {}", inputFile);
-            }
-        } catch (IOException ioe) {
-            LOG.error("IOException: %s%n", ioe);
-        }
+        this.seedDir = dir;
+        this.fileFilter = filter;
+        this.seedFiles = null;
+        this.inputFiles = new LinkedList<>();

Review Comment:
   No need to instantiate `inputFiles` here. I would do that in `open()` just
   before the call to `populateInputFiles()`.
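
A rough sketch of what that could look like, written without Storm dependencies so it runs on its own. `LazyQueueSketch` is a hypothetical class, the no-argument `open()` stands in for the real `open(...)` signature, and the body of `populateInputFiles()` is an assumption (only its name appears in the PR). Note that a field assigned in `open()` can no longer be `final`.

```java
import java.io.Serializable;
import java.util.Collections;
import java.util.LinkedList;
import java.util.Queue;

/** Hypothetical stand-in for the spout; this is not the real FileSpout. */
public class LazyQueueSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // The constructor only records what to read.
    private final String[] seedFiles;

    // Built in open(), so it is transient and cannot be final.
    private transient Queue<String> inputFiles;

    public LazyQueueSketch(String... files) {
        if (files.length == 0) {
            throw new IllegalArgumentException("Must configure at least one inputFile");
        }
        this.seedFiles = files;
    }

    // Simplified stand-in for the spout's open() method.
    public void open() {
        inputFiles = new LinkedList<>();
        populateInputFiles();
    }

    // Assumed body: copies the configured seed files into the queue.
    private void populateInputFiles() {
        Collections.addAll(inputFiles, seedFiles);
    }

    // Accessor used only to inspect the sketch's state.
    Queue<String> inputFiles() {
        return inputFiles;
    }

    public static void main(String[] args) {
        LazyQueueSketch spout = new LazyQueueSketch("a.txt", "b.txt");
        System.out.println(spout.inputFiles()); // queue does not exist before open()
        spout.open();
        System.out.println(spout.inputFiles().size()); // both seed files queued
    }
}
```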



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
