sodonnel commented on a change in pull request #1028: HDFS-14617 - Improve fsimage load time by writing sub-sections to the fsimage index
URL: https://github.com/apache/hadoop/pull/1028#discussion_r313493210
 
 

 ##########
 File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSImageFormatProtobuf.java
 ##########
 @@ -250,6 +257,73 @@ void load(File file) throws IOException {
       }
     }
 
+    /**
+     * Given an FSImage FileSummary.Section, return an InputStream positioned
+     * at the start of the section and limited to the section length.
+     * @param section The FileSummary.Section containing the offset and length
+     * @param compressionCodec The compression codec in use, if any
+     * @return An InputStream for the given section
+     * @throws IOException if the image file cannot be opened or positioned
+     */
+    public InputStream getInputStreamForSection(FileSummary.Section section,
+                                                String compressionCodec)
+        throws IOException {
+      FileInputStream fin = new FileInputStream(filename);
+      FileChannel channel = fin.getChannel();
+      channel.position(section.getOffset());
+      InputStream in = new BufferedInputStream(new LimitInputStream(fin,
+          section.getLength()));
+
+      in = FSImageUtil.wrapInputStreamForCompression(conf,
+          compressionCodec, in);
+      return in;
+    }
+
+    /**
+     * Takes an ArrayList of Sections and removes all Sections whose
+     * name ends in _SUB, indicating they are sub-sections. The original
+     * list is modified in place, and a new list of the removed Sections
+     * is returned.
+     * @param sections Array List containing all Sections and Sub Sections
+     *                 in the image.
+     * @return ArrayList of the sections removed, or an empty list if none are
+     *         removed.
+     */
+    private ArrayList<FileSummary.Section> getAndRemoveSubSections(
+        ArrayList<FileSummary.Section> sections) {
+      ArrayList<FileSummary.Section> subSections = new ArrayList<>();
+      Iterator<FileSummary.Section> iter = sections.iterator();
+      while (iter.hasNext()) {
+        FileSummary.Section s = iter.next();
+        String name = s.getName();
+        if (name.matches(".*_SUB$")) {
+          subSections.add(s);
+          iter.remove();
+        }
+      }
+      return subSections;
+    }
+
+    /**
+     * Given an ArrayList of Sections, return all Sections with the given
+     * name, or an empty list if none are found.
+     * @param sections ArrayList of the Sections to search through
+     * @param name The name of the Sections to search for
+     * @return ArrayList of the sections matching the given name
+     */
+    private ArrayList<FileSummary.Section> getSubSectionsOfName(
+        ArrayList<FileSummary.Section> sections, SectionName name) {
+      ArrayList<FileSummary.Section> subSec = new ArrayList<>();
+      for (FileSummary.Section s : sections) {
+        String n = s.getName();
+        SectionName sectionName = SectionName.fromString(n);
+        if (sectionName == name) {
+          subSec.add(s);
+        }
+      }
+      return subSec;
+    }
+
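
The sub-section filtering in the hunk above can be sketched outside Hadoop as follows. This is a minimal illustration, not the PR's code: plain strings stand in for `FileSummary.Section` names, the class name `SubSectionPartitionSketch` and the sample names in `main` are hypothetical, and `endsWith` is swapped in for the regex `".*_SUB$"` on the assumption that section names contain no line terminators.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-alone sketch; section names are modelled as Strings.
final class SubSectionPartitionSketch {

  /**
   * Removes every name ending in "_SUB" from the given list and returns
   * the removed names in their original order.
   */
  static List<String> getAndRemoveSubSections(List<String> names) {
    List<String> subSections = new ArrayList<>();
    Iterator<String> iter = names.iterator();
    while (iter.hasNext()) {
      String name = iter.next();
      // Plain suffix check instead of compiling a regex per element.
      if (name.endsWith("_SUB")) {
        subSections.add(name);
        // Iterator.remove avoids ConcurrentModificationException.
        iter.remove();
      }
    }
    return subSections;
  }

  public static void main(String[] args) {
    // Illustrative names only (assumption, not read from a real image).
    List<String> names = new ArrayList<>(
        Arrays.asList("NS_INFO", "INODE", "INODE_SUB", "INODE_SUB"));
    List<String> removed = getAndRemoveSubSections(names);
    System.out.println("parents: " + names);
    System.out.println("sub-sections: " + removed);
  }
}
```

The input list must be mutable, since the method edits it in place just as the method in the diff does.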
 
 Review comment:
   It was a long method already, and this has made it longer. I will see if I 
can find a way to refactor it without introducing too much change.
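
For reference, the offset-and-length read that `getInputStreamForSection` performs in the diff above can be sketched with JDK classes only. This is a sketch under stated assumptions: `BoundedInputStream` is a hypothetical stand-in for the `LimitInputStream` used in the PR, `SectionStreamSketch`, `openSection` and `readAll` are illustrative names, and the compression wrapping done by `FSImageUtil.wrapInputStreamForCompression` is left out.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-alone sketch; not part of the Hadoop code base.
final class SectionStreamSketch {

  /** Stand-in for LimitInputStream: reports EOF after a fixed byte count. */
  static final class BoundedInputStream extends FilterInputStream {
    private long remaining;

    BoundedInputStream(InputStream in, long limit) {
      super(in);
      this.remaining = limit;
    }

    @Override
    public int read() throws IOException {
      if (remaining <= 0) {
        return -1;
      }
      int b = in.read();
      if (b >= 0) {
        remaining--;
      }
      return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
      if (len == 0) {
        return 0;
      }
      if (remaining <= 0) {
        return -1;
      }
      int n = in.read(buf, off, (int) Math.min(len, remaining));
      if (n > 0) {
        remaining -= n;
      }
      return n;
    }

    @Override
    public int available() throws IOException {
      return (int) Math.min(in.available(), remaining);
    }

    @Override
    public boolean markSupported() {
      return false;
    }
  }

  /** Opens the file, seeks to offset, and caps the stream at length bytes. */
  static InputStream openSection(Path file, long offset, long length)
      throws IOException {
    FileInputStream fin = new FileInputStream(file.toFile());
    try {
      // Moving the channel position also moves the stream's read position.
      fin.getChannel().position(offset);
    } catch (IOException | RuntimeException e) {
      fin.close(); // do not leak the descriptor if the seek fails
      throw e;
    }
    return new BufferedInputStream(new BoundedInputStream(fin, length));
  }

  /** Drains a stream into a byte array. */
  static byte[] readAll(InputStream in) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    int n;
    while ((n = in.read(buf)) != -1) {
      out.write(buf, 0, n);
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("section-sketch", ".bin");
    Files.write(tmp,
        "HEADERsection-bytesTRAILER".getBytes(StandardCharsets.US_ASCII));
    // The 13 bytes starting at offset 6 are the middle "section".
    try (InputStream in = openSection(tmp, 6, 13)) {
      System.out.println(
          new String(readAll(in), StandardCharsets.US_ASCII));
    } finally {
      Files.delete(tmp);
    }
  }
}
```

Closing the returned stream closes the underlying `FileInputStream` as well, so each section stream owns its own file descriptor, which is what lets several sections be read independently.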
