[
https://issues.apache.org/jira/browse/HDFS-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17391605#comment-17391605
]
liuyongpan edited comment on HDFS-16147 at 8/3/21, 4:18 AM:
------------------------------------------------------------
1. HDFS-16147.002.patch fixes the failing unit test
org.apache.hadoop.hdfs.server.namenode.TestFSImage#testNoParallelSectionsWithCompressionEnabled.
2. On closer examination, oiv does in fact work correctly, although I cannot explain
why it works.
You can verify this as follows:
In class TestOfflineImageViewer, method createOriginalFSImage,
add and then remove the code below, and compare the two runs.
Note: apply my patch HDFS-16147.002.patch first!
{code:java}
// turn on both parallelization and compression
conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
    "org.apache.hadoop.io.compress.GzipCodec");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "2");
conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "2");
{code}
Then run the unit test testPBDelimitedWriter and you can see the result.
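As background for why parallel loading and compression conflict at all: a gzip stream can only be decompressed sequentially from its beginning, so per-sub-section file offsets recorded for parallel loading do not point at independently decodable data. A minimal, self-contained illustration of this property using plain java.util.zip (deliberately not Hadoop's codec classes; the class and method names here are my own):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipSequentialDemo {

  // Compress a string into a single gzip stream.
  static byte[] gzip(String s) {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
        gz.write(s.getBytes("UTF-8"));
      }
      return bos.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // The only way to reach data in the middle of a gzip stream is to
  // decompress everything before it, starting from offset 0.
  static String gunzip(byte[] data) {
    try (GZIPInputStream gz =
             new GZIPInputStream(new ByteArrayInputStream(data))) {
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      byte[] buf = new byte[4096];
      int n;
      while ((n = gz.read(buf)) > 0) {
        out.write(buf, 0, n);
      }
      return out.toString("UTF-8");
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] compressed = gzip("section-1|section-2|section-3");
    // Everything is recovered, but only by scanning from the start;
    // there is no way to seek directly to "section-2" in compressed bytes.
    System.out.println(gunzip(compressed));
  }
}
```

This is why a patch combining the two features has to compress each sub-section boundary in a way the loader can find, rather than recording raw offsets into one continuous compressed stream.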
3. If I create a parallel, compressed image with this patch and then load it in a
NameNode without this patch and with parallel loading disabled, the NameNode is
still able to load it.
You can verify this as follows:
In class TestFSImageWithSnapshot, method setUp, add the following code:
{code:java}
public void setUp() throws Exception {
  conf = new Configuration();
  // *************add**************
  conf.setBoolean(DFSConfigKeys.DFS_IMAGE_COMPRESS_KEY, true);
  conf.set(DFSConfigKeys.DFS_IMAGE_COMPRESSION_CODEC_KEY,
      "org.apache.hadoop.io.compress.GzipCodec");
  conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_LOAD_KEY, "true");
  conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_INODE_THRESHOLD_KEY, "3");
  conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_TARGET_SECTIONS_KEY, "3");
  conf.set(DFSConfigKeys.DFS_IMAGE_PARALLEL_THREADS_KEY, "3");
  // *************add**************
  cluster = new MiniDFSCluster.Builder(conf).numDataNodes(REPLICATION)
      .build();
  cluster.waitActive();
  fsn = cluster.getNamesystem();
  hdfs = cluster.getFileSystem();
}
{code}
Then, in class FSImageFormatProtobuf, method loadInternal, change the INODE case as
follows:
{code:java}
// in FSImageFormatProtobuf#loadInternal
case INODE: {
  currentStep = new Step(StepType.INODES);
  prog.beginStep(Phase.LOADING_FSIMAGE, currentStep);
  stageSubSections = getSubSectionsOfName(
      subSections, SectionName.INODE_SUB);
  // if (loadInParallel && (stageSubSections.size() > 0)) {
  //   inodeLoader.loadINodeSectionInParallel(executorService,
  //       stageSubSections, summary.getCodec(), prog, currentStep);
  // } else {
  //   inodeLoader.loadINodeSection(in, prog, currentStep);
  // }
  inodeLoader.loadINodeSection(in, prog, currentStep);
}
{code}
Then run the unit test testSaveLoadImage and you can see the result.
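To sketch the intuition for why the serial path keeps working: if each top-level section in the image is written with its name and length, a sequential loader can walk the stream front to back and never needs the sub-section index that a parallel loader consults, so extra sub-section entries in the summary are harmless to it. The sketch below is my own toy format, not Hadoop's actual fsimage layout:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

public class SequentialSectionDemo {

  // Write name / length / payload records, mimicking a summary-driven layout.
  static byte[] writeSections(Map<String, byte[]> sections) {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bos);
      for (Map.Entry<String, byte[]> e : sections.entrySet()) {
        out.writeUTF(e.getKey());
        out.writeInt(e.getValue().length);
        out.write(e.getValue());
      }
      return bos.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // A serial loader just walks the stream front to back; it never uses the
  // offsets a parallel loader would need to split the work between threads.
  static Map<String, byte[]> readSections(byte[] image) {
    try {
      DataInputStream in =
          new DataInputStream(new ByteArrayInputStream(image));
      Map<String, byte[]> out = new LinkedHashMap<>();
      while (in.available() > 0) {
        String name = in.readUTF();
        byte[] payload = new byte[in.readInt()];
        in.readFully(payload);
        out.put(name, payload);
      }
      return out;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    Map<String, byte[]> sections = new LinkedHashMap<>();
    sections.put("INODE", new byte[] {1, 2, 3});
    byte[] image = writeSections(sections);
    System.out.println(readSections(image).keySet());
  }
}
```

Under this reading, the commented-out change above simply forces the sequential walk, which is why the image loads even when sub-sections were written.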
> load fsimage with parallelization and compression
> -------------------------------------------------
>
> Key: HDFS-16147
> URL: https://issues.apache.org/jira/browse/HDFS-16147
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.3.0
> Reporter: liuyongpan
> Priority: Minor
> Attachments: HDFS-16147.001.patch, HDFS-16147.002.patch,
> subsection.svg
>
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)