[ 
https://issues.apache.org/jira/browse/HADOOP-5467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakob Homan updated HADOOP-5467:
--------------------------------

    Attachment: fsimage.xml
                HADOOP-5467.patch

Done with first pass at offline image viewer.  Still need to do unit tests and 
documentation, but looking for feedback.  

The offline image viewer will process fsimage files of layout versions -18 or 
-19, creating several types of human-readable output.  For instance, with the 
following (contrived) namespace:
{noformat}
drwxr-xr-x   - jhoman supergroup          0 2009-03-16 21:17 /anotherDir
-rw-r--r--   3 jhoman supergroup  286631664 2009-03-16 21:15 /anotherDir/biggerfile
-rw-r--r--   3 jhoman supergroup       8754 2009-03-16 21:17 /anotherDir/smallFile
drwxr-xr-x   - jhoman supergroup          0 2009-03-16 21:11 /mapredsystem
drwxr-xr-x   - jhoman supergroup          0 2009-03-16 21:11 /mapredsystem/jhoman
drwxr-xr-x   - jhoman supergroup          0 2009-03-16 21:11 /mapredsystem/jhoman/mapredsystem
drwx-wx-wx   - jhoman supergroup          0 2009-03-16 21:11 /mapredsystem/jhoman/mapredsystem/ip.redacted.com
drwxr-xr-x   - jhoman supergroup          0 2009-03-16 21:12 /one
drwxr-xr-x   - jhoman supergroup          0 2009-03-16 21:12 /one/two
drwxr-xr-x   - jhoman supergroup          0 2009-03-16 21:16 /user
drwxr-xr-x   - jhoman supergroup          0 2009-03-16 21:19 /user/jhoman
{noformat}

the default image processor, which mimics the output of ls, generates this:
{noformat}
[1233]mymac:hadoop-0.21.0-dev jhoman$ bin/hadoop offlineimageviewer -i fsimagedemo
drwxr-xr-x  -   jhoman supergroup            0 2009-03-16 14:16 /
drwxr-xr-x  -   jhoman supergroup            0 2009-03-16 14:17 /anotherDir
drwxr-xr-x  -   jhoman supergroup            0 2009-03-16 14:11 /mapredsystem
drwxr-xr-x  -   jhoman supergroup            0 2009-03-16 14:12 /one
drwxr-xr-x  -   jhoman supergroup            0 2009-03-16 14:16 /user
-rw-r--r--  3   jhoman supergroup    286631664 2009-03-16 14:15 /anotherDir/biggerfile
-rw-r--r--  3   jhoman supergroup         8754 2009-03-16 14:17 /anotherDir/smallFile
drwxr-xr-x  -   jhoman supergroup            0 2009-03-16 14:11 /mapredsystem/jhoman
drwxr-xr-x  -   jhoman supergroup            0 2009-03-16 14:11 /mapredsystem/jhoman/mapredsystem
drwx-wx-wx  -   jhoman supergroup            0 2009-03-16 14:11 /mapredsystem/jhoman/mapredsystem/ip.redacted.com
drwxr-xr-x  -   jhoman supergroup            0 2009-03-16 14:12 /one/two
drwxr-xr-x  -   jhoman supergroup            0 2009-03-16 14:19 /user/jhoman
{noformat}
The line ordering is different, but this output is very amenable to further 
processing using standard unix tools and should look familiar to everyone.
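
As a rough illustration of that kind of post-processing, here is a minimal Python sketch that totals file bytes per top-level directory from the ls-style output. It assumes each entry sits on a single line with eight whitespace-separated fields (permissions, replication, user, group, size, date, time, path). The function name bytes_per_top_level_dir is made up for this example and is not part of the viewer.

```python
from collections import defaultdict


def bytes_per_top_level_dir(lines):
    """Sum file sizes per top-level directory from ls-style viewer output."""
    totals = defaultdict(int)
    for line in lines:
        # Assumed layout: perms, replication, user, group, size, date, time, path
        fields = line.split(None, 7)
        if len(fields) != 8 or fields[0].startswith("d"):
            continue  # skip malformed lines and directory entries
        size = int(fields[4])
        parts = fields[7].strip().split("/")
        # Files sitting directly under the root are attributed to "/"
        top = "/" + parts[1] if len(parts) > 2 else "/"
        totals[top] += size
    return dict(totals)


sample = [
    "drwxr-xr-x  -   jhoman supergroup            0 2009-03-16 14:17 /anotherDir",
    "-rw-r--r--  3   jhoman supergroup    286631664 2009-03-16 14:15 /anotherDir/biggerfile",
    "-rw-r--r--  3   jhoman supergroup         8754 2009-03-16 14:17 /anotherDir/smallFile",
]
totals = bytes_per_top_level_dir(sample)
print(totals)
```

The same split-and-sum approach would work in awk; Python is used here only to keep the example self-contained.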

Another image processor, Console, displays the namespace in a more verbose 
format that includes individual block entries and any inodes that are under 
construction in the fsimage:
{noformat}
[1233]mymac:hadoop-0.21.0-dev jhoman$ bin/hadoop offlineimageviewer -i fsimagedemo -p Console
FSImage
  ImageVersion = -19
  NamespaceID = 2109123098
  GenerationStamp = 1003
  INodes [NumInodes = 12]
    Inode
      INodePath = 
      Replication = 0
      ModificationTime = 2009-03-16 14:16
      AccessTime = 1969-12-31 16:00
      BlockSize = 0
      Blocks [NumBlocks = -1]
      NSQuota = 2147483647
      DSQuota = -1
      Permissions
        Username = jhoman
        GroupName = supergroup
        PermString = rwxr-xr-x
    Inode
      INodePath = /anotherDir
      Replication = 0
      ModificationTime = 2009-03-16 14:17
      AccessTime = 1969-12-31 16:00
      BlockSize = 0
      Blocks [NumBlocks = -1]
      NSQuota = -1
      DSQuota = -1
      Permissions
        Username = jhoman
        GroupName = supergroup
        PermString = rwxr-xr-x
    Inode
      INodePath = /mapredsystem
      Replication = 0
      ModificationTime = 2009-03-16 14:11
      AccessTime = 1969-12-31 16:00
      BlockSize = 0
      Blocks [NumBlocks = -1]
      NSQuota = -1
      DSQuota = -1
      Permissions
        Username = jhoman
        GroupName = supergroup
        PermString = rwxr-xr-x
    Inode
      INodePath = /one
      Replication = 0
      ModificationTime = 2009-03-16 14:12
      AccessTime = 1969-12-31 16:00
      BlockSize = 0
      Blocks [NumBlocks = -1]
      NSQuota = -1
      DSQuota = -1
      Permissions
        Username = jhoman
        GroupName = supergroup
        PermString = rwxr-xr-x
    Inode
      INodePath = /user
      Replication = 0
      ModificationTime = 2009-03-16 14:16
      AccessTime = 1969-12-31 16:00
      BlockSize = 0
      Blocks [NumBlocks = -1]
      NSQuota = -1
      DSQuota = -1
      Permissions
        Username = jhoman
        GroupName = supergroup
        PermString = rwxr-xr-x
    Inode
      INodePath = /anotherDir/biggerfile
      Replication = 3
      ModificationTime = 2009-03-16 14:15
      AccessTime = 2009-03-16 14:15
      BlockSize = 134217728
      Blocks [NumBlocks = 3]
        Block
          BlockID = -3825289017228345116
          NumBytes = 134217728
          GenerationStamp = 1002
        Block
          BlockID = -561951562131659349
          NumBytes = 134217728
          GenerationStamp = 1002
        Block
          BlockID = 524543674153268996
          NumBytes = 18196208
          GenerationStamp = 1002
      NSQuota = -1
      DSQuota = -1
      Permissions
        Username = jhoman
        GroupName = supergroup
        PermString = rw-r--r--
    Inode
      INodePath = /anotherDir/smallFile
      Replication = 3
      ModificationTime = 2009-03-16 14:17
      AccessTime = 2009-03-16 14:17
      BlockSize = 134217728
      Blocks [NumBlocks = 1]
        Block
          BlockID = 4922053134320058874
          NumBytes = 8754
          GenerationStamp = 1003
      NSQuota = -1
      DSQuota = -1
      Permissions
        Username = jhoman
        GroupName = supergroup
        PermString = rw-r--r--
    Inode
      INodePath = /mapredsystem/jhoman
      Replication = 0
      ModificationTime = 2009-03-16 14:11
      AccessTime = 1969-12-31 16:00
      BlockSize = 0
      Blocks [NumBlocks = -1]
      NSQuota = -1
      DSQuota = -1
      Permissions
        Username = jhoman
        GroupName = supergroup
        PermString = rwxr-xr-x
    Inode
      INodePath = /mapredsystem/jhoman/mapredsystem
      Replication = 0
      ModificationTime = 2009-03-16 14:11
      AccessTime = 1969-12-31 16:00
      BlockSize = 0
      Blocks [NumBlocks = -1]
      NSQuota = -1
      DSQuota = -1
      Permissions
        Username = jhoman
        GroupName = supergroup
        PermString = rwxr-xr-x
    Inode
      INodePath = /mapredsystem/jhoman/mapredsystem/ip-redacted.com
      Replication = 0
      ModificationTime = 2009-03-16 14:11
      AccessTime = 1969-12-31 16:00
      BlockSize = 0
      Blocks [NumBlocks = -1]
      NSQuota = -1
      DSQuota = -1
      Permissions
        Username = jhoman
        GroupName = supergroup
        PermString = rwx-wx-wx
    Inode
      INodePath = /one/two
      Replication = 0
      ModificationTime = 2009-03-16 14:12
      AccessTime = 1969-12-31 16:00
      BlockSize = 0
      Blocks [NumBlocks = -1]
      NSQuota = -1
      DSQuota = -1
      Permissions
        Username = jhoman
        GroupName = supergroup
        PermString = rwxr-xr-x
    Inode
      INodePath = /user/jhoman
      Replication = 0
      ModificationTime = 2009-03-16 14:19
      AccessTime = 1969-12-31 16:00
      BlockSize = 0
      Blocks [NumBlocks = -1]
      NSQuota = -1
      DSQuota = -1
      Permissions
        Username = jhoman
        GroupName = supergroup
        PermString = rwxr-xr-x
  INodesUnderConstruction [NumINodesUnderConstruction = 0]
{noformat}
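
The Console format can also be scraped, although it is less convenient than the ls-style output. The sketch below is only an illustration: it assumes the "key = value" layout shown above stays stable, and it sums the NumBytes of each inode's blocks. The helper name block_bytes_by_path is hypothetical and not part of the patch.

```python
def block_bytes_by_path(console_lines):
    """Map each INodePath to the sum of NumBytes over its block entries."""
    totals = {}
    current = None
    for raw in console_lines:
        key, sep, value = raw.strip().partition("=")
        if not sep:
            continue  # section headers such as "Inode" or "Block"
        key, value = key.strip(), value.strip()
        if key == "INodePath":
            # The root inode is printed with an empty path
            current = value or "/"
            totals[current] = 0
        elif key == "NumBytes" and current is not None:
            totals[current] += int(value)
    return totals


sample = """\
    Inode
      INodePath = /anotherDir/biggerfile
      Blocks [NumBlocks = 3]
        Block
          BlockID = -3825289017228345116
          NumBytes = 134217728
          GenerationStamp = 1002
        Block
          BlockID = -561951562131659349
          NumBytes = 134217728
          GenerationStamp = 1002
        Block
          BlockID = 524543674153268996
          NumBytes = 18196208
          GenerationStamp = 1002
    Inode
      INodePath = /one
      Blocks [NumBlocks = -1]
"""
block_totals = block_bytes_by_path(sample.splitlines())
print(block_totals)
```

For /anotherDir/biggerfile the three block sizes add up to 286631664 bytes, which matches the file size reported by the ls-style processor.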

The final processor currently implemented is XML, which generates an XML file of 
the entire structure.  I've attached sample output from it.  I think this 
may be the most interesting because it allows easy automated processing.  
However, it's also quite verbose: on a cluster here with about 93k files, the 
resulting XML was 2.7 million lines.  Even so, TextMate was able to handle the 
output with little grumbling!

One option worth noting is -skipBlocks.  In namespaces with a large number of 
files that span several blocks, this option causes the individual blocks to be 
omitted, including only the block count.  For such namespaces, this option 
will significantly decrease the size of the output.
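
To get a feel for the savings, here is a back-of-the-envelope Python sketch. It rests on two assumptions that are mine, not guarantees of the tool: every block except the last one in a file is full, and each block entry costs four lines in the Console output (Block, BlockID, NumBytes, GenerationStamp, as in the listing above). The function name expected_block_count is hypothetical.

```python
def expected_block_count(file_size, block_size):
    """Blocks needed for a file, assuming all but the last block are full."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    return -(-file_size // block_size)  # ceiling division on integers


BLOCK_SIZE = 134217728  # block size reported for both demo files
files = {
    "/anotherDir/biggerfile": 286631664,
    "/anotherDir/smallFile": 8754,
}
counts = {path: expected_block_count(size, BLOCK_SIZE) for path, size in files.items()}

# Assumed cost of one block entry in the Console output: four lines
LINES_PER_BLOCK = 4
omitted_lines = LINES_PER_BLOCK * sum(counts.values())
print(counts, omitted_lines)
```

For the demo namespace this gives 3 blocks for biggerfile and 1 for smallFile, agreeing with the NumBlocks values in the Console output.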

It should be pretty easy to write new image processors and output formats as 
needed.  I'll work on testing and documentation and upload a patch soon.

> Create an offline fsimage image viewer
> --------------------------------------
>
>                 Key: HADOOP-5467
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5467
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: util
>            Reporter: Jakob Homan
>            Assignee: Jakob Homan
>         Attachments: fsimage.xml, HADOOP-5467.patch
>
>
> It would be useful to have a tool to examine/dump the contents of the fsimage 
> file to human-readable form.  This would allow analysis of the namespace 
> (file usage, block sizes, etc) without impacting the operation of the 
> namenode.  XML would be a reasonable output format, as it can be easily viewed, 
> compressed and manipulated via either XSLT or XQuery.  
> I've started work on this and will have an initial version soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
