[
https://issues.apache.org/jira/browse/HADOOP-5467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jakob Homan updated HADOOP-5467:
--------------------------------
Attachment: fsimage.xml
HADOOP-5467.patch
I've finished a first pass at the offline image viewer. It still needs unit
tests and documentation, but I'm looking for feedback.
The offline image viewer will process fsimage files of layout versions -18 or
-19, creating several types of human-readable output. For instance, with the
following (contrived) namespace:
{noformat}
drwxr-xr-x - jhoman supergroup 0 2009-03-16 21:17 /anotherDir
-rw-r--r-- 3 jhoman supergroup 286631664 2009-03-16 21:15 /anotherDir/biggerfile
-rw-r--r-- 3 jhoman supergroup 8754 2009-03-16 21:17 /anotherDir/smallFile
drwxr-xr-x - jhoman supergroup 0 2009-03-16 21:11 /mapredsystem
drwxr-xr-x - jhoman supergroup 0 2009-03-16 21:11 /mapredsystem/jhoman
drwxr-xr-x - jhoman supergroup 0 2009-03-16 21:11 /mapredsystem/jhoman/mapredsystem
drwx-wx-wx - jhoman supergroup 0 2009-03-16 21:11 /mapredsystem/jhoman/mapredsystem/ip.redacted.com
drwxr-xr-x - jhoman supergroup 0 2009-03-16 21:12 /one
drwxr-xr-x - jhoman supergroup 0 2009-03-16 21:12 /one/two
drwxr-xr-x - jhoman supergroup 0 2009-03-16 21:16 /user
drwxr-xr-x - jhoman supergroup 0 2009-03-16 21:19 /user/jhoman
{noformat}
running the viewer with the default image processor, which mimics the output
of ls, generates this:
{noformat}
[1233]mymac:hadoop-0.21.0-dev jhoman$ bin/hadoop offlineimageviewer -i fsimagedemo
drwxr-xr-x - jhoman supergroup 0 2009-03-16 14:16 /
drwxr-xr-x - jhoman supergroup 0 2009-03-16 14:17 /anotherDir
drwxr-xr-x - jhoman supergroup 0 2009-03-16 14:11 /mapredsystem
drwxr-xr-x - jhoman supergroup 0 2009-03-16 14:12 /one
drwxr-xr-x - jhoman supergroup 0 2009-03-16 14:16 /user
-rw-r--r-- 3 jhoman supergroup 286631664 2009-03-16 14:15 /anotherDir/biggerfile
-rw-r--r-- 3 jhoman supergroup 8754 2009-03-16 14:17 /anotherDir/smallFile
drwxr-xr-x - jhoman supergroup 0 2009-03-16 14:11 /mapredsystem/jhoman
drwxr-xr-x - jhoman supergroup 0 2009-03-16 14:11 /mapredsystem/jhoman/mapredsystem
drwx-wx-wx - jhoman supergroup 0 2009-03-16 14:11 /mapredsystem/jhoman/mapredsystem/ip.redacted.com
drwxr-xr-x - jhoman supergroup 0 2009-03-16 14:12 /one/two
drwxr-xr-x - jhoman supergroup 0 2009-03-16 14:19 /user/jhoman
{noformat}
The line ordering is different (entries appear in the order in which they are
stored in the image), but this output is easy to post-process with standard
Unix tools and should look familiar to anyone who has used ls.
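As a small illustration of that kind of post-processing, here is a sketch in
Java (an awk one-liner would do just as well) that totals regular-file bytes
per top-level directory. It assumes each entry sits on a single line with eight
whitespace-separated fields (permissions, replication, owner, group, size,
date, time, path); the class name LsSummary is hypothetical, and paths
containing spaces are not handled.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical post-processor for the ls-style output of the default processor.
// Assumes one entry per line with whitespace-separated fields:
// perms, replication, owner, group, size, date, time, path.
// Paths containing spaces are NOT handled by this sketch.
final class LsSummary {

    // Sums the sizes of regular files (lines starting with '-'),
    // grouped by the first path component, e.g. "/anotherDir".
    static Map<String, Long> bytesPerTopLevelDir(List<String> lines) {
        Map<String, Long> totals = new TreeMap<>();
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 8 || !f[0].startsWith("-")) {
                continue; // skip directories and malformed lines
            }
            long size = Long.parseLong(f[4]);
            String path = f[7];
            int second = path.indexOf('/', 1);
            String top = second < 0 ? "/" : path.substring(0, second);
            totals.merge(top, size, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "drwxr-xr-x - jhoman supergroup 0 2009-03-16 14:17 /anotherDir",
            "-rw-r--r-- 3 jhoman supergroup 286631664 2009-03-16 14:15 /anotherDir/biggerfile",
            "-rw-r--r-- 3 jhoman supergroup 8754 2009-03-16 14:17 /anotherDir/smallFile");
        System.out.println(bytesPerTopLevelDir(sample)); // {/anotherDir=286640418}
    }
}
```

A TreeMap keeps the directories sorted, which mirrors what piping through
sort would give on the command line.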
Another image processor, Console, displays the namespace in a more verbose
format that includes individual block entries and any inodes that are under
construction in the fsimage:
{noformat}
[1233]mymac:hadoop-0.21.0-dev jhoman$ bin/hadoop offlineimageviewer -i fsimagedemo -p Console
FSImage
  ImageVersion = -19
  NamespaceID = 2109123098
  GenerationStamp = 1003
  INodes [NumInodes = 12]
    Inode
      INodePath =
      Replication = 0
      ModificationTime = 2009-03-16 14:16
      AccessTime = 1969-12-31 16:00
      BlockSize = 0
      Blocks [NumBlocks = -1]
      NSQuota = 2147483647
      DSQuota = -1
      Permissions
        Username = jhoman
        GroupName = supergroup
        PermString = rwxr-xr-x
    Inode
      INodePath = /anotherDir
      Replication = 0
      ModificationTime = 2009-03-16 14:17
      AccessTime = 1969-12-31 16:00
      BlockSize = 0
      Blocks [NumBlocks = -1]
      NSQuota = -1
      DSQuota = -1
      Permissions
        Username = jhoman
        GroupName = supergroup
        PermString = rwxr-xr-x
    Inode
      INodePath = /mapredsystem
      Replication = 0
      ModificationTime = 2009-03-16 14:11
      AccessTime = 1969-12-31 16:00
      BlockSize = 0
      Blocks [NumBlocks = -1]
      NSQuota = -1
      DSQuota = -1
      Permissions
        Username = jhoman
        GroupName = supergroup
        PermString = rwxr-xr-x
    Inode
      INodePath = /one
      Replication = 0
      ModificationTime = 2009-03-16 14:12
      AccessTime = 1969-12-31 16:00
      BlockSize = 0
      Blocks [NumBlocks = -1]
      NSQuota = -1
      DSQuota = -1
      Permissions
        Username = jhoman
        GroupName = supergroup
        PermString = rwxr-xr-x
    Inode
      INodePath = /user
      Replication = 0
      ModificationTime = 2009-03-16 14:16
      AccessTime = 1969-12-31 16:00
      BlockSize = 0
      Blocks [NumBlocks = -1]
      NSQuota = -1
      DSQuota = -1
      Permissions
        Username = jhoman
        GroupName = supergroup
        PermString = rwxr-xr-x
    Inode
      INodePath = /anotherDir/biggerfile
      Replication = 3
      ModificationTime = 2009-03-16 14:15
      AccessTime = 2009-03-16 14:15
      BlockSize = 134217728
      Blocks [NumBlocks = 3]
        Block
          BlockID = -3825289017228345116
          NumBytes = 134217728
          GenerationStamp = 1002
        Block
          BlockID = -561951562131659349
          NumBytes = 134217728
          GenerationStamp = 1002
        Block
          BlockID = 524543674153268996
          NumBytes = 18196208
          GenerationStamp = 1002
      NSQuota = -1
      DSQuota = -1
      Permissions
        Username = jhoman
        GroupName = supergroup
        PermString = rw-r--r--
    Inode
      INodePath = /anotherDir/smallFile
      Replication = 3
      ModificationTime = 2009-03-16 14:17
      AccessTime = 2009-03-16 14:17
      BlockSize = 134217728
      Blocks [NumBlocks = 1]
        Block
          BlockID = 4922053134320058874
          NumBytes = 8754
          GenerationStamp = 1003
      NSQuota = -1
      DSQuota = -1
      Permissions
        Username = jhoman
        GroupName = supergroup
        PermString = rw-r--r--
    Inode
      INodePath = /mapredsystem/jhoman
      Replication = 0
      ModificationTime = 2009-03-16 14:11
      AccessTime = 1969-12-31 16:00
      BlockSize = 0
      Blocks [NumBlocks = -1]
      NSQuota = -1
      DSQuota = -1
      Permissions
        Username = jhoman
        GroupName = supergroup
        PermString = rwxr-xr-x
    Inode
      INodePath = /mapredsystem/jhoman/mapredsystem
      Replication = 0
      ModificationTime = 2009-03-16 14:11
      AccessTime = 1969-12-31 16:00
      BlockSize = 0
      Blocks [NumBlocks = -1]
      NSQuota = -1
      DSQuota = -1
      Permissions
        Username = jhoman
        GroupName = supergroup
        PermString = rwxr-xr-x
    Inode
      INodePath = /mapredsystem/jhoman/mapredsystem/ip-redacted.com
      Replication = 0
      ModificationTime = 2009-03-16 14:11
      AccessTime = 1969-12-31 16:00
      BlockSize = 0
      Blocks [NumBlocks = -1]
      NSQuota = -1
      DSQuota = -1
      Permissions
        Username = jhoman
        GroupName = supergroup
        PermString = rwx-wx-wx
    Inode
      INodePath = /one/two
      Replication = 0
      ModificationTime = 2009-03-16 14:12
      AccessTime = 1969-12-31 16:00
      BlockSize = 0
      Blocks [NumBlocks = -1]
      NSQuota = -1
      DSQuota = -1
      Permissions
        Username = jhoman
        GroupName = supergroup
        PermString = rwxr-xr-x
    Inode
      INodePath = /user/jhoman
      Replication = 0
      ModificationTime = 2009-03-16 14:19
      AccessTime = 1969-12-31 16:00
      BlockSize = 0
      Blocks [NumBlocks = -1]
      NSQuota = -1
      DSQuota = -1
      Permissions
        Username = jhoman
        GroupName = supergroup
        PermString = rwxr-xr-x
  INodesUnderConstruction [NumINodesUnderConstruction = 0]
{noformat}
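The block entries for /anotherDir/biggerfile match simple arithmetic for a
file written with a fixed 134217728-byte (128 MB) block size:
ceil(286631664 / 134217728) = 3 blocks, two full ones and a last one holding
the remaining 18196208 bytes. Here is a small sketch of that calculation.
BlockMath is a hypothetical helper, not part of the patch, and it assumes
every block except the last is full.

```java
import java.util.Arrays;

// Hypothetical helper reproducing the block layout seen in the Console output.
// ASSUMPTION: every block except the last is exactly blockSize bytes long,
// and the last block holds whatever remains.
final class BlockMath {

    // Expected per-block byte counts for a file of `length` bytes.
    static long[] expectedBlockSizes(long length, long blockSize) {
        if (length <= 0) {
            return new long[0]; // an empty file has no blocks
        }
        int n = (int) ((length + blockSize - 1) / blockSize); // ceiling division
        long[] sizes = new long[n];
        Arrays.fill(sizes, blockSize);
        sizes[n - 1] = length - (long) (n - 1) * blockSize;  // remainder in the last block
        return sizes;
    }

    public static void main(String[] args) {
        // Length and BlockSize of /anotherDir/biggerfile from the Console output.
        System.out.println(Arrays.toString(expectedBlockSizes(286631664L, 134217728L)));
        // prints [134217728, 134217728, 18196208]
    }
}
```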
The third processor currently implemented is XML, which writes out the entire
image structure as an XML document; I've attached sample output (fsimage.xml).
This may be the most interesting processor, because it makes automated
processing easy. However, it's also quite verbose: on one of our clusters with
about 93k files, the resulting XML ran to 2.7 million lines. Still, TextMate
handled the output with little grumbling!
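Because the XML dump can be very large, a streaming parser is a natural fit for
automated processing. Here is a minimal sketch using the JDK's SAX API that
counts inodes and totals block bytes. The element names Inode and NumBytes are
an assumption borrowed from the Console processor's field names and should be
checked against the attached fsimage.xml. XmlImageStats is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Streams an XML image dump with SAX (no in-memory DOM tree) and totals
// inodes and block bytes.
// ASSUMPTION: element names mirror the Console processor's field names
// (<Inode>, <NumBytes>); verify them against the attached fsimage.xml.
final class XmlImageStats {
    long inodes;
    long totalBlockBytes;

    static XmlImageStats scan(InputSource in) {
        XmlImageStats stats = new XmlImageStats();
        StringBuilder text = new StringBuilder(); // character data of the current element
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                text.setLength(0);
                if (qName.equals("Inode")) {
                    stats.inodes++;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("NumBytes")) {
                    stats.totalBlockBytes += Long.parseLong(text.toString().trim());
                }
                text.setLength(0);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("could not parse image XML", e);
        }
        return stats;
    }

    public static void main(String[] args) {
        String sample = "<FSImage><INodes>"
            + "<Inode><INodePath>/a</INodePath><Blocks>"
            + "<Block><NumBytes>100</NumBytes></Block>"
            + "<Block><NumBytes>23</NumBytes></Block>"
            + "</Blocks></Inode>"
            + "<Inode><INodePath>/b</INodePath></Inode>"
            + "</INodes></FSImage>";
        XmlImageStats s = scan(new InputSource(new StringReader(sample)));
        System.out.println(s.inodes + " inodes, " + s.totalBlockBytes + " bytes"); // 2 inodes, 123 bytes
    }
}
```

For a real dump, pass new InputSource(new FileReader(path)) instead of the
inline sample. SAX is chosen over DOM here because a 2.7-million-line document
would otherwise be held entirely in memory.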
One option worth noting is -skipBlocks, which omits the individual block
entries and reports only each file's block count. For namespaces with many
files spanning several blocks, this significantly decreases the size of the
output.
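To make the effect concrete, here is a hypothetical rendering routine in the
style of the Console output above. BlockSection and its Block record are
illustrative names, not the patch's code. In this sketch, setting the flag
shrinks a file with n blocks from 4n + 1 lines of block output to a single
line.

```java
import java.util.List;

// Hypothetical illustration of -skipBlocks: with skipBlocks set, only the
// block count is emitted; otherwise every block is listed.
// Field names mirror the Console output; this is NOT the patch's code.
final class BlockSection {

    record Block(long id, long numBytes, long genStamp) {}

    static String render(List<Block> blocks, boolean skipBlocks) {
        StringBuilder out = new StringBuilder();
        out.append("Blocks [NumBlocks = ").append(blocks.size()).append("]\n");
        if (!skipBlocks) {
            for (Block b : blocks) {
                out.append("  Block\n")
                   .append("    BlockID = ").append(b.id()).append('\n')
                   .append("    NumBytes = ").append(b.numBytes()).append('\n')
                   .append("    GenerationStamp = ").append(b.genStamp()).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The three blocks of /anotherDir/biggerfile from the Console output.
        List<Block> blocks = List.of(
            new Block(-3825289017228345116L, 134217728L, 1002L),
            new Block(-561951562131659349L, 134217728L, 1002L),
            new Block(524543674153268996L, 18196208L, 1002L));
        System.out.print(render(blocks, false)); // 13 lines
        System.out.print(render(blocks, true));  // 1 line
    }
}
```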
It should be pretty easy to write new image processors and output formats as
needed. I'll work on testing and documentation and upload a patch soon.
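Here is a sketch of what a pluggable processor might look like, assuming a
simple callback interface. ImageProcessor and BlockCountHistogram are
hypothetical names, and this is not the interface in HADOOP-5467.patch. The
idea is that the loader walks the image once and hands each inode to the
selected processor. This example processor builds a histogram of files by
block count.

```java
import java.util.Map;
import java.util.TreeMap;

// Example custom processor: counts files by how many blocks they occupy.
final class BlockCountHistogram implements ImageProcessor {
    private final Map<Integer, Integer> filesByBlockCount = new TreeMap<>();

    @Override
    public void visitInode(String path, long[] blockSizes) {
        if (blockSizes == null) {
            return; // by this sketch's convention, directories have no block list
        }
        filesByBlockCount.merge(blockSizes.length, 1, Integer::sum);
    }

    @Override
    public String finish() {
        return filesByBlockCount.toString();
    }

    public static void main(String[] args) {
        ImageProcessor p = new BlockCountHistogram();
        p.visitInode("/anotherDir", null);
        p.visitInode("/anotherDir/biggerfile", new long[] {134217728L, 134217728L, 18196208L});
        p.visitInode("/anotherDir/smallFile", new long[] {8754L});
        System.out.println(p.finish()); // {1=1, 3=1}
    }
}

// HYPOTHETICAL callback interface, for illustration only; it is not the
// API in HADOOP-5467.patch. A loader would call visitInode once per inode
// while walking the fsimage, then finish() to obtain the output.
interface ImageProcessor {
    // blockSizes is null for directories, otherwise one entry per block
    void visitInode(String path, long[] blockSizes);

    String finish();
}
```

Keeping processors as small callback objects means the image is parsed in one
place, and each new output format only has to decide what to do with each
inode.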
> Create an offline fsimage image viewer
> --------------------------------------
>
> Key: HADOOP-5467
> URL: https://issues.apache.org/jira/browse/HADOOP-5467
> Project: Hadoop Core
> Issue Type: New Feature
> Components: util
> Reporter: Jakob Homan
> Assignee: Jakob Homan
> Attachments: fsimage.xml, HADOOP-5467.patch
>
>
> It would be useful to have a tool to examine/dump the contents of the fsimage
> file to human-readable form. This would allow analysis of the namespace
> (file usage, block sizes, etc.) without impacting the operation of the
> namenode. XML would be a reasonable output format, as it can be easily
> viewed, compressed and manipulated via either XSLT or XQuery.
> I've started work on this and will have an initial version soon.
--
This message is automatically generated by JIRA.