[ 
https://issues.apache.org/jira/browse/HBASE-8778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13715608#comment-13715608
 ] 

Dave Latham commented on HBASE-8778:
------------------------------------

I began working on a trunk/0.96 patch that moves the table descriptor 
files to a known subdirectory.  It takes the refactoring, cleanup and 
documentation from the 0.94.5 patch above, but adds a one-time migration 
instead of the locking or rolling upgrade support.  One issue I ran into is 
support for snapshots.  The snapshot code calls into FSTableDescriptors to 
write a table descriptor file in the snapshot directory.  How should this work 
when FSTableDescriptors is putting descriptors into a known subdir?

Options I see:
 - Snapshots behave just like actual table directories and put their 
descriptors into a known subdir.  During migration, each existing snapshot's 
descriptor is moved into its known subdir.
 - New snapshots put table descriptors into the known subdir.  Code that reads 
snapshots supports finding the table descriptor in either the subdir or the 
original directory, so no migration of snapshots is needed.
 - Snapshots continue to store table descriptors directly in the snapshot 
directory and FSTableDescriptors is refactored to share code to write sequenced 
descriptors in any directory.

Thoughts?  I'm leaning toward the last option.
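
To make the last option concrete, here is a rough sketch of a helper that 
writes the next sequenced descriptor into whatever directory it is given.  It 
is not the FSTableDescriptors code: it uses java.nio.file on a local file 
system as a stand-in for HDFS, and the class name, the method names, the 
10-digit zero padding and the temp-file-then-rename step are all assumptions 
made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

// Hypothetical helper, not part of HBase: writes ".tableinfo.<sequenceNum>"
// files into any directory (a table's known subdir or a snapshot directory).
public class SequencedDescriptorWriter {
    static final String PREFIX = ".tableinfo.";

    /** Highest sequence number found in dir, or -1 if there is none. */
    static long latestSequence(Path dir) throws IOException {
        if (!Files.isDirectory(dir)) {
            return -1;
        }
        try (Stream<Path> files = Files.list(dir)) {
            return files.map(p -> p.getFileName().toString())
                    .filter(n -> n.startsWith(PREFIX))
                    .mapToLong(SequencedDescriptorWriter::parseSeq)
                    .max()
                    .orElse(-1);
        }
    }

    /** Parses the numeric suffix; returns -1 for names that are not sequenced. */
    static long parseSeq(String name) {
        try {
            return Long.parseLong(name.substring(PREFIX.length()));
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** Writes content as the next sequenced descriptor in dir and returns its path. */
    static Path writeSequenced(Path dir, byte[] content) throws IOException {
        Files.createDirectories(dir);
        long next = latestSequence(dir) + 1;
        // Assumption: write to a temp file first, then rename into place, so a
        // reader never sees a partially written descriptor.
        Path tmp = Files.createTempFile(dir, ".tmp-tableinfo", null);
        Files.write(tmp, content);
        Path target = dir.resolve(PREFIX + String.format("%010d", next));
        return Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
```

With something like this, the table code would pass its known subdir and the 
snapshot code would pass the snapshot directory itself, so both share one 
write path and existing snapshots need no migration.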
                
> Region assignments scan table directory making them slow for huge tables
> ------------------------------------------------------------------------
>
>                 Key: HBASE-8778
>                 URL: https://issues.apache.org/jira/browse/HBASE-8778
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Dave Latham
>            Assignee: Dave Latham
>             Fix For: 0.98.0, 0.95.2, 0.94.11
>
>         Attachments: 8778-dirmodtime.txt, HBASE-8778-0.94.5.patch, 
> HBASE-8778-0.94.5-v2.patch
>
>
> On a table with 130k regions it takes about 3 seconds for a region server to 
> open a region once it has been assigned.
> Watching the threads for a region server running 0.94.5 that is opening many 
> such regions shows the thread opening the region in code like this:
> {noformat}
> "PRI IPC Server handler 4 on 60020" daemon prio=10 tid=0x00002aaac07e9000 nid=0x6566 runnable [0x000000004c46d000]
>    java.lang.Thread.State: RUNNABLE
>         at java.lang.String.indexOf(String.java:1521)
>         at java.net.URI$Parser.scan(URI.java:2912)
>         at java.net.URI$Parser.parse(URI.java:3004)
>         at java.net.URI.<init>(URI.java:736)
>         at org.apache.hadoop.fs.Path.initialize(Path.java:145)
>         at org.apache.hadoop.fs.Path.<init>(Path.java:126)
>         at org.apache.hadoop.fs.Path.<init>(Path.java:50)
>         at org.apache.hadoop.hdfs.protocol.HdfsFileStatus.getFullPath(HdfsFileStatus.java:215)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.makeQualified(DistributedFileSystem.java:252)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:311)
>         at org.apache.hadoop.fs.FilterFileSystem.listStatus(FilterFileSystem.java:159)
>         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:842)
>         at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:867)
>         at org.apache.hadoop.hbase.util.FSUtils.listStatus(FSUtils.java:1168)
>         at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableInfoPath(FSTableDescriptors.java:269)
>         at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableInfoPath(FSTableDescriptors.java:255)
>         at org.apache.hadoop.hbase.util.FSTableDescriptors.getTableInfoModtime(FSTableDescriptors.java:368)
>         at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:155)
>         at org.apache.hadoop.hbase.util.FSTableDescriptors.get(FSTableDescriptors.java:126)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:2834)
>         at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:2807)
>         at sun.reflect.GeneratedMethodAccessor64.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
> {noformat}
> To open the region, the region server first loads the latest 
> HTableDescriptor.  Since HBASE-4553, HTableDescriptors are stored in the file 
> system at "/hbase/<tableDir>/.tableinfo.<sequenceNum>".  The file with the 
> largest sequenceNum is the current descriptor.  This is done so that the 
> current descriptor is updated atomically.  However, since the filename is not 
> known in advance, FSTableDescriptors has to do a FileSystem.listStatus 
> operation, which has to list all files in the directory to find it.  The 
> directory also contains all the region directories, so in our case it has to 
> load 130k FileStatus objects.  Even using a globStatus matching function 
> still transfers all the objects to the client before performing the pattern 
> matching.  Furthermore, HDFS uses a default of transferring 1000 directory 
> entries in each RPC call, so it requires 130 roundtrips to the namenode to 
> fetch all the directory entries.
> Consequently, because every region open lists a directory whose size grows 
> with the region count, reassigning all the regions of a table (or a constant 
> fraction thereof) requires time proportional to the square of the number of 
> regions.
> In our case, if a region server fails with 200 such regions, it takes 10+ 
> minutes for them all to be reassigned, after the zk expiration and log 
> splitting.
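
The lookup cost described in the issue can be illustrated with a small, 
self-contained sketch.  It is not the FSTableDescriptors code: the class and 
method names are hypothetical, and a plain list of file names stands in for 
the result of a directory listing.  It only shows why every entry has to be 
examined to find the largest sequenceNum, and how the listing batch size 
turns into namenode round trips.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical illustration of the lookup described in the issue text.
public class TableInfoScan {
    // Descriptor files are named ".tableinfo.<sequenceNum>" per the issue.
    static final String PREFIX = ".tableinfo.";

    /** Returns the descriptor file name with the largest sequence number, if any. */
    static Optional<String> currentDescriptor(List<String> directoryEntries) {
        String best = null;
        long bestSeq = -1;
        // Every entry is visited, including all the region directories.
        for (String name : directoryEntries) {
            if (!name.startsWith(PREFIX)) {
                continue;
            }
            long seq;
            try {
                seq = Long.parseLong(name.substring(PREFIX.length()));
            } catch (NumberFormatException e) {
                continue; // not a sequenced descriptor file
            }
            if (seq > bestSeq) {
                bestSeq = seq;
                best = name;
            }
        }
        return Optional.ofNullable(best);
    }

    /** Listing calls needed when at most batchSize entries come back per call. */
    static long listingRoundTrips(long entries, long batchSize) {
        return (entries + batchSize - 1) / batchSize;
    }
}
```

For example, listingRoundTrips(130_000, 1000) gives the 130 round trips 
mentioned above for a single region open.  If all 130k regions of the table 
are reopened, that is roughly 130,000 x 130 = 16.9 million listing calls, 
which is where the quadratic growth comes from.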

