[
https://issues.apache.org/jira/browse/HBASE-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
stack updated HBASE-2461:
-------------------------
Attachment: 2461.txt
This issue highlights how exceptions thrown after the close of the
region-to-be-split -- a necessary action if the split is to come out clean --
can poke a hole in an online table.
This patch starts down the road of treating the split operation inside the
regionserver as a 'transaction'. There is a prepare step and an execute step.
Should the execute fail -- the execute step has stuff like close of region and
update of meta table with new split codes -- then we'll call rollback. The
rollback will try to fix up the failed split by doing things like reopening the
region if appropriate and fixing up meta if necessary.
If the rollback fails, we'll kill the regionserver so that the processing of
the server shutdown gets the affected regions back online again.
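To make the prepare/execute/rollback flow above concrete, here is a rough sketch. It is not the code in 2461.txt: the class name, the RegionOps interface, and all of its methods are hypothetical stand-ins for the real regionserver operations. It assumes a journal of completed steps, so that rollback only undoes what actually happened.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of a split run as a transaction; names are illustrative only. */
class SplitTransactionSketch {
    /** Steps recorded as they complete, so rollback knows what to undo. */
    enum JournalEntry { CLOSED_PARENT, UPDATED_META }

    /** Assumed hooks standing in for the real regionserver operations. */
    public interface RegionOps {
        boolean isSplittable();
        void closeParent() throws IOException;
        void writeDaughterReferences() throws IOException;
        void updateMeta() throws IOException;
        void reopenParent() throws IOException;
        void restoreMeta() throws IOException;
        void abortServer(String why);
    }

    private final RegionOps ops;
    private final Deque<JournalEntry> journal = new ArrayDeque<>();

    public SplitTransactionSketch(RegionOps ops) {
        this.ops = ops;
    }

    /** Prepare step: decide whether the split can go ahead; changes nothing. */
    boolean prepare() {
        return ops.isSplittable();
    }

    /** Execute step: each journaled action is recorded once it succeeds. */
    void execute() throws IOException {
        ops.closeParent();
        journal.push(JournalEntry.CLOSED_PARENT);
        ops.writeDaughterReferences(); // the step that threw in the trace below
        ops.updateMeta();
        journal.push(JournalEntry.UPDATED_META);
    }

    /** Rollback: undo the journaled steps, most recent first. */
    void rollback() throws IOException {
        while (!journal.isEmpty()) {
            switch (journal.pop()) {
                case UPDATED_META:
                    ops.restoreMeta();
                    break;
                case CLOSED_PARENT:
                    ops.reopenParent();
                    break;
            }
        }
    }

    /** Returns true if the split completed, false if it was skipped or rolled back. */
    public boolean run() {
        if (!prepare()) {
            return false;
        }
        try {
            execute();
            return true;
        } catch (IOException | RuntimeException e) {
            // Catch RuntimeException too: the reported failure was an NPE.
            try {
                rollback();
            } catch (IOException | RuntimeException rollbackFailure) {
                // Last resort: let server shutdown processing recover the regions.
                ops.abortServer("split rollback failed: " + rollbackFailure);
            }
            return false;
        }
    }
}
```

With this shape, a failure while writing the reference files leaves only CLOSED_PARENT in the journal, so the rollback reopens the parent region and leaves meta alone. Cleanup of partially written reference files is left out of the sketch.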
Patch is not ready yet.
> Split doesn't handle IOExceptions when creating new region reference files
> --------------------------------------------------------------------------
>
> Key: HBASE-2461
> URL: https://issues.apache.org/jira/browse/HBASE-2461
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Reporter: Todd Lipcon
> Assignee: stack
> Priority: Blocker
> Fix For: 0.21.0
>
> Attachments: 2461.txt
>
>
> I was testing an HDFS patch which had a bug in it, so it happened to throw an
> NPE during a split with the following trace:
> 2010-04-16 19:18:20,727 ERROR org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction failed for region TestTable,-1945465867<1271449232310>,1271453785648
> java.lang.NullPointerException
> at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.enqueueCurrentPacket(DFSClient.java:3124)
> at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.flushInternal(DFSClient.java:3220)
> at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.closeInternal(DFSClient.java:3306)
> at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.close(DFSClient.java:3255)
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
> at org.apache.hadoop.fs.FileSystem.createNewFile(FileSystem.java:560)
> at org.apache.hadoop.hbase.util.FSUtils.create(FSUtils.java:95)
> at org.apache.hadoop.hbase.io.Reference.write(Reference.java:129)
> at org.apache.hadoop.hbase.regionserver.StoreFile.split(StoreFile.java:498)
> at org.apache.hadoop.hbase.regionserver.HRegion.splitRegion(HRegion.java:682)
> at org.apache.hadoop.hbase.regionserver.CompactSplitThread.split(CompactSplitThread.java:162)
> at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:95)
> After that, my region was gone, and any further writes to it would fail.