Here are a few things, David.
Regarding your tool, you could have just run 'select info:regioninfo from
.META.;' and it would output the same data. (If you did something like "echo
'select info:regioninfo from .META.;' | ./bin/hbase shell --html &>
/tmp/meta.html", the output would be HTML and easier to read than
an ASCII table.)
If you want to merge regions, check out the main method of
org.apache.hadoop.hbase.util.Merge.
Regarding offline regions: looking at your report below, all offlined
regions look legit. They are offline, but they also have the split
attribute set. (On split, the parent is offlined and the daughter
regions take its place. The parent hangs around until the daughters no
longer hold references to it; then the parent is deleted.)
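To make that check mechanical, here is a rough sketch in Python. It is not part of HBase: the Region record and both function names are made up for illustration, and the fields just mirror what your report prints. The idea is that if the online regions alone tile the key space from the empty start key to the empty end key, the offlined split parents are not hiding any rows.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Region:
    # Hypothetical record mirroring the fields printed in the report.
    start_key: str
    end_key: str   # "" means "end of table"
    offline: bool
    split: bool


def is_split_parent(region: Region) -> bool:
    """An offlined region with the split flag set is an expected split parent."""
    return region.offline and region.split


def online_gaps(regions: List[Region]) -> List[Tuple[str, str]]:
    """Return (expected_start, actual_start) pairs where the online regions
    fail to form a contiguous chain. Overlaps are flagged the same way."""
    online = sorted((r for r in regions if not r.offline),
                    key=lambda r: r.start_key)
    if not online:
        return [("", "")]  # nothing online: the whole key space is uncovered
    gaps = []
    expected_start = ""
    for region in online:
        if region.start_key != expected_start:
            gaps.append((expected_start, region.start_key))
        expected_start = region.end_key
    if expected_start != "":
        gaps.append((expected_start, ""))  # last online region stops short
    return gaps


# Made-up keys: one split parent ("b".."m") replaced by two daughters.
regions = [
    Region("", "b", offline=False, split=False),
    Region("b", "m", offline=True, split=True),
    Region("b", "f", offline=False, split=False),
    Region("f", "m", offline=False, split=False),
    Region("m", "", offline=False, split=False),
]
print(online_gaps(regions))  # → []
```

An empty result means every key falls in exactly one online region, so a merge of the offline parents should not be needed to recover rows.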
Regarding the 144 missing rows, is it possible you fed your map task
duplicates? The duplicates would increment the map count of inputs
processed, but reduce would squash the duplicates together and output a
single row. If you don't have that many rows, perhaps output the inputs
and outputs and try to figure out where the 144 are going missing?
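If you can dump the row keys your maps saw, a few lines of Python would tell you whether duplicates account for the difference. This is only a sketch: it assumes you have the map input keys as a plain list, and the function names are made up.

```python
from collections import Counter
from typing import Dict, List


def duplicate_keys(input_keys: List[str]) -> Dict[str, int]:
    """Return each key that was fed to the maps more than once, with its count."""
    counts = Counter(input_keys)
    return {key: n for key, n in counts.items() if n > 1}


def rows_lost_to_duplicates(input_keys: List[str]) -> int:
    """Inputs counted by the maps minus distinct rows the reduce can emit."""
    return len(input_keys) - len(set(input_keys))


# Made-up keys: six map inputs, but only three distinct rows.
keys = ["a", "b", "a", "c", "b", "a"]
print(duplicate_keys(keys))           # → {'a': 3, 'b': 2}
print(rows_lost_to_duplicates(keys))  # → 3
```

If the second number comes out as 144 for your job, the rows were never missing; the map counter was just counting repeats.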
Regarding hbase buckling under load, please send us logs. If you are
using TRUNK, it should easily carry ten concurrent clients, and where
it can't keep up, it puts up a gate to block updates. It shouldn't be
falling over.
Thanks D,
St.Ack
David Alves wrote:
Hi Guys
Regarding my previous problems, I'm glad to say that I can now crawl an
entire repository with only a small percentage of failed tasks. The
latest hbase version plus the corrected replication property seem to
have solved it for me.
Still, I have two issues I'd appreciate your input on.
The first one regards splits. I've made a small tool (built upon
stack's one) that checks DB state and can online/offline tables,
merge regions, etc. This tool gives me the report at the end of this
email. The question here is that I seem to have lost 144 rows (comparing
the output format's output records with the actual rows in the table from
a select count(*)). I suspect these rows are in the offline splits. Can
I use my tool to merge the splits against their online parents using
HRegion.merge()? Or is that a big no-no?
The second issue is more problematic. I misconfigured my last job and
it ran 10 maps instead of the 1 it should have. Under that kind of
load hbase completely failed and regionservers went down. One time I had
to completely erase the database because it wouldn't start again (I
suspect .META. was offline); the other time I was able to recover all the
data by simply restarting it. Is there any kind of procedure I should
use in this situation?
Best Regards
David Alves
Log Trace:
Found region: cyclops-documents-database,,1208892792201
Id: 1208892792201
Start Key:
End Key:
smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/HOW
TO USE HTML 3.2/ch6.htm
Online/Offline Status: ONLINE
Split?: FALSE
Found region:
cyclops-documents-database,smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/HOW
TO USE HTML 3.2/ch6.htm,1208892792202
Id: 1208892792202
Start Key:
smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/HOW
TO USE HTML 3.2/ch6.htm
End Key:
smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/LINUX
SYSTEM ADMINISTRATOR'S SURVIVAL GUIDE TABLE OF CONTENTS/lsg14.htm
Online/Offline Status: ONLINE
Split?: FALSE
DEBUG 23-04 14:54:50,744 (DFSClient.java:readChunk:934) -DFSClient
readChunk got seqno 2 offsetInBlock 8192 lastPacketInBlock false
packetLen 4132
Found region:
cyclops-documents-database,smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/LINUX
SYSTEM ADMINISTRATOR'S SURVIVAL GUIDE TABLE OF CONTENTS/lsg14.htm,1208891918491
Id: 1208891918491
Start Key:
smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/LINUX
SYSTEM ADMINISTRATOR'S SURVIVAL GUIDE TABLE OF CONTENTS/lsg14.htm
End Key:
smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/SPECIAL
EDITION USING MICROSOFT BACKOFFICE, VOLUME 1/ch05/06.htm
Online/Offline Status: OFFLINE
Split?: TRUE
Found region:
cyclops-documents-database,smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/LINUX
SYSTEM ADMINISTRATOR'S SURVIVAL GUIDE TABLE OF CONTENTS/lsg14.htm,1208893494772
Id: 1208893494772
Start Key:
smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/LINUX
SYSTEM ADMINISTRATOR'S SURVIVAL GUIDE TABLE OF CONTENTS/lsg14.htm
End Key:
smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/Platinium
Edition Using VB 5/Books/Platinium Edition Using VB 5/ch14/09.htm
Online/Offline Status: ONLINE
Split?: FALSE
DEBUG 23-04 14:54:50,754 (DFSClient.java:readChunk:934) -DFSClient
readChunk got seqno 3 offsetInBlock 12288 lastPacketInBlock false
packetLen 4132
Found region:
cyclops-documents-database,smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/Platinium
Edition Using VB 5/Books/Platinium Edition Using VB 5/ch14/09.htm,1208893494773
Id: 1208893494773
Start Key:
smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/Platinium
Edition Using VB 5/Books/Platinium Edition Using VB 5/ch14/09.htm
End Key:
smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/SPECIAL
EDITION USING MICROSOFT BACKOFFICE, VOLUME 1/ch05/06.htm
Online/Offline Status: OFFLINE
Split?: TRUE
Found region:
cyclops-documents-database,smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/Platinium
Edition Using VB 5/Books/Platinium Edition Using VB 5/ch14/09.htm,1208894034845
Id: 1208894034845
Start Key:
smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/Platinium
Edition Using VB 5/Books/Platinium Edition Using VB 5/ch14/09.htm
End Key:
smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/Platinium
Edition Using VB 5/Books/Platinium Edition Using VB 5/ch40/01.htm
Online/Offline Status: ONLINE
Split?: FALSE
Found region:
cyclops-documents-database,smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/Platinium
Edition Using VB 5/Books/Platinium Edition Using VB 5/ch40/01.htm,1208896414707
Id: 1208896414707
Start Key:
smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/Platinium
Edition Using VB 5/Books/Platinium Edition Using VB 5/ch40/01.htm
End Key:
smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/Programming/Delphi/Delphi
Informant [1995-2003]/Works/95index.PDF
Online/Offline Status: ONLINE
Split?: FALSE
DEBUG 23-04 14:54:50,756 (DFSClient.java:readChunk:934) -DFSClient
readChunk got seqno 4 offsetInBlock 16384 lastPacketInBlock true
packetLen 3402
Found region:
cyclops-documents-database,smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/Programming/Delphi/Delphi
Informant [1995-2003]/Works/95index.PDF,1208896478277
Id: 1208896478277
Start Key:
smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/Programming/Delphi/Delphi
Informant [1995-2003]/Works/95index.PDF
End Key:
smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/Programming/Java/java-look-feel-design-guidelines-2nd/HIG.Text3.html
Online/Offline Status: ONLINE
Split?: FALSE
Found region:
cyclops-documents-database,smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/Programming/Java/java-look-feel-design-guidelines-2nd/HIG.Text3.html,1208896478277
Id: 1208896478277
Start Key:
smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/Programming/Java/java-look-feel-design-guidelines-2nd/HIG.Text3.html
End Key:
smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/SPECIAL
EDITION USING MICROSOFT BACKOFFICE, VOLUME 1/ch05/06.htm
Online/Offline Status: ONLINE
Split?: FALSE
Found region:
cyclops-documents-database,smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/SPECIAL
EDITION USING MICROSOFT BACKOFFICE, VOLUME 1/ch05/06.htm,1208891918491
Id: 1208891918491
Start Key:
smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/SPECIAL
EDITION USING MICROSOFT BACKOFFICE, VOLUME 1/ch05/06.htm
End Key:
smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/SPECIAL
EDITION, USING LOTUS NOTES/e-book/ch25.htm
Online/Offline Status: ONLINE
Split?: FALSE
Found region:
cyclops-documents-database,smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/SPECIAL
EDITION, USING LOTUS NOTES/e-book/ch25.htm,1208891541773
Id: 1208891541773
Start Key:
smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/SPECIAL
EDITION, USING LOTUS NOTES/e-book/ch25.htm
End Key:
smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/oreilly-cgionwww/oreilly-cgionwww/ch01_02.txt
Online/Offline Status: ONLINE
Split?: FALSE
Found region:
cyclops-documents-database,smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/oreilly-cgionwww/oreilly-cgionwww/ch01_02.txt,1208891541774
Id: 1208891541774
Start Key:
smb://cbrfileserv.critical.pt/CyclopsRepoLocation-/Critical/Biblioteca-CyclopsRepoLocation/EBooks/oreilly-cgionwww/oreilly-cgionwww/ch01_02.txt
End Key:
Online/Offline Status: ONLINE
Split?: FALSE
Found region: cyclops-links-database,,1208891170959
Id: 1208891170959
Start Key:
End Key:
Online/Offline Status: ONLINE
Split?: FALSE