hbase will split on row when the start and end row are the same, causing data loss
----------------------------------------------------------------------------------
Key: HADOOP-2493
URL: https://issues.apache.org/jira/browse/HADOOP-2493
Project: Hadoop
Issue Type: Bug
Components: contrib/hbase
Reporter: Billy Pearson
Priority: Critical
While testing HBase splits with my code, I was loading a table to serve as an
inverted index over some links.
I was using the anchor text as the row key and a column of the form
parent:child, namely url:(siteurl). The data is the count of the links pointing
to the siteurl with that anchor text as the row key.
A lot of sites have image links, and my testing code uses "image" as the anchor
text for them, so there are a lot of image links, all stored under the row key
"image".
I changed the HBase max file size to 16 MB for testing and have been able to
recreate the same error.
When the table gets big, it splits on the row "image", which becomes the end
key of one region and the start key of the next. Later it splits again, to the
point where one of the resulting regions has "image" as both its start key and
its end key. After that it keeps splitting the region whose start key is
"image" and whose end key is the same, so I end up with multiple regions that
all have "image" as both start key and end key. Unless the master keeps track
of the row key and the parent:child data for each split, I do not think all the
data will be returned when querying it.
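
To illustrate the concern, here is a simplified model in Java. It assumes
regions are located by finding the greatest start key that is less than or
equal to the requested row key. The RegionIndex class and the region names are
hypothetical and are not HBase's actual catalog code; the sketch only shows
that once several regions share the start key "image", the start key alone no
longer identifies a single region.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a region directory keyed by start key.
// An empty start key stands for "beginning of the table" (assumption).
final class RegionIndex {
    private final TreeMap<String, List<String>> byStartKey = new TreeMap<>();

    // Register a region under its start key; duplicates are kept, not overwritten.
    void addRegion(String startKey, String regionName) {
        byStartKey.computeIfAbsent(startKey, k -> new ArrayList<>()).add(regionName);
    }

    // Return every region whose start key is the closest one at or before rowKey.
    List<String> candidatesFor(String rowKey) {
        Map.Entry<String, List<String>> e = byStartKey.floorEntry(rowKey);
        return e == null ? Collections.<String>emptyList() : e.getValue();
    }

    public static void main(String[] args) {
        RegionIndex index = new RegionIndex();
        index.addRegion("", "region-1");
        index.addRegion("image", "region-2");
        index.addRegion("image", "region-3");
        index.addRegion("image", "region-4");
        index.addRegion("jpeg", "region-5");
        // More than one candidate means the lookup is ambiguous for this row.
        System.out.println(index.candidatesFor("image"));
    }
}
```

In this model a reader asking for the row "image" gets three candidate regions
and would have to consult all of them to see every column.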
I have attached a screenshot of my regions. I think there should be some logic
so that a region does not split if its start and end row keys are the same.
Otherwise we need to start keeping track of the start key plus the column data
on the master for each split, so we can know where each row is in the database.
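
As a rough sketch of the first suggestion, the check could look something like
the Java below. SplitGuard and canSplit are hypothetical names, not existing
HBase code. The sketch assumes a region covers the half-open range
[startKey, endKey), that an empty key means "unbounded", and that midKey is the
row the region would be split on.

```java
// Hypothetical guard; not part of HBase.
final class SplitGuard {
    private SplitGuard() {}

    // Returns true only if splitting at midKey would leave both daughter
    // regions, [startKey, midKey) and [midKey, endKey), with a non-empty range.
    static boolean canSplit(String startKey, String endKey, String midKey) {
        boolean startUnbounded = startKey.isEmpty(); // assumption: "" = table start
        boolean endUnbounded = endKey.isEmpty();     // assumption: "" = table end

        // The case from this report: start and end row key are the same.
        if (!startUnbounded && startKey.equals(endKey)) {
            return false;
        }
        // Splitting on an empty key is meaningless.
        if (midKey.isEmpty()) {
            return false;
        }
        // The split row must lie strictly after the start key...
        if (!startUnbounded && midKey.compareTo(startKey) <= 0) {
            return false;
        }
        // ...and strictly before the end key.
        if (!endUnbounded && midKey.compareTo(endKey) >= 0) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(canSplit("html", "jpeg", "image"));   // normal split
        System.out.println(canSplit("image", "image", "image")); // degenerate region
    }
}
```

With this check, a region whose start and end key are both "image" is never
split again, and neither is a region whose split row equals its start key,
which is the step that produces the degenerate region in the first place.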