Yeah, strange stuff. I don't see anything strange in the master's logs; it will take a while to go over the region server logs, but I will certainly do that. We are running version 0.20.3.
On Fri, Apr 16, 2010 at 10:42 AM, Amandeep Khurana <[email protected]> wrote:

> What version of HBase are you on? Did you see anything out of place in the
> master or regionserver logs? This shouldn't be happening...!
>
> Amandeep Khurana
> Computer Science Graduate Student
> University of California, Santa Cruz
>
> On Fri, Apr 16, 2010 at 10:27 AM, Charles Glommen <[email protected]> wrote:
>
> > For a slightly unrelated reason, I needed to write a quick app to test
> > some code running on our hadoop/hbase cluster. However, I seem to be
> > having issues with getting consistent reads.
> >
> > Here's the scenario:
> >
> > This application scans some directories in HDFS and reads lines of text
> > from each file. A user ID is extracted from each line, then HBase is
> > checked to verify that the ID exists. In *all* cases the ID should exist
> > in HBase. However, only about the first 100 or so (of about 1000) reads
> > return valid results; the rest return null from Result.getValue(). You
> > can see from the code that it takes a userID as a parameter. This is to
> > illustrate that the data is in fact in HBase: setting *any* of the
> > userIDs that produced null results as a parameter will result in a valid
> > HBase read. Here is an abbreviated output that illustrates this oddity:
> >
> > First execution of application:
> >
> > ...(many 'good' output lines, like the following 2)
> > bytes for user 139|754436243196115533|c: 1920
> > bytes for user 139|754436243113796511|c: 1059
> > bytes for user 141|754999187733044577|c: 0
> > 1/171 FILE MAY HAVE LINE MISSING FROM HBASE!:
> > hdfs://elh00/user/hadoop/events/siteID-141/2010-04-12T00-0700/fiqgvrl.events
> > bytes for user *141|754717712663942409|c*: 0
> > 2/172 FILE MAY HAVE LINE MISSING FROM HBASE!:
> > hdfs://elh00/user/hadoop/events/siteID-141/2010-04-12T00-0700/fwesvqn.events
> > bytes for user 141|755280633926232247|c: 0
> > 3/173 FILE MAY HAVE LINE MISSING FROM HBASE!:
> > hdfs://elh00/user/hadoop/events/siteID-141/2010-04-12T01-0700/wydfvn.events
> > bytes for user 141|754436237930862231|c: 0
> > 4/174 FILE MAY HAVE LINE MISSING FROM HBASE!:
> > hdfs://elh00/user/hadoop/events/siteID-141/2010-04-12T01-0700/zpjyod.events
> >
> > ...and this continues for the remaining files.
> >
> > Second execution with *any* of the seemingly missing userIDs yields the
> > following sample:
> >
> > Count bytes for commandline user 141|754717712663942409|c: 855
> > ...(many 'good' output lines, like the following 1)
> > bytes for user 141|qfbvndelauretis|a: 2907001
> > bytes for user 141|754436240987076893|c: 0
> > 1/208 FILE MAY HAVE LINE MISSING FROM HBASE!:
> > hdfs://elh00/user/hadoop/events/siteID-141/2010-04-12T14-0700/hehvln.events
> > bytes for user 141|754436241315533944|c: 0
> > bytes for user 141|754436241215573999|c: 0
> > 2/210 FILE MAY HAVE LINE MISSING FROM HBASE!:
> > hdfs://elh00/user/hadoop/events/siteID-141/2010-04-12T15-0700/fvkeert.events
> > ...
> >
> > Notice that the 'zeros' don't occur until file 208 this time. This is not
> > random either: rerunning the above two will produce the exact same
> > results, all day long. It's as if selecting the initial user allows its
> > region to be read more consistently for the remainder of the run. Three
> > last points: no exceptions are ever thrown, all region servers are up
> > throughout the execution, and no other reads or writes are occurring on
> > the cluster during the execution.
> >
> > Any thoughts or advice?
> > This is really causing me pain at the moment.
> >
> > Oh, and here's the quick and dirty class that produces this:
> >
> > package com.touchcommerce.data.jobs.misc.partitioning_debug;
> >
> > import java.io.BufferedReader;
> > import java.io.IOException;
> > import java.io.InputStreamReader;
> >
> > import org.apache.hadoop.fs.FileStatus;
> > import org.apache.hadoop.fs.FileSystem;
> > import org.apache.hadoop.fs.Path;
> > import org.apache.hadoop.hbase.HBaseConfiguration;
> > import org.apache.hadoop.hbase.client.Get;
> > import org.apache.hadoop.hbase.client.HTable;
> > import org.apache.hadoop.hbase.client.Result;
> > import org.apache.hadoop.hbase.util.Bytes;
> >
> > import com.touchcommerce.data.Constants;
> > import com.touchcommerce.data.services.resources.HDFSService;
> > import com.touchcommerce.data.services.utils.EventUtils;
> >
> > public class TestIt {
> >
> >   private final static HBaseConfiguration config =
> >       new HBaseConfiguration(HDFSService.newConfigInstance());
> >   private static String userToCheckFirst;
> >   private static HTable userEventsTable;
> >
> >   public static void main(String[] args) throws IOException {
> >     FileSystem hdfs = FileSystem.get(config);
> >     userEventsTable = new HTable(config, Constants.HBASE_USER_EVENTS_TABLE);
> >     int maxLinesPerFileToRead = Integer.parseInt(args[0]);
> >     FileStatus[] containedSiteEntries =
> >         hdfs.listStatus(new Path(Constants.HDFS_EVENTS_ROOT_DIR));
> >     int good = 0;
> >     int bad = 0;
> >     /**
> >      * Passing in a key here that returned no data during the loop below
> >      * will almost certainly result in event data, meaning that hbase
> >      * *does* have data for this key after all. So what's wrong with the
> >      * loop below??????
> >      */
> >     userToCheckFirst = args.length > 1 ? args[1] : null;
> >     if (userToCheckFirst != null) {
> >       byte[] data = fetchData(Bytes.toBytes(userToCheckFirst));
> >       System.out.println("Count bytes for commandline user "
> >           + userToCheckFirst + ": " + (data == null ? 0 : data.length));
> >     }
> >     for (FileStatus siteStatus : containedSiteEntries) {
> >       if (siteStatus.isDir()) {
> >         FileStatus[] containedHourEntries = hdfs.listStatus(siteStatus.getPath());
> >         for (FileStatus hourStatus : containedHourEntries) {
> >           String hourStatusPath = hourStatus.getPath().toString();
> >           if (hourStatus.isDir()
> >               && hourStatusPath.indexOf(Constants.HDFS_INVALID_EVENTS_DIR) < 0
> >               && (hourStatusPath.indexOf("2010-04-12") > 0)) {
> >             FileStatus[] containedHourFiles = hdfs.listStatus(hourStatus.getPath());
> >             for (FileStatus hourFile : containedHourFiles) {
> >               if (hourFile.getLen() > 0) {
> >                 Path hourFilePath = hourFile.getPath();
> >                 boolean containedUser = false;
> >                 BufferedReader in = new BufferedReader(
> >                     new InputStreamReader(hdfs.open(hourFilePath)));
> >                 boolean fileIsGood = false;
> >                 String line = in.readLine();
> >                 boolean processMoreLines = line != null;
> >                 int linesRead = line == null ? 0 : 1;
> >                 while (processMoreLines) {
> >                   byte[] data = null;
> >                   String siteID = EventUtils.extractField(line, "siteID");
> >                   String userID = EventUtils.extractCustomerID(line);
> >                   String type = "c";
> >                   if (userID == null) {
> >                     userID = EventUtils.extractAgentID(line);
> >                     type = "a";
> >                   }
> >                   if (userID != null) {
> >                     containedUser = true;
> >                     int attempts = 0;
> >                     while (data == null || data.length == 0) {
> >                       data = fetchData(Bytes.toBytes(siteID + "|" + userID + "|" + type));
> >                       if (data == null || data.length == 0) {
> >                         // THIS SHOULD NOT HAPPEN, BUT ONCE IT DOES, THE
> >                         // REST SEEM TO FOLLOW WITH THE SAME
> >                       } else {
> >                         System.out.println("bytes for user " + siteID + "|"
> >                             + userID + "|" + type + ": "
> >                             + (data == null ? 0 : data.length));
> >                         break;
> >                       }
> >                       if (++attempts == 3) {
> >                         System.out.println("bytes for user " + siteID + "|"
> >                             + userID + "|" + type + ": "
> >                             + (data == null ? 0 : data.length));
> >                         break;
> >                       }
> >                     }
> >                   }
> >                   if (data != null && data.length > 0) {
> >                     fileIsGood = true;
> >                     processMoreLines = false;
> >                   }
> >                   if (linesRead >= maxLinesPerFileToRead)
> >                     processMoreLines = false;
> >                   else {
> >                     line = in.readLine();
> >                     processMoreLines = line != null;
> >                     linesRead++;
> >                   }
> >                 }
> >                 in.close();
> >                 if (fileIsGood || !containedUser)
> >                   good++;
> >                 else {
> >                   bad++;
> >                   System.out.println(bad + "/" + (good + bad)
> >                       + " FILE MAY HAVE LINE MISSING FROM HBASE!: " + hourFilePath);
> >                 }
> >               }
> >             }
> >           }
> >         }
> >       }
> >     }
> >   }
> >
> >   private static byte[] fetchData(byte[] userBytes) throws IOException {
> >     Get userEventsQuery = new Get(userBytes);
> >     Result row = userEventsTable.get(userEventsQuery);
> >     if (row == null)
> >       return null;
> >     return row.getValue(Constants.HBASE_USER_EVENTS_TABLE_RAW_FAMILY,
> >         Constants.HBASE_USER_EVENTS_TABLE_EVENTS_COLUMN);
> >   }
> > }
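For anyone who wants to poke at this without our internal classes, here's a stripped-down sketch of just the lookup path (a hypothetical standalone class; the table, family, and qualifier names come from the command line instead of our Constants, so adjust to your schema). One thing that might be worth double-checking on 0.20.x: HTable.get() should hand back an empty Result rather than null for a missing row, so Result.isEmpty() is probably a more telling check than the null guard in fetchData() above.

import java.io.IOException;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class GetRepro {

  // args: <table> <rowKey> <family> <qualifier>
  // e.g.: userEvents "141|754717712663942409|c" <family> <qualifier>
  public static void main(String[] args) throws IOException {
    HBaseConfiguration config = new HBaseConfiguration();
    HTable table = new HTable(config, args[0]);
    // Single Get against the supplied row key.
    Result row = table.get(new Get(Bytes.toBytes(args[1])));
    // An absent row should come back as an empty Result, not null.
    byte[] data = row.isEmpty()
        ? null
        : row.getValue(Bytes.toBytes(args[2]), Bytes.toBytes(args[3]));
    System.out.println("bytes for row " + args[1] + ": "
        + (data == null ? 0 : data.length));
  }
}

Running that with one of the 'missing' keys against the same table should show whether a fresh, standalone Get behaves any differently from the ones inside the loop.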
