[Hadoop Wiki] Update of udanax by udanax

2008-01-19 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Hadoop Wiki for change 
notification.

The following page has been changed by udanax:
http://wiki.apache.org/hadoop/udanax

The comment on the change is:
Mail has changed.

--
  === Profile ===
   * Who : Edward J. Yoon AT NHN, corp. 
   * Master of mathematics.
-  * E-mail : [mailto:[EMAIL PROTECTED] webmaster AT SPAMFREE udanax DOT org]
+  * E-mail : [mailto:[EMAIL PROTECTED] edward AT SPAMFREE mail DOT udanax DOT 
org]
   * [http://www.udanax.org/]
  === Contributions ===
   * [:Hbase/HbaseShell: Hbase Shell & HQL]
-  * [:HRDF: Hadoop based RDF Store architecture]
+  * [:HRDF: Hadoop + Hbase based RDF Store architecture]
   * [:NewsPersonalizationSystem: Hadoop + Hbase based News Personalization 
System (Google clone)]
  


[Hadoop Wiki] Trivial Update of Hbase by stack

2008-01-19 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Hadoop Wiki for change 
notification.

The following page has been changed by stack:
http://wiki.apache.org/hadoop/Hbase

--
  = Bigtable-like structured storage for Hadoop HDFS =
  
  [[Anchor(links)]]
- == Project Links ==
   * [#news News]
   * [#background Background]
   * [wiki:Hbase/HbaseArchitecture  Hbase Architecture]


[Hadoop Wiki] Trivial Update of Hbase/FAQ by stack

2008-01-19 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Hadoop Wiki for change 
notification.

The following page has been changed by stack:
http://wiki.apache.org/hadoop/Hbase/FAQ

The comment on the change is:
Add how to access hbase from non-java languages

--
  
   1. [#1 Can someone give an example of basic API-usage going against hbase?]
   1. [#2 What other hbase-like applications are there out there?]
-  1. [#3 Can I fix O!utOfMemoryExceptions in hbase?]
+  1. [#3 Can I fix OutOfMemoryExceptions in hbase?]
   1. [#4 How do I enable hbase DEBUG-level logging?]
   1. [#5 Why do I see java.io.IOException...(Too many open files) in my 
logs?]
   1. [#6 What can I do to improve hbase performance?]
+  1. [#7 How do I access Hbase from my Ruby/Python/Perl/PHP/etc. application?]
  
  == Answers ==
  
@@ -55, +56 @@

   * [wiki:Hbase/PNUTS PNUTS], a Platform for Nimble Universal Table Storage, 
being developed internally at Yahoo!
   * [http://www.amazon.com/gp/browse.html?node=342335011 Amazon SimpleDB] is a 
web service for running queries on structured data in real time.
  
- '''3. [[Anchor(3)]] Can I fix O!utOfMemoryExceptions in hbase?'''
+ '''3. [[Anchor(3)]] Can I fix OutOfMemoryExceptions in hbase?'''
  
  Out-of-the-box, hbase uses the default JVM heap size.  Set the 
''HBASE_HEAPSIZE'' environment variable in ''${HBASE_HOME}/conf/hbase-env.sh'' 
if your install needs to run with a larger heap.  ''HBASE_HEAPSIZE'' is like 
''HADOOP_HEAPSIZE'' in that its value is the desired heap size in MB.  The 
surrounding '-Xmx' prefix and 'm' suffix needed to form the java maximum-heap 
option are added by the hbase start script (see how ''HBASE_HEAPSIZE'' is used 
in the ''${HBASE_HOME}/bin/hbase'' script for clarification).
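
  For example (an illustrative line only; the 1000 MB value is arbitrary), the 
setting in ''${HBASE_HOME}/conf/hbase-env.sh'' would look like:

    # example only: give the hbase JVM a 1000 MB maximum heap (becomes -Xmx1000m)
    export HBASE_HEAPSIZE=1000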
  
@@ -65, +66 @@

  
  '''5. [[Anchor(5)]] Why do I see java.io.IOException...(Too many open 
files) in my logs?'''
  
- Running an Hbase loaded w/ more than a few regions, it's possible to blow 
past the environment file handle limit for the user running the process.  
Running out of file handles is like an OOME; things start to fail in strange 
ways.  To raise the user's file handle limit, edit 
'''/etc/security/limits.conf''' on all nodes and restart your cluster.
+ Currently Hbase is a file handle glutton.  Running an Hbase loaded w/ more 
than a few regions, it's possible to blow past the common 1024 default file 
handle limit for the user running the process.  Running out of file handles is 
like an OOME; things start to fail in strange ways.  To raise the user's file 
handle limit, edit '''/etc/security/limits.conf''' on all nodes and restart 
your cluster.
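
  As an illustrative sketch (the 'hadoop' user name and the 32768 value are 
assumptions, not from the FAQ), the '''/etc/security/limits.conf''' entries 
might look like:

    # raise the open-file limit for the user running the hbase processes
    hadoop  soft  nofile  32768
    hadoop  hard  nofile  32768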
  
  '''6. [[Anchor(6)]] What can I do to improve hbase performance?'''
  
- To improve random-read performance, if you can, try making the hdfs block 
size smaller (as is suggested in the bigtable paper).  By default it's 64MB.  
Try setting it to 8MB.  On every random read, hbase has to fetch from hdfs the 
blocks that contain the wanted row.  If your rows are small, much smaller than 
the hdfs block size, then we'll be fetching a lot of data only to discard the 
bulk.  Meanwhile the big block fetches and processing consume CPU, network, 
etc. in the datanodes and hbase client.
- 
- Another configuration that can help with random reads at some cost in memory 
is making the '''hbase.io.index.interval''' smaller.  By default when hbase 
writes store files, it adds an entry to the mapfile index on every 32nd 
addition (For hadoop, default is every 128th addition).  Adding entries more 
frequently -- every 16th or every 8th -- will make it so there is less seeking 
around looking for the wanted entry but at the cost of a hbase carrying a 
larger index (Indices are read into memory on mapfile open; by default there 
are one to five or so mapfiles per column family per region loaded into a 
regionserver).
+ A configuration that can help with random reads, at some cost in memory, is 
making the '''hbase.io.index.interval''' smaller.  By default when hbase writes 
store files, it adds an entry to the mapfile index on every 32nd addition (for 
hadoop, the default is every 128th addition).  Adding entries more frequently 
-- every 16th or every 8th -- means less seeking around looking for the wanted 
entry, but at the cost of hbase carrying a larger index (indices are read into 
memory on mapfile open; by default there are one to five or so mapfiles per 
column family per region loaded into a regionserver).
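
  As a sketch of how the property might be set (assuming it goes in your site 
configuration file; the value 16 is just an example of a smaller interval):

    <property>
      <name>hbase.io.index.interval</name>
      <value>16</value>
    </property>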
  
  Some basic tests making '''io.bytes.per.checksum''' larger -- checksumming 
every 4096 bytes instead of every 512 bytes -- seem to have no discernible 
effect on performance.
  
+ 
+ '''7. [[Anchor(7)]] How do I access Hbase from my Ruby/Python/Perl/PHP/etc. 
application?'''
+ 
+  * [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/javadoc/ 
Description of how to launch a thrift service, client bindings and 

svn commit: r613446 - in /lucene/hadoop/trunk/src/contrib/hbase: ./ src/java/org/apache/hadoop/hbase/ src/java/org/apache/hadoop/hbase/util/ src/test/org/apache/hadoop/hbase/

2008-01-19 Thread jimk
Author: jimk
Date: Sat Jan 19 12:20:15 2008
New Revision: 613446

URL: http://svn.apache.org/viewvc?rev=613446&view=rev
Log:
HADOOP-2643 Make migration tool smarter.

Modified:
lucene/hadoop/trunk/src/contrib/hbase/CHANGES.txt

lucene/hadoop/trunk/src/contrib/hbase/src/java/org/apache/hadoop/hbase/HConstants.java

lucene/hadoop/trunk/src/contrib/hbase/src/java/org/apache/hadoop/hbase/HMaster.java

lucene/hadoop/trunk/src/contrib/hbase/src/java/org/apache/hadoop/hbase/util/FSUtils.java

lucene/hadoop/trunk/src/contrib/hbase/src/java/org/apache/hadoop/hbase/util/Migrate.java

lucene/hadoop/trunk/src/contrib/hbase/src/test/org/apache/hadoop/hbase/MiniHBaseCluster.java

Modified: lucene/hadoop/trunk/src/contrib/hbase/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/contrib/hbase/CHANGES.txt?rev=613446&r1=613445&r2=613446&view=diff
==
--- lucene/hadoop/trunk/src/contrib/hbase/CHANGES.txt (original)
+++ lucene/hadoop/trunk/src/contrib/hbase/CHANGES.txt Sat Jan 19 12:20:15 2008
@@ -201,7 +201,8 @@
HMaster shutdown
HADOOP-2616 hbase not spliting when the total size of region reaches max
region size * 1.5
-
+   HADOOP-2643 Make migration tool smarter.
+   
 Release 0.15.1
 Branch 0.15
 

Modified: 
lucene/hadoop/trunk/src/contrib/hbase/src/java/org/apache/hadoop/hbase/HConstants.java
URL: 
http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/contrib/hbase/src/java/org/apache/hadoop/hbase/HConstants.java?rev=613446&r1=613445&r2=613446&view=diff
==
--- 
lucene/hadoop/trunk/src/contrib/hbase/src/java/org/apache/hadoop/hbase/HConstants.java
 (original)
+++ 
lucene/hadoop/trunk/src/contrib/hbase/src/java/org/apache/hadoop/hbase/HConstants.java
 Sat Jan 19 12:20:15 2008
@@ -26,6 +26,14 @@
  */
 public interface HConstants {
   
+  // For migration
+
+  /** name of version file */
+  static final String VERSION_FILE_NAME = "hbase.version";
+  
+  /** version of file system */
+  static final String FILE_SYSTEM_VERSION = "0.1";
+  
   // Configuration parameters
   
   // TODO: URL for hbase master like hdfs URLs with host and port.

Modified: 
lucene/hadoop/trunk/src/contrib/hbase/src/java/org/apache/hadoop/hbase/HMaster.java
URL: 
http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/contrib/hbase/src/java/org/apache/hadoop/hbase/HMaster.java?rev=613446&r1=613445&r2=613446&view=diff
==
--- 
lucene/hadoop/trunk/src/contrib/hbase/src/java/org/apache/hadoop/hbase/HMaster.java
 (original)
+++ 
lucene/hadoop/trunk/src/contrib/hbase/src/java/org/apache/hadoop/hbase/HMaster.java
 Sat Jan 19 12:20:15 2008
@@ -889,6 +889,10 @@
   // Make sure the root directory exists!
   if(! fs.exists(rootdir)) {
 fs.mkdirs(rootdir);
+FSUtils.setVersion(fs, rootdir);
+  } else if (!FSUtils.checkVersion(fs, rootdir)) {
+throw new IOException(
+"file system not correct version. Run hbase.util.Migrate");
   }
 
   if (!fs.exists(rootRegionDir)) {

Modified: 
lucene/hadoop/trunk/src/contrib/hbase/src/java/org/apache/hadoop/hbase/util/FSUtils.java
URL: 
http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/contrib/hbase/src/java/org/apache/hadoop/hbase/util/FSUtils.java?rev=613446&r1=613445&r2=613446&view=diff
==
--- 
lucene/hadoop/trunk/src/contrib/hbase/src/java/org/apache/hadoop/hbase/util/FSUtils.java
 (original)
+++ 
lucene/hadoop/trunk/src/contrib/hbase/src/java/org/apache/hadoop/hbase/util/FSUtils.java
 Sat Jan 19 12:20:15 2008
@@ -19,12 +19,16 @@
  */
 package org.apache.hadoop.hbase.util;
 
+import java.io.DataInputStream;
 import java.io.IOException;
 
 import org.apache.commons.logging.Log;
 import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FSDataOutputStream;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hbase.HConstants;
 import org.apache.hadoop.dfs.DistributedFileSystem;
 
 /**
@@ -71,4 +75,40 @@
 }
 return available;
   }
+  
+  /**
+   * Verifies current version of file system
+   * 
+   * @param fs
+   * @param rootdir
+   * @return true if the current file system is the correct version
+   * @throws IOException
+   */
+  public static boolean checkVersion(FileSystem fs, Path rootdir) throws 
IOException {
+Path versionFile = new Path(rootdir, HConstants.VERSION_FILE_NAME);
+boolean versionOk = false;
+if (fs.exists(versionFile)) {
+  FSDataInputStream s =
+fs.open(new Path(rootdir, HConstants.VERSION_FILE_NAME));
+  String version = DataInputStream.readUTF(s);
+  s.close();
+  versionOk = version.compareTo(HConstants.FILE_SYSTEM_VERSION) == 0;
+   

svn commit: r613499 - in /lucene/hadoop/trunk: ./ src/java/org/apache/hadoop/conf/ src/java/org/apache/hadoop/mapred/ src/test/org/apache/hadoop/conf/

2008-01-19 Thread cdouglas
Author: cdouglas
Date: Sat Jan 19 18:39:10 2008
New Revision: 613499

URL: http://svn.apache.org/viewvc?rev=613499&view=rev
Log:
HADOOP-2367. Add ability to profile a subset of map/reduce tasks and fetch the
result to the local filesystem of the submitting application. Also includes a
general IntegerRanges extension to Configuration for setting positive, ranged
parameters. Contributed by Owen O'Malley.
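
As a usage sketch (not part of this commit; it relies only on the IntegerRanges
constructor and isIncluded(int) visible in the Configuration.java diff below),
the range format described in the log message could be exercised like this:

    import org.apache.hadoop.conf.Configuration;

    public class IntegerRangesDemo {
      public static void main(String[] args) {
        // "2-3,5,7-" means 2, 3, 5, and everything from 7 upwards
        Configuration.IntegerRanges ranges =
            new Configuration.IntegerRanges("2-3,5,7-");
        System.out.println(ranges.isIncluded(3));  // true
        System.out.println(ranges.isIncluded(6));  // false
        System.out.println(ranges.isIncluded(42)); // true
      }
    }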


Modified:
lucene/hadoop/trunk/CHANGES.txt
lucene/hadoop/trunk/src/java/org/apache/hadoop/conf/Configuration.java
lucene/hadoop/trunk/src/java/org/apache/hadoop/mapred/JobClient.java
lucene/hadoop/trunk/src/java/org/apache/hadoop/mapred/JobConf.java
lucene/hadoop/trunk/src/java/org/apache/hadoop/mapred/TaskLog.java
lucene/hadoop/trunk/src/java/org/apache/hadoop/mapred/TaskLogServlet.java
lucene/hadoop/trunk/src/java/org/apache/hadoop/mapred/TaskRunner.java
lucene/hadoop/trunk/src/test/org/apache/hadoop/conf/TestConfiguration.java

Modified: lucene/hadoop/trunk/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/lucene/hadoop/trunk/CHANGES.txt?rev=613499&r1=613498&r2=613499&view=diff
==
--- lucene/hadoop/trunk/CHANGES.txt (original)
+++ lucene/hadoop/trunk/CHANGES.txt Sat Jan 19 18:39:10 2008
@@ -101,6 +101,11 @@
 sequence files as BytesWritable/BytesWritable regardless of the
 key and value types used to write the file. (cdouglas via omalley)
 
+HADOOP-2367. Add ability to profile a subset of map/reduce tasks and fetch
+the result to the local filesystem of the submitting application. Also
+includes a general IntegerRanges extension to Configuration for setting
+positive, ranged parameters. (Owen O'Malley via cdouglas)
+
   IMPROVEMENTS
 
 HADOOP-2045.  Change committer list on website to a table, so that

Modified: lucene/hadoop/trunk/src/java/org/apache/hadoop/conf/Configuration.java
URL: 
http://svn.apache.org/viewvc/lucene/hadoop/trunk/src/java/org/apache/hadoop/conf/Configuration.java?rev=613499&r1=613498&r2=613499&view=diff
==
--- lucene/hadoop/trunk/src/java/org/apache/hadoop/conf/Configuration.java 
(original)
+++ lucene/hadoop/trunk/src/java/org/apache/hadoop/conf/Configuration.java Sat 
Jan 19 18:39:10 2008
@@ -32,10 +32,12 @@
 import java.util.HashMap;
 import java.util.HashSet;
 import java.util.Iterator;
+import java.util.List;
 import java.util.ListIterator;
 import java.util.Map;
 import java.util.Properties;
 import java.util.Set;
+import java.util.StringTokenizer;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 
@@ -461,6 +463,103 @@
*/
   public void setBoolean(String name, boolean value) {
 set(name, Boolean.toString(value));
+  }
+
+  /**
+   * A class that represents a set of positive integer ranges. It parses 
+   * strings of the form: "2-3,5,7-" where ranges are separated by comma and 
+   * the lower/upper bounds are separated by dash. Either the lower or upper 
+   * bound may be omitted meaning all values up to or over. So the string 
+   * above means 2, 3, 5, and 7, 8, 9, ...
+   */
+  public static class IntegerRanges {
+private static class Range {
+  int start;
+  int end;
+}
+
+List<Range> ranges = new ArrayList<Range>();
+
+public IntegerRanges() {
+}
+
+public IntegerRanges(String newValue) {
+  StringTokenizer itr = new StringTokenizer(newValue, ",");
+  while (itr.hasMoreTokens()) {
+String rng = itr.nextToken().trim();
+String[] parts = rng.split("-", 3);
+if (parts.length < 1 || parts.length > 2) {
+  throw new IllegalArgumentException("integer range badly formed: " + 
+ rng);
+}
+Range r = new Range();
+r.start = convertToInt(parts[0], 0);
+if (parts.length == 2) {
+  r.end = convertToInt(parts[1], Integer.MAX_VALUE);
+} else {
+  r.end = r.start;
+}
+if (r.start > r.end) {
+  throw new IllegalArgumentException("IntegerRange from " + r.start + 
+  " to " + r.end + " is invalid");
+}
+ranges.add(r);
+  }
+}
+
+/**
+ * Convert a string to an int treating empty strings as the default value.
+ * @param value the string value
+ * @param defaultValue the value for if the string is empty
+ * @return the desired integer
+ */
+private static int convertToInt(String value, int defaultValue) {
+  String trim = value.trim();
+  if (trim.length() == 0) {
+return defaultValue;
+  }
+  return Integer.parseInt(trim);
+}
+
+/**
+ * Is the given value in the set of ranges
+ * @param value the value to check
+ * @return is the value in the ranges?
+ */
+public boolean isIncluded(int value) {
+  for(Range r: ranges) {
+if