I've been developing a HiveStorageHandler class (and associated classes) to integrate a non-file-based table storage engine into Hive. I am currently working with version 1.3 of the Hortonworks distro, but the issue that I've run into appears to be present in the Apache.Org code base as well.

The specific issue is that when the MapReduce program is run, it dies with the following exception:

java.lang.IllegalArgumentException: Can not create a Path from an empty string
    at org.apache.hadoop.fs.Path.checkPathArg(Path.java:82)
    at org.apache.hadoop.fs.Path.<init>(Path.java:90)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat$HiveInputSplit.getPath(HiveInputFormat.java:106)
    at org.apache.hadoop.mapred.MapTask.updateJobWithSplit(MapTask.java:450)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:408)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:365)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

Looking at the code for HiveInputFormat.HiveInputSplit.getPath(), I find the following:

    public Path getPath() {
      if (inputSplit instanceof FileSplit) {
        return ((FileSplit) inputSplit).getPath();
      }
      return new Path("");
    }

It would appear that if my InputFormat.getSplits() method returns InputSplit objects that do not derive from FileSplit (which is the case for my InputFormat class, as my storage engine is not file-based), the 'getPath()' method will try to return 'new Path("")'. The problem is that the code for the Path class specifically disallows constructing an instance of Path with an empty string.
Here is the code for Path.checkPathArg():

    private void checkPathArg( String path ) {
      // disallow construction of a Path from an empty string
      if ( path == null ) {
        throw new IllegalArgumentException(
            "Can not create a Path from a null string");
      }
      if ( path.length() == 0 ) {
        throw new IllegalArgumentException(
            "Can not create a Path from an empty string");
      }
    }

So if HiveInputFormat.HiveInputSplit.getPath() is ever called when 'inputSplit' is not an instance of 'FileSplit', it tries to construct a Path object from an empty string, which always fails with an exception.

So my question is:

If this is a bug in Hive, can we get it fixed?

If it is not a bug in Hive but rather a misunderstanding on my part, could someone give me some pointers on how to use InputSplit objects that do not derive from FileSplit in such a way as to avoid tripping this issue?

Thank you for your time.

Eric Karlson
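
P.S. In case it helps anyone reproduce this without a cluster, here is a minimal, self-contained sketch of the failure path described above. The Stub* classes are simplified stand-ins that only copy the logic quoted in this message; they are not the real Hadoop or Hive classes. EngineSplit and EngineFileSplit are hypothetical names for illustration. The last part shows one possible workaround, assuming it is acceptable for a non-file split to extend FileSplit and carry a placeholder path; I have not confirmed that this is the intended approach.

```java
// Stand-in for org.apache.hadoop.fs.Path: only the argument check is copied.
class StubPath {
    private final String path;

    StubPath(String path) {
        if (path == null) {
            throw new IllegalArgumentException("Can not create a Path from a null string");
        }
        if (path.length() == 0) {
            throw new IllegalArgumentException("Can not create a Path from an empty string");
        }
        this.path = path;
    }

    @Override
    public String toString() {
        return path;
    }
}

// Stand-ins for the InputSplit interface and the FileSplit class.
interface StubInputSplit {}

class StubFileSplit implements StubInputSplit {
    private final StubPath file;

    StubFileSplit(StubPath file) {
        this.file = file;
    }

    StubPath getPath() {
        return file;
    }
}

// Stand-in for HiveInputSplit: getPath() copies the logic quoted above.
class StubHiveInputSplit {
    private final StubInputSplit inputSplit;

    StubHiveInputSplit(StubInputSplit inputSplit) {
        this.inputSplit = inputSplit;
    }

    StubPath getPath() {
        if (inputSplit instanceof StubFileSplit) {
            return ((StubFileSplit) inputSplit).getPath();
        }
        return new StubPath(""); // always throws for non-file splits
    }
}

// Hypothetical split for a non-file-based engine (the failing case).
class EngineSplit implements StubInputSplit {
    final String range;

    EngineSplit(String range) {
        this.range = range;
    }
}

// Hypothetical workaround: derive from the file split and pass a placeholder path.
class EngineFileSplit extends StubFileSplit {
    final String range;

    EngineFileSplit(String range, StubPath placeholder) {
        super(placeholder);
        this.range = range;
    }
}

class SplitPathSketch {
    public static void main(String[] args) {
        try {
            new StubHiveInputSplit(new EngineSplit("rows 0-99")).getPath();
        } catch (IllegalArgumentException e) {
            System.out.println("non-file split: " + e.getMessage());
        }
        StubInputSplit split =
            new EngineFileSplit("rows 0-99", new StubPath("/placeholder/engine_table"));
        System.out.println("file-derived split: " + new StubHiveInputSplit(split).getPath());
    }
}
```

With the placeholder approach the split still carries its own engine-specific data (here just a range string), and the path exists only to satisfy the instanceof check; whether anything downstream tries to interpret that path is something I have not checked.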