Coincidentally, you can find a HiveNullValueSequenceFileOutputFormat in my 
HIVE-1295.1.patch:

https://issues.apache.org/jira/browse/HIVE-1295

(I needed this because that's what TotalOrderPartitioner wanted...)

JVS

On Apr 13, 2010, at 5:15 PM, Edward Capriolo wrote:

> I was looking at the code, and it looks like Hive uses an
> ignore-key output format, so rather than trying to swap values in the
> input format, just write an ignore-value output format.
> 
> On Tuesday, April 13, 2010, Edward Capriolo wrote:
>> 
>> 
>> On Fri, Apr 2, 2010 at 9:34 PM, Zheng Shao wrote:
>> 
>> The easiest way is to write a SequenceFileInputFormat that returns a
>> RecordReader that has key in the value and value in the key.
>> 
>> Zheng
>> 
>> On Fri, Apr 2, 2010 at 2:16 PM, Edward Capriolo wrote:
>>> I have some sequence files in which all our data is in the key.
>>> 
>>> http://osdir.com/ml/hive-user-hadoop-apache/2009-10/msg00027.html
>>> 
>>> Has anyone tackled the above issue?
>>> 
>>> 
>> 
>> 
>> 
>> --
>> Yours,
>> Zheng
>> 
>> 
>> I am attempting to do this for sequence files. Unfortunately, I have to copy
>> much of the SequenceFile record reader, since its reader field (in) is private.
>> ----------------------------------------
>> public class SequenceKeyOnlyInputFormat<K extends WritableComparable, V extends Writable>
>>         extends SequenceFileInputFormat<K, V> {
>> 
>>     public RecordReader<K, V> getRecordReader(InputSplit split, JobConf job,
>>             Reporter reporter) throws IOException {
>>         reporter.setStatus(split.toString());
>>         return new SequenceKeyOnlyRecordReader<K, V>(job, (FileSplit) split);
>>     }
>> 
>> }
>> --------------------------------------------
>> @SuppressWarnings({ "unchecked", "deprecation" })
>> public class SequenceKeyOnlyRecordReader<K extends WritableComparable, V extends Writable>
>>         implements RecordReader<K, V> {
>> 
>>     private SequenceFile.Reader in;
>>     private long start;
>>     private long end;
>>     private boolean more = true;
>>     protected Configuration conf;
>> 
>> 
>>     public SequenceKeyOnlyRecordReader(Configuration conf, FileSplit split)
>>             throws IOException {
>>         Path path = split.getPath();
>>         FileSystem fs = path.getFileSystem(conf);
>>         this.in = new SequenceFile.Reader(fs, path, conf);
>>         this.end = split.getStart() + split.getLength();
>>         this.conf = conf;
>> 
>>         if (split.getStart() > in.getPosition()) in.sync(split.getStart()); // sync to start
>> 
>>         this.start = in.getPosition();
>>         more = start < end;
>>     }
>> 
>>     /**
>>      * The class of key that must be passed to {@link #next(Object, Object)}.
>>      */
>>     public Class getKeyClass() {
>>         return in.getKeyClass();
>>     }
>> 
>>     /**
>>      * The class of value that must be passed to {@link #next(Object, Object)}.
>>      */
>>     public Class getValueClass() {
>>         return in.getKeyClass();
>>     }
>> 
>>     public K createKey() {
>>         return (K) ReflectionUtils.newInstance(getKeyClass(), conf);
>>     }
>> 
>>     public V createValue() {
>>         return (V) ReflectionUtils.newInstance(getKeyClass(), conf);
>>     }
>> 
>>     public synchronized boolean next(K key, V value) throws IOException {
>>         if (!more) return false;
>>         long pos = in.getPosition();
>> 
>>         boolean remaining = in.next(key);
>>         if (remaining) {
>>             getCurrentValue(value);
>>         }
>>         if (pos >= end && in.syncSeen()) {
>>             more = false;
>>         } else {
>>             more = remaining;
>>         }
>>         return more;
>>     }
>> 
>>     protected synchronized boolean next(K key) throws IOException {
>>         if (!more) return false;
>>         long pos = in.getPosition();
>>         boolean remaining = in.next(key);
>>         if (pos >= end && in.syncSeen()) {
>>             more = false;
>>         } else {
>>             more = remaining;
>>         }
>>         return more;
>>     }
>> 
>>     protected synchronized void getCurrentValue(V value) throws IOException {
>>          in.getCurrentValue(value);
>>         //in.next(value);
>>     }
>> 
>>     /**
>>      * Return the progress within the input split
>>      *
>>      * @return 0.0 to 1.0 of the input byte range
>>      */
>>     public float getProgress() throws IOException {
>>         if (end == start) {
>>             return 0.0f;
>>         } else {
>>             return Math.min(1.0f, (in.getPosition() - start) / (float) (end - start));
>>         }
>>     }
>> 
>>     public synchronized long getPos() throws IOException {
>>         return in.getPosition();
>>     }
>> 
>>     protected synchronized void seek(long pos) throws IOException {
>>         in.seek(pos);
>>     }
>> 
>>     public synchronized void close() throws IOException {
>>         in.close();
>>     }
>> 
>> }
>> 
>> seems like:
>> 
>>     protected synchronized void getCurrentValue(V value) throws IOException {
>>          in.getCurrentValue(value);
>>     }
>> 
>> ^ Returns nulls
>> 
>>     protected synchronized void getCurrentValue(V value) throws IOException {
>>        in.next(value);
>>     }
>> 
>> ^ returns every other row.
>> 
>> Do you have any idea what I am doing wrong? I will hopefully contribute it if I
>> can get this working correctly.
>> 
>> Thanks,
>> Edward
>> 
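
A note on the two symptoms quoted above ("Returns nulls" and "returns every other row"). SequenceFile.Reader.next(key) advances to the next record and reads its key, while getCurrentValue(value) reads the value of that same record. The sketch below is a toy model of that behavior, not Hadoop code: ToyReader, Holder, and the sample keys are hypothetical stand-ins, and it assumes files whose values are empty because all data is in the key.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; ToyReader is a hypothetical stand-in, not SequenceFile.Reader. */
class KeyOnlyReaderSketch {

    /** Mutable holder standing in for a Writable. */
    static final class Holder {
        String data;
    }

    /** In-memory reader: every record has a key and an empty value. */
    static final class ToyReader {
        private final String[] keys;
        private int cursor = -1; // index of the record most recently read

        ToyReader(String... keys) {
            this.keys = keys;
        }

        /** Advances to the next record and copies its key into the holder. */
        boolean next(Holder key) {
            if (cursor + 1 >= keys.length) {
                return false;
            }
            cursor++;
            key.data = keys[cursor];
            return true;
        }

        /** Copies the current record's value, which is always empty here. */
        void getCurrentValue(Holder value) {
            value.data = null;
        }
    }

    /** next(key) followed by getCurrentValue(value): the value is always empty. */
    static List<String> viaGetCurrentValue(ToyReader in) {
        List<String> out = new ArrayList<>();
        Holder key = new Holder();
        Holder value = new Holder();
        while (in.next(key)) {
            in.getCurrentValue(value);
            out.add(value.data);
        }
        return out;
    }

    /** next(key) followed by next(value): two records are consumed per row. */
    static List<String> viaSecondNext(ToyReader in) {
        List<String> out = new ArrayList<>();
        Holder key = new Holder();
        Holder value = new Holder();
        while (in.next(key)) {
            in.next(value); // second advance; its result is ignored, as in the quoted code
            out.add(value.data);
        }
        return out;
    }

    /** A single next(value) per row: each record's key lands in the value holder. */
    static List<String> viaSingleNext(ToyReader in) {
        List<String> out = new ArrayList<>();
        Holder value = new Holder();
        while (in.next(value)) {
            out.add(value.data);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(viaGetCurrentValue(new ToyReader("k1", "k2", "k3", "k4")));
        System.out.println(viaSecondNext(new ToyReader("k1", "k2", "k3", "k4")));
        System.out.println(viaSingleNext(new ToyReader("k1", "k2", "k3", "k4")));
    }
}
```

If the real reader behaves like this model, one possible fix is for next(K key, V value) to call in.next(value) exactly once per row and leave key untouched. That relies on value being an instance of the file's key class, which createValue() in the quoted code already arranges. I have not run this against Hive.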
