I am having the same problem as Allan. I checked out Mahout from trunk,
tried to create term frequency vectors from a Lucene index, and ran into
this:
09/10/27 17:36:10 INFO lucene.Driver: Output File:
/Users/shoeseal/DATA/luc2tvec.out
09/10/27 17:36:11 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
09/10/27 17:36:11 INFO compress.CodecPool: Got brand-new compressor
Exception in thread "main" java.lang.NullPointerException
at org.apache.mahout.utils.vectors.lucene.LuceneIterable$TDIterator.next(LuceneIterable.java:109)
at org.apache.mahout.utils.vectors.lucene.LuceneIterable$TDIterator.next(LuceneIterable.java:1)
at org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:40)
at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:200)
I am running this from Eclipse (Snow Leopard with JDK 6), on an index that
has a field with stored term vectors.
My input parameters for Driver are:
--dir <path>/smallidx/ --output <path>/luc2tvec.out --idField id_field
--field field_with_TV --dictOut <path>/luc2tvec.dict --max 50 --weight tf
Luke shows the following info on the fields I am using:
id_field is indexed, stored, omit norms
field_with_TV is indexed, tokenized, stored, term vector
I can run the LuceneIterableTest test fine, but when I run the Driver on my
index I get into trouble. Are there any possible reasons for this behavior
besides not having an index field with stored term vectors?
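For what it's worth, the NPE at LuceneIterable.java:109 looks consistent with
getTermFreqVector() returning null for a document whose field has no stored
term vector. Below is a minimal sketch of that failure mode and a guard that
would skip such documents instead of dereferencing null — plain Java with no
Lucene dependency; the getTermFreqVector stand-in, the process helper, and the
toy index are all hypothetical, not Mahout's actual code:

```java
import java.util.HashMap;
import java.util.Map;

public class TermVectorGuard {
    // Hypothetical stand-in for Lucene's IndexReader.getTermFreqVector(doc, field):
    // returns null when the document's field was indexed without a term vector.
    static Map<String, Integer> getTermFreqVector(
            int doc, Map<Integer, Map<String, Integer>> index) {
        return index.get(doc);
    }

    static String process() {
        Map<Integer, Map<String, Integer>> index = new HashMap<>();
        index.put(0, Map.of("mahout", 2, "lucene", 1)); // has a term vector
        index.put(1, null);                             // indexed without a term vector
        int written = 0, skipped = 0;
        for (int doc = 0; doc < 2; doc++) {
            Map<String, Integer> tv = getTermFreqVector(doc, index);
            if (tv == null) { skipped++; continue; } // guard: avoids the NPE
            written++;                               // a real writer would emit the vector
        }
        return written + " written, " + skipped + " skipped";
    }

    public static void main(String[] args) {
        System.out.println(process()); // → 1 written, 1 skipped
    }
}
```

Iterating without that null check, as the stack trace suggests, blows up on the
first document that lacks the vector.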
Thanks.
- sushil
Grant Ingersoll-6 wrote:
>
>
> On Jul 2, 2009, at 12:09 PM, Allan Roberto Avendano Sudario wrote:
>
>> Regards,
>> This is the entire exception message:
>>
>>
>> java -cp $JAVACLASSPATH org.apache.mahout.utils.vectors.Driver --dir
>> /home/hadoop/Desktop/<urls>/index --field content --dictOut
>> /home/hadoop/Desktop/dictionary/dict.txt --output
>> /home/hadoop/Desktop/dictionary/out.txt --max 50 --norm 2
>>
>>
>> 09/07/02 09:35:47 INFO vectors.Driver: Output File:
>> /home/hadoop/Desktop/dictionary/out.txt
>> 09/07/02 09:35:47 INFO util.NativeCodeLoader: Loaded the native-hadoop
>> library
>> 09/07/02 09:35:47 INFO zlib.ZlibFactory: Successfully loaded & initialized
>> native-zlib library
>> 09/07/02 09:35:47 INFO compress.CodecPool: Got brand-new compressor
>> Exception in thread "main" java.lang.NullPointerException
>> at org.apache.mahout.utils.vectors.lucene.LuceneIteratable$TDIterator.next(LuceneIteratable.java:111)
>> at org.apache.mahout.utils.vectors.lucene.LuceneIteratable$TDIterator.next(LuceneIteratable.java:82)
>> at org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:25)
>> at org.apache.mahout.utils.vectors.Driver.main(Driver.java:204)
>>
>>
>> Well, I used a Nutch crawl index, is that correct? Hmm... I have changed
>> to the contenc field, but nothing happened.
>> Possibly the Nutch crawl doesn't have term vectors indexed.
>
> This would be my guess. A small edit to Nutch code would probably
> allow it. Just find where it creates a new Field and add in the TV
> stuff.
>
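To make Grant's suggestion concrete, here is a hedged sketch of what adding
"the TV stuff" at indexing time looks like with the Lucene 2.9-era API that
Mahout trunk targeted then — the class name, sample text, and the round-trip
check are illustrative, not the actual Nutch indexing code:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.TermFreqVector;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class TermVectorDemo {
    static boolean hasTermVector() throws Exception {
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir,
                new StandardAnalyzer(Version.LUCENE_29),
                IndexWriter.MaxFieldLength.UNLIMITED);
        Document doc = new Document();
        // The crucial part: Field.TermVector.YES stores the term vector,
        // which is what the Mahout Driver reads back.
        doc.add(new Field("content", "mahout vectors from a lucene index",
                Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES));
        writer.addDocument(doc);
        writer.close();
        // Verify the vector survived the round trip.
        IndexReader reader = IndexReader.open(dir, true);
        TermFreqVector tfv = reader.getTermFreqVector(0, "content");
        reader.close();
        return tfv != null;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(hasTermVector() ? "term vector stored" : "no term vector");
    }
}
```

Without the Field.TermVector.YES argument, getTermFreqVector() returns null for
that field, which matches the NPEs reported above.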
--
View this message in context:
http://www.nabble.com/Creating-Vectors-from-Text-tp24298643p26087537.html
Sent from the Mahout User List mailing list archive at Nabble.com.