[jira] Created: (MAHOUT-546) Bug creating vector from Solr index with TrieFields

G Fernandes (JIRA) Tue, 16 Nov 2010 07:10:40 -0800

Bug creating vector from Solr index with TrieFields
---------------------------------------------------


                 Key: MAHOUT-546
                 URL: https://issues.apache.org/jira/browse/MAHOUT-546
             Project: Mahout
          Issue Type: Bug
          Components: Utils
    Affects Versions: 0.4
            Reporter: G Fernandes


A Solr index with an Id field of type TrieLong causes NPE when trying to 
extract vectors from the lucene index

Command line used:
../bin/mahout lucene.vector -d ~/solr/data/index/ -o  vectors.vec -t d.dic -f 
text -i articleId

where 'articleId' is a Stored numeric field declared as 'tlong' in Solr.  
Output:

no HADOOP_HOME set, running locally
Nov 16, 2010 2:59:50 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Output File: moreover.vec
Nov 16, 2010 2:59:51 PM org.apache.hadoop.util.NativeCodeLoader <clinit>
WARNING: Unable to load native-hadoop library for your platform... using 
builtin-java classes where applicable
Nov 16, 2010 2:59:51 PM org.apache.hadoop.io.compress.CodecPool getCompressor
INFO: Got brand-new compressor
Exception in thread "main" java.lang.NullPointerException
        at java.io.DataOutputStream.writeUTF(DataOutputStream.java:330)
        at java.io.DataOutputStream.writeUTF(DataOutputStream.java:306)
        at org.apache.mahout.math.VectorWritable.write(VectorWritable.java:125)
        at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
        at 
org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
        at 
org.apache.hadoop.io.SequenceFile$RecordCompressWriter.append(SequenceFile.java:1128)
        at 
org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:977)
        at 
org.apache.mahout.utils.vectors.io.SequenceFileVectorWriter.write(SequenceFileVectorWriter.java:46)
        at org.apache.mahout.utils.vectors.lucene.Driver.main(Driver.java:226)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at 
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
        at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:184)

The problem is at class LuceneIteratable line 130:
  name = indexReader.document(doc, idFieldSelector).get(idField);

It does not work with the way Solr stores numeric fields in the index.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Created: (MAHOUT-546) Bug creating vector from Solr index with TrieFields

Reply via email to