[jira] Updated: (MAHOUT-459) Reading an Index from Lucene/Solr 4.0-dev

Ted Dunning (JIRA) Tue, 21 Sep 2010 12:47:21 -0700

     [ 
https://issues.apache.org/jira/browse/MAHOUT-459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Ted Dunning updated MAHOUT-459:
-------------------------------

    Fix Version/s: 0.5
                       (was: 0.4)

Totally agree.  Another thing that makes me interested in Lucene 4.0 is that
Grant mentioned that many of the tokenizers will be byte-oriented by then.
That is really interesting because in my tests, using a byte-oriented state
machine for parsing CSV data can be nearly an order of magnitude faster than
using strings.  The speedup comes from a combination of avoiding string
conversions, avoiding computing string hashes, avoiding allocations and
generally moving less data.  Also, I can do more by reference on a single
line and can build special-purpose numerical converters.  These changes
reinforce each other, so the combined gain is larger than any one of them
would give alone.
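
For illustration only, here is a minimal sketch of what a byte-oriented
state machine for CSV might look like.  It is not the code from the tests
mentioned above.  The class name ByteCsvSketch, the method parseLongs, the
restriction to unquoted signed integer fields, and the choice to read an
empty field as 0 are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: parses one line of comma-separated signed integers
// directly from bytes, without creating any String or boxed objects.
final class ByteCsvSketch {
    private enum State { FIELD_START, IN_NUMBER }

    /**
     * Parses buf[start, end) up to the first '\n' or '\r' and writes each
     * field into out. Returns the number of fields written. An empty field
     * is stored as 0 (an assumption of this sketch). Overflow is not checked,
     * and out must be large enough for every field on the line.
     */
    static int parseLongs(byte[] buf, int start, int end, long[] out) {
        State state = State.FIELD_START;
        int field = 0;
        long value = 0;
        boolean negative = false;

        for (int i = start; i < end; i++) {
            byte b = buf[i];
            if (b == '\n' || b == '\r') {
                break; // end of line
            }
            switch (state) {
                case FIELD_START:
                    if (b == '-') {
                        negative = true;
                        state = State.IN_NUMBER;
                    } else if (b >= '0' && b <= '9') {
                        value = b - '0';
                        state = State.IN_NUMBER;
                    } else if (b == ',') {
                        out[field++] = 0; // empty field
                    } else {
                        throw new IllegalArgumentException("bad byte at offset " + i);
                    }
                    break;
                case IN_NUMBER:
                    if (b >= '0' && b <= '9') {
                        // Bespoke numeric conversion: accumulate digits in place.
                        value = value * 10 + (b - '0');
                    } else if (b == ',') {
                        out[field++] = negative ? -value : value;
                        value = 0;
                        negative = false;
                        state = State.FIELD_START;
                    } else {
                        throw new IllegalArgumentException("bad byte at offset " + i);
                    }
                    break;
            }
        }
        // Flush the last field on the line.
        out[field++] = negative ? -value : value;
        return field;
    }

    public static void main(String[] args) {
        byte[] line = "12,-7,300\n".getBytes(StandardCharsets.US_ASCII);
        long[] out = new long[8];
        int n = parseLongs(line, 0, line.length, out);
        for (int i = 0; i < n; i++) {
            System.out.println(out[i]);
        }
    }
}
```

The caller passes in a reusable output array and offsets into an existing
buffer, so parsing a line allocates nothing, which is the kind of saving
described above.  Quoted fields, escapes and decimal values are left out
to keep the sketch short.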

Overall, I will be very interested in seeing what Lucene 4.0 brings.  But that
will be post Mahout 0.4, it seems.

> Reading an Index from Lucene/Solr 4.0-dev
> -----------------------------------------
>
>                 Key: MAHOUT-459
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-459
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Utils
>    Affects Versions: 0.4
>         Environment: Windows Server 2008 R2 Standard, Cygwin, Solr-trunk, 
> Mahout-trunk
>            Reporter: Stephen McGill
>            Priority: Minor
>             Fix For: 0.5
>
>         Attachments: Mahout-Importing-Vectors-Lucene-Solr-4-dev.diff
>
>
> It is not possible to read indexes created by Lucene/Solr 4.0-dev (the trunk 
> development) with the Lucene libraries that are included with Mahout-dev.  
> When adding the new Lucene/Solr 4.0-dev, there are API changes that do not 
> allow Mahout to compile.
> By adapting mahout-utils to fit Lucene/Solr 4.0-dev's API changes, it is 
> possible to read its index.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
