[ https://issues.apache.org/jira/browse/MAHOUT-459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ted Dunning updated MAHOUT-459:
-------------------------------
Fix Version/s: 0.5 (was: 0.4)
Totally agree. Another thing that makes me interested in Lucene 4.0 is that
Grant mentioned that many of the tokenizers will be byte-oriented by then.
That is really interesting because, in my tests, using a byte-oriented state
machine to parse CSV data can be nearly an order of magnitude faster than
using strings. The speedup comes from a combination of avoiding string
conversions, avoiding computing string hashes, avoiding allocations, and
generally moving less data. I can also do more by reference within a single
line and can build special-purpose numerical converters. These changes all
have synergistic effects, which make them work even better.
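To make the idea concrete, here is a minimal sketch of a byte-oriented state
machine that converts one line of comma-separated decimal numbers straight
from bytes to doubles, without creating any String objects. It is only an
illustration, not the code used in the tests mentioned above: the class name
ByteCsvSketch, the method parseLine, and the restriction to plain decimal
numbers (no quoting, no exponents, empty fields read as 0.0) are all
assumptions made for the example.

```java
final class ByteCsvSketch {
    // States for a single numeric field (hypothetical, for illustration only).
    private enum State { START, INTEGER, FRACTION }

    /**
     * Parses one line of comma-separated decimal numbers from buf and writes
     * the values into out. Returns the number of fields written. The caller
     * must supply an out array large enough for every field on the line.
     */
    static int parseLine(byte[] buf, int offset, int length, double[] out) {
        State state = State.START;
        int field = 0;
        boolean negative = false;
        long mantissa = 0;    // all digits seen so far, as one integer
        double scale = 1.0;   // power of ten contributed by fraction digits
        int end = offset + length;

        for (int i = offset; i <= end; i++) {
            // Treat end of input as a final line terminator.
            int b = (i < end) ? buf[i] : '\n';

            if (b >= '0' && b <= '9') {
                mantissa = mantissa * 10 + (b - '0');
                if (state == State.FRACTION) {
                    scale *= 10;
                } else {
                    state = State.INTEGER;
                }
            } else if (b == '-' && state == State.START) {
                negative = true;
                state = State.INTEGER;
            } else if (b == '.' && state != State.FRACTION) {
                state = State.FRACTION;
            } else if (b == ',' || b == '\n' || b == '\r') {
                // Field boundary: emit the value and reset for the next field.
                double value = mantissa / scale;
                out[field++] = negative ? -value : value;
                state = State.START;
                negative = false;
                mantissa = 0;
                scale = 1.0;
                if (b != ',') {
                    break;  // end of line
                }
            } else {
                throw new IllegalArgumentException(
                    "unexpected byte " + b + " at index " + i);
            }
        }
        return field;
    }
}
```

Calling parseLine on the ASCII bytes of "1,2.5,-0.25" with a four-element
output array fills the first three slots and returns 3. The loop touches each
byte once and allocates nothing per field, which is the source of the savings
described above.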
Overall, I will be very interested in seeing what Lucene 4.0 brings, but that
will come after Mahout 0.4, it seems.
> Reading an Index from Lucene/Solr 4.0-dev
> -----------------------------------------
>
> Key: MAHOUT-459
> URL: https://issues.apache.org/jira/browse/MAHOUT-459
> Project: Mahout
> Issue Type: Improvement
> Components: Utils
> Affects Versions: 0.4
> Environment: Windows Server 2008 R2 Standard, Cygwin, Solr-trunk,
> Mahout-trunk
> Reporter: Stephen McGill
> Priority: Minor
> Fix For: 0.5
>
> Attachments: Mahout-Importing-Vectors-Lucene-Solr-4-dev.diff
>
>
> It is not possible to read indexes created by Lucene/Solr 4.0-dev (the trunk
> development version) with the Lucene libraries that are included with
> Mahout-dev. Swapping in the new Lucene/Solr 4.0-dev libraries exposes API
> changes that prevent Mahout from compiling.
> Adapting mahout-utils to Lucene/Solr 4.0-dev's API changes makes it
> possible to read such an index.