On 25 August 2011 00:09, Ted Dunning <[email protected]> wrote:
> Praneet and I were just talking about a project he is working on to do with
> higher-order learning methods such as boosting and feature sharding.  This
> is all pretty much in the context of classification and possibly clustering.
>
> The problems are:
>
> a) mahout doesn't have a general input format for classifiable data (this
> has been discussed recently)
>
> b) hashed vector representations are not suitable for feature sharding since
> individual features may be redundantly represented in many locations.
>
> c) mahout doesn't have a reasonable data structure for general data transfer
> (related to -a-)

Re (c),
Could Apache Pig's store/load subsystem be useful here? A possible
side benefit is that data on the same Hadoop cluster would become
amenable to both Mahout and Pig-based hackery / analysis / scripting.
The code is also already in the Apache universe, which reduces
friction around licensing, Maven etc.

http://pig.apache.org/docs/r0.9.0/func.html#load-store-functions
http://pig.apache.org/docs/r0.9.0/func.html#pigdump
http://pig.apache.org/docs/r0.9.0/func.html#pigstorage
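
To make the idea a bit more concrete, here is a rough sketch (plain
Python, not Mahout or Pig code) of a tab-delimited text layout for
classifiable records. The layout (label first, then one numeric
feature per field) and the function names encode_record /
decode_record are assumptions made up for illustration, not an
existing format in either project.

```python
def encode_record(label, features):
    """Serialise one labelled example as a single tab-delimited line.

    Hypothetical layout: the label in the first field, followed by one
    numeric feature value per field.
    """
    return "\t".join([label] + [repr(float(x)) for x in features])


def decode_record(line):
    """Parse a line produced by encode_record back into (label, features)."""
    fields = line.rstrip("\n").split("\t")
    return fields[0], [float(f) for f in fields[1:]]


if __name__ == "__main__":
    line = encode_record("spam", [1.0, 0.5])
    print(line)
    print(decode_record(line))
```

PigStorage reads and writes tab-delimited text by default, so lines
in roughly this shape should be usable from a Pig script as well;
whether such a flat layout is rich enough for Mahout's needs is a
separate question.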

cheers,

Dan
