Hadoop has no "native" support for the ZIP file format, and so there is no
option to ignore some of the files in an archive. But here is a good alternative:
http://cotdp.com/blog/2011/03/reading-zip-files-from-hadoop-mapreduce.html

This, with accept/reject file patterns, would do the trick.
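
A rough sketch of what I mean by accept/reject patterns, using plain
java.util.zip rather than the reader from the blog post. The class name
ZipEntryFilter and its methods are made up for illustration, and the choice
to let reject win over accept is my assumption, not something from Hadoop or
Mahout:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: selects ZIP entries by accept/reject regexes. */
public class ZipEntryFilter {
    private final Pattern accept; // null means "accept every name"
    private final Pattern reject; // null means "reject nothing"

    public ZipEntryFilter(String acceptRegex, String rejectRegex) {
        this.accept = acceptRegex == null ? null : Pattern.compile(acceptRegex);
        this.reject = rejectRegex == null ? null : Pattern.compile(rejectRegex);
    }

    /** Reject is checked first, so it overrides accept (an assumed policy). */
    public boolean accepts(String entryName) {
        if (reject != null && reject.matcher(entryName).matches()) {
            return false;
        }
        return accept == null || accept.matcher(entryName).matches();
    }

    /** Returns the names of the non-directory entries that pass the filter. */
    public List<String> listAccepted(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory() && accepts(entry.getName())) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }
}
```

A record reader over ZIP input could call accepts() on each entry name and
skip the entries that fail, with the two regexes coming from the job
configuration.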

(P.S. The Writable format that FPGrowth uses is defined in TopKStringPatterns.java.)

On Fri, Aug 26, 2011 at 3:02 AM, Sean Owen <[email protected]> wrote:

> Yeah but then our output is not interoperable with other stuff off the bat,
> over something pretty minor.
>
> On Fri, Aug 26, 2011 at 10:59 AM, Lance Norskog <[email protected]> wrote:
>
> > Yeah; it would have to be a variant of SequenceFile managed by some kind of
> > 'MahoutFileUtility' class. No, I haven't written it. "Tagged Sequence File"
> > with a suffix .tsq?
> >
> > A second use case crops up with the FPGrowth code. FPGrowth develops a
> > multi-level data structure of its results, then writes this out in a
> > SequenceFile. In the Mahout code base there is no code that reads this data
> > for any reason. It would be more useful to have different "flattenings"
> > written out in commonly used Writables.
> >
> >
>



-- 
Lance Norskog
[email protected]
