The input file is in .gz format, FYI.

On Fri, Jan 8, 2010 at 11:08 AM, Ted Yu <yuzhih...@gmail.com> wrote:
> My current project processes input file of size 333302161 bytes.
> What I plan to do is to split the file into equal size pieces (and on blank
> line boundary) to improve performance.
>
> I found 12 classes in 0.20.1 source code which implement InputSplit.
>
> If someone has written code similar to what I plan to do, please share some
> hint.
>
> Thanks
>
> On Fri, Jan 8, 2010 at 2:27 AM, Amogh Vasekar <am...@yahoo-inc.com> wrote:
>
>> Hi,
>> The deprecation is due to the new evolving mapreduce (o.a.h.mapreduce)
>> APIs. Old APIs are supported for available distributions. The equivalent of
>> TextInputFormat is available in new API:
>>
>> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/lib/input/TextInputFormat.html
>>
>> Thanks,
>> Amogh
>>
>> On 1/8/10 3:47 AM, "Ted Yu" <yuzhih...@gmail.com> wrote:
>>
>> According to:
>>
>> http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/TextInputFormat.html#isSplitable%28org.apache.hadoop.fs.FileSystem,%20org.apache.hadoop.fs.Path%29
>>
>> isSplitable() is deprecated.
>>
>> Which method should I use to replace it?
>>
>> Thanks
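
A note on the splitting plan quoted above: a gzip stream generally cannot be read starting from an arbitrary offset, and as far as I know TextInputFormat treats a compressed file as a single split for that reason. So cutting on blank-line boundaries would have to happen on the decompressed data. Below is a minimal sketch of that idea in plain Java, assuming the decompressed text fits in memory as a String and uses "\n" line endings. BlankLineSplitter and splitOffsets are hypothetical names for illustration only; they are not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.List;

public class BlankLineSplitter {
    /**
     * Returns the start offset of each piece. Every piece after the first
     * starts right after the first blank line ("\n\n") found at or beyond
     * its nominal equal-size target, so pieces are only roughly equal.
     * Fewer pieces are returned if the text runs out of blank lines.
     */
    public static List<Integer> splitOffsets(String text, int pieces) {
        if (pieces <= 0) {
            throw new IllegalArgumentException("pieces must be positive");
        }
        List<Integer> starts = new ArrayList<>();
        starts.add(0);
        int target = text.length() / pieces; // nominal piece size
        for (int i = 1; i < pieces; i++) {
            int lastStart = starts.get(starts.size() - 1);
            // Never search before the start of the previous piece.
            int from = Math.max(i * target, lastStart);
            int boundary = text.indexOf("\n\n", from);
            if (boundary < 0) {
                break; // no blank line left
            }
            int start = boundary + 2; // first character after the blank line
            if (start >= text.length()) {
                break; // nothing left after this boundary
            }
            if (start > lastStart) {
                starts.add(start);
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // Four records separated by blank lines (made-up sample data).
        String text = "a1\na2\n\nb1\n\nc1\nc2\n\nd1\n";
        System.out.println(splitOffsets(text, 3)); // prints [0, 11, 18]
    }
}
```

Each returned offset could then be used to write one piece out as its own uncompressed file, which Hadoop can process in parallel.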