Thanks for the link. I’m still running 1.3.1 but will give it a try :) Hao
> On Jun 13, 2015, at 9:38 AM, Will Briggs <wrbri...@gmail.com> wrote:
>
> Check out this recent post by Cheng Lian regarding dynamic partitioning in Spark 1.4: https://www.mail-archive.com/user@spark.apache.org/msg30204.html
>
>> On Jun 13, 2015, at 5:41 AM, Hao Wang <bill...@gmail.com> wrote:
>>
>> Hi,
>>
>> I have a bunch of large log files on Hadoop. Each line contains a log message and its severity. Is there a way I can use Spark to split the entire data set into different files on Hadoop according to the severity field? Thanks. Below is an example of the input and output.
>>
>> Input:
>> [ERROR] log1
>> [INFO] log2
>> [ERROR] log3
>> [INFO] log4
>>
>> Output:
>> error_file
>> [ERROR] log1
>> [ERROR] log3
>>
>> info_file
>> [INFO] log2
>> [INFO] log4
>>
>> Best,
>> Hao Wang
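For anyone landing on this thread: since Hao is on 1.3.1 (where DataFrameWriter.partitionBy from the linked 1.4 post isn't available), the split can also be done with the plain RDD API, which exists in 1.3.x. Below is a minimal, hedged sketch; the paths, the `severity_of` helper, and the `sc` argument are illustrative, not from the original messages. It collects the distinct severities first, then writes one filtered output per severity.

```python
import re

def severity_of(line):
    """Extract the bracketed severity tag: '[ERROR] log1' -> 'ERROR'."""
    m = re.match(r"\[(\w+)\]", line)
    return m.group(1) if m else "UNKNOWN"

def split_by_severity(sc, input_path, output_path):
    """Sketch only: requires a live SparkContext `sc`; paths are hypothetical.

    Writes one directory per severity, e.g. <output_path>/error_file.
    Uses only RDD operations available in Spark 1.3.x
    (textFile, map, distinct, filter, saveAsTextFile).
    """
    logs = sc.textFile(input_path).cache()
    for sev in logs.map(severity_of).distinct().collect():
        # Bind `sev` via a default argument so the closure
        # doesn't capture the loop variable by reference.
        logs.filter(lambda line, s=sev: severity_of(line) == s) \
            .saveAsTextFile("%s/%s_file" % (output_path, sev.lower()))
```

One caveat with this approach: it scans the cached data once per distinct severity, so it fits best when the number of severities is small (as with log levels). For many partitions, the 1.4-style `partitionBy` from the linked post, or a custom `MultipleTextOutputFormat` via `saveAsHadoopFile`, writes everything in a single pass.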