Additionally, since having multiple files under /output1.txt is standard in
the Hadoop ecosystem, you can transparently read all of those files back with
env.readTextFile("/output1.txt").
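
For example (a rough, untested sketch; assumes the same paths as in the job
below), a follow-up job can consume the whole output directory in one call:

    final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    // pointing readTextFile at the directory picks up every part file (1, 2, 3, ...)
    DataSet<String> allParts = env.readTextFile("file:///output1.txt");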

You can also set the parallelism on individual operators (e.g., the file
writer) if you really need a single output file.
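
Something along these lines should work (untested sketch, same job as below):
writeAsText returns the data sink, so you can set the parallelism on just the
sink while the rest of the job stays parallel:

    // only the sink runs with parallelism 1, so a single output file is written;
    // the filter upstream keeps its full parallelism
    filteredData.writeAsText("file:///output1.txt").setParallelism(1);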

On Fri, Nov 20, 2015, 21:27 Suneel Marthi <smar...@apache.org> wrote:

> You can write to a single output file by setting parallelism == 1
>
>  So:
>
>     final ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment();
>     // setParallelism returns void, so call it separately rather than chaining
>     env.setParallelism(1);
>
> The reason you see multiple output files is that each parallel worker writes
> to a different file.
>
> On Fri, Nov 20, 2015 at 10:06 PM, jun aoki <ja...@apache.org> wrote:
>
> > Hi Flink community
> >
> > I know I'm mistaken but could not find what I want.
> >
> > final ExecutionEnvironment env =
> > ExecutionEnvironment.createLocalEnvironment();
> > DataSet<String> data = env.readTextFile("file:///text1.txt");
> > FilterFunction<String> filter = new MyFilterFunction();  // looks for lines
> > starting with "[ERROR]"
> > DataSet<String> filteredData = data.filter(filter);
> > filteredData.writeAsText("file:///output1.txt");
> > env.execute();
> >
> > Then I expected to get a single file /output1.txt, but I actually get
> > /output1.txt/1, /output1.txt/2, /output1.txt/3, ...
> > I assumed I would get a single file because the method signature says
> > writeAsText(String filePath).  <-- filePath instead of directoryPath
> > The Javadoc comment also reads as if my assumption were right.
> >
> >
> https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/DataSet.java#L1354
> >
> > Can anyone tell me whether the method signature and documentation should
> > be fixed, or whether I am missing some configuration?
> >
> > --
> > -jun
> >
>
