I think Spark itself doesn't allow the DFOC (direct file output committer) when append mode is enabled. So the DFOC works only for insert-overwrite queries / overwrite mode, not for append mode.
Regards
Venkata krishnan

On Fri, Jun 16, 2017 at 9:35 PM, sririshindra <sririshin...@gmail.com> wrote:

> Hi Ryan and Steve,
>
> Thanks very much for your reply.
>
> I was finally able to get Ryan's repo to work for me by changing the output
> committer to FileOutputFormat instead of ParquetOutputCommitter in Spark, as
> Steve suggested.
>
> However, it is not working for append mode while saving the data frame:
>
>     val hf = spark.read.parquet("/home/user/softwares/spark-2.1.0-bin-hadoop2.7/examples/src/main/resources/users.parquet")
>
>     hf.persist(StorageLevel.DISK_ONLY)
>     hf.show()
>     hf.write
>       .partitionBy("name").mode("append")
>       .save(S3Location + "data" + ".parquet")
>
> The above code successfully saves the parquet file when I run it for the
> first time. But when I rerun the code, the new parquet files are not
> getting added to S3.
>
> I put a print statement in the constructors of PartitionedOutputCommitter
> in Ryan's repo and realized that the partitioned output committer is not
> even getting called the second time I run the code. It is called only the
> first time. Is there anything I can do to make Spark call the
> PartitionedOutputCommitter even when the file already exists in S3?
>
> --
> View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Output-Committers-for-S3-tp21033p21776.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
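[Editor's note: for readers hitting the same question, below is a minimal sketch of how a custom output committer is typically wired into a Spark 2.1-era job via configuration. The property names and the S3PartitionedOutputCommitter class name (from Ryan Blue's s3committer repo) are assumptions to verify against the exact Spark version and committer build you are using; this is not the thread author's confirmed setup.]

    # spark-defaults.conf sketch (assumed Spark 2.1-era property names;
    # the committer class name is assumed from the s3committer repo)
    spark.sql.sources.outputCommitterClass     com.netflix.bdp.s3.S3PartitionedOutputCommitter
    spark.sql.parquet.output.committer.class   com.netflix.bdp.s3.S3PartitionedOutputCommitter

Note that, per the answer above, such direct-style committers are only picked up for overwrite/insert-overwrite writes; with .mode("append"), Spark's planning path can bypass them, which matches the behavior described in the quoted message.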