I think Spark by itself doesn't allow a direct file output committer (DFOC)
when append mode is enabled. So DFOC works only for insert-overwrite
queries / overwrite mode, not for append mode.
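
For illustration, a minimal sketch of the two modes (the s3a path and the
df DataFrame are placeholders):

    // Overwrite rewrites the whole target, so a committer that writes
    // directly to the final S3 location can be used.
    df.write.mode("overwrite").parquet("s3a://bucket/table")

    // Append has to coexist with files already at the target; Spark takes
    // the default committer path here, so a direct committer is not used.
    df.write.mode("append").parquet("s3a://bucket/table")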

Regards
Venkata krishnan

On Fri, Jun 16, 2017 at 9:35 PM, sririshindra <sririshin...@gmail.com>
wrote:

> Hi Ryan and Steve,
>
> Thanks very much for your reply.
>
> I was finally able to get Ryan's repo working for me by changing the
> output committer class that Spark expects from ParquetOutputCommitter to
> FileOutputCommitter, as Steve suggested.
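>
> (For reference, a sketch of the kind of session configuration this
> involves; the exact key and the committer class name are assumptions that
> depend on the Spark version and on how Ryan's repo is wired in:)
>
>     import org.apache.spark.sql.SparkSession
>
>     val spark = SparkSession.builder()
>       .appName("s3-committer-test")
>       // Hypothetical wiring: point Spark SQL's Parquet writes at the
>       // partitioned committer from Ryan's repo.
>       .config("spark.sql.parquet.output.committer.class",
>         "com.netflix.bdp.s3.S3PartitionedOutputCommitter")
>       .getOrCreate()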
>
> However, it is not working in append mode when saving the data frame.
>
>     import org.apache.spark.storage.StorageLevel
>
>     val hf = spark.read.parquet(
>       "/home/user/softwares/spark-2.1.0-bin-hadoop2.7/examples/src/main/resources/users.parquet")
>
>     hf.persist(StorageLevel.DISK_ONLY)
>     hf.show()
>     hf.write
>       .partitionBy("name").mode("append")
>       .save(S3Location + "data" + ".parquet")
>
>
>
> The above code successfully saves the parquet file when I run it for the
> first time. But when I rerun the code, the new parquet files are not
> added to S3.
>
> I have put a print statement in the constructor of
> PartitionedOutputCommitter in Ryan's repo and realized that the partitioned
> output committer is not even called the second time I run the code; it is
> called only the first time. Is there anything I can do to make Spark call
> the PartitionedOutputCommitter even when the file already exists in S3?
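>
> (A quick sanity check, not an answer to the question above: print the
> committer class the session is configured with on each run; the config key
> matches the sketch earlier in this message and is an assumption:)
>
>     // If this prints the custom committer class on both runs but its
>     // constructor only fires on the first, the difference is in the write
>     // path Spark chooses for append, not in the configuration.
>     println(spark.conf.get("spark.sql.parquet.output.committer.class"))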
>
