He is using CSV, and either ORC or Parquet would be fine.
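A minimal sketch of that switch, assuming the PySpark DataFrame df and the
s3_location variable from the original post:

    # Sketch: write columnar output instead of CSV; both writers take the
    # same path argument as df.write.csv(s3_location).
    df.write.parquet(s3_location)   # Parquet
    df.write.orc(s3_location)       # ORC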
> On 28. Jan 2018, at 06:49, Gourav Sengupta wrote:
>
> Hi,
>
> There is definitely a parameter, when creating temporary security
> credentials, to specify the number of minutes those credentials will stay
> active.
Hi,
There is definitely a parameter, when creating temporary security
credentials, to specify the number of minutes those credentials will stay
active. There is an upper limit, of course, which is around 3 days if I
remember correctly, and the default, as you can see, is 30 mins.
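As a hedged illustration of that parameter (the boto3 call is real; the
duration value is just an example), requesting longer-lived session
credentials looks roughly like this:

    import boto3

    # Sketch: ask STS for temporary credentials with an explicit lifetime.
    # DurationSeconds is the knob referred to above; for GetSessionToken
    # with an IAM user it accepts 900 to 129600 seconds (36 hours).
    sts = boto3.client("sts")
    creds = sts.get_session_token(DurationSeconds=43200)["Credentials"]
    # creds holds AccessKeyId, SecretAccessKey, SessionToken, Expiration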
Can you let me know:
Are you writing from an Amazon instance or from an on-premise install to S3?
How many partitions are you writing with? Maybe you can try to “play” with
repartitioning to see how it behaves.
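For example (200 is just an illustrative number, assuming the same df and
s3_location as above):

    # Sketch: control the number of output files by repartitioning before
    # the write; 200 is an arbitrary example value to tune experimentally.
    df.repartition(200).write.csv(s3_location)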
> On Jan 23, 2018, at 17:09, Vasyl Harasymiv wrote:
>
> It is about 400
It is about 400 million rows. S3 automatically chunks the file on its end
while writing, so that's fine, i.e. it creates files with the same name plus
alphanumeric suffixes.
However, the write still fails partway through because the token expires.
On Tue, Jan 23, 2018 at 5:03 PM, Jörn Franke wrote:
How large is the file?
If it is very large then you should have several partitions for the output
anyway. This also matters when you need to read back from S3 - having
several files there enables parallel reading.
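A short sketch of that read-back side, assuming a SparkSession named spark:
Spark schedules roughly one task per file split, so several files under the
same prefix are read in parallel.

    # Sketch: reading a multi-file output back from S3; each part file
    # becomes at least one read task, so the read is parallelized.
    df2 = spark.read.csv(s3_location)   # or spark.read.parquet(s3_location)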
> On 23. Jan 2018, at 23:58, Vasyl Harasymiv wrote:
Hi Spark Community,
I am saving a data frame to a file on S3 using:
*df.write.csv(s3_location)*
If the write runs for longer than 30 mins, the following error appears:
*The provided token has expired. (Service: Amazon S3; Status Code: 400;
Error Code: ExpiredToken;)*
Potentially, this is because there is a time limit on the temporary security
token?
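In case it helps anyone hitting the same error: a hedged sketch of how
temporary (session) credentials are typically wired into the S3A connector.
The fs.s3a.* property names come from Hadoop's hadoop-aws module; the creds
dict is assumed to come from an STS call like the one sketched earlier in
the thread.

    # Sketch: point the S3A connector at session credentials. Property
    # names are from hadoop-aws; `creds` is assumed to be the Credentials
    # dict returned by STS (AccessKeyId, SecretAccessKey, SessionToken).
    hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()
    hadoop_conf.set("fs.s3a.aws.credentials.provider",
                    "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider")
    hadoop_conf.set("fs.s3a.access.key", creds["AccessKeyId"])
    hadoop_conf.set("fs.s3a.secret.key", creds["SecretAccessKey"])
    hadoop_conf.set("fs.s3a.session.token", creds["SessionToken"])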