What Nick said was correct.
I should also mention that I am using the Python variant of Spark (PySpark)
in this case, not Scala.
I am looking to use the UUID prefix of part-0 to prevent a race condition by
using an S3 waiter for the part file to appear, but to achieve this I need
to know the UUID value in advance.
I should add that I tried using a waiter on the _SUCCESS file, but that did
not work: because it is so small compared to the part-0 file, it appears in
S3 before the part-0 file does, even though it is written afterwards.
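For what it's worth, here is a minimal sketch of the waiter idea in boto3. The built-in `object_exists` waiter needs the exact key, so this polls `list_objects_v2` with the known prefix instead; the bucket/prefix values and the helper name are my own assumptions, not anything Spark provides:

```python
import time


def wait_for_part_file(s3_client, bucket, prefix, timeout=300, interval=5):
    """Poll S3 until at least one object under `prefix` exists.

    Returns the first matching key, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        resp = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix, MaxKeys=1)
        if resp.get("KeyCount", 0) > 0:
            return resp["Contents"][0]["Key"]
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no object with prefix {prefix!r} after {timeout}s")
        time.sleep(interval)
```

Usage would be something like `wait_for_part_file(boto3.client("s3"), "my-bucket", "output/part-00000-<uuid>")`, which is exactly why the UUID has to be known in advance.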
--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
Thanks Etienne! Yeah, I forgot to say it was nice talking with you again.
And sorry I forgot to send the reply (it was sitting in drafts).
Regarding investment in SS, well, unfortunately I don't know - I'm just an
individual. There might be various reasons to do so, most probably
"priority" among other things. There's n
I think what George is looking for is a way to determine ahead of time the
partition IDs that Spark will use when writing output.
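The numeric part of those IDs is at least predictable: Spark numbers output files sequentially per partition. A small sketch of that (the `part-NNNNN` zero-padded pattern is an assumption based on typical Spark output naming; the UUID suffix is the part that is not known ahead of time):

```python
def expected_part_prefixes(num_partitions):
    """Build the zero-padded part-file name prefixes Spark typically uses,
    e.g. part-00000, part-00001, ... In PySpark, num_partitions could come
    from df.rdd.getNumPartitions() before the write."""
    return [f"part-{i:05d}" for i in range(num_partitions)]
```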
George,
I believe this is an example of what you're looking for:
https://github.com/databricks/spark-redshift/blob/184b4428c1505dff7b4365963dc344197a92baa9/src/main/sc
If I understand your problem correctly, the prefix you provided is actually
"-" + UUID. You can generate one with a UUID generator such as Python's
uuid.uuid4: https://docs.python.org/3/library/uuid.html#uuid.uuid4.
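A minimal sketch of that suggestion, assuming you can correlate a UUID you generate yourself with the one embedded in the file name (the "part-00000-<uuid>" pattern is my assumption about typical Spark output naming, not something this thread confirms):

```python
import uuid

# Generate the UUID ahead of the write, then build the part-file
# prefix you would hand to an S3 waiter.
job_uuid = uuid.uuid4()
expected_prefix = f"part-00000-{job_uuid}"
```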