Re: Naming files while saving a Dataframe

2021-08-12 Thread Eric Beabes
This doesn't work as given here
(https://stackoverflow.com/questions/36107581/change-output-filename-prefix-for-dataframe-write),
but the answer suggests using the FileOutputFormat class. Will try that.
Thanks. Regards.

On Sun, Jul 18, 2021 at 12:44 AM Jörn Franke  wrote:

> Spark depends heavily on Hadoop for writing files. You can try setting the
> Hadoop property mapreduce.output.basename:
>
>
> https://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkContext.html#hadoopConfiguration--

Re: Naming files while saving a Dataframe

2021-07-18 Thread Jörn Franke
Spark depends heavily on Hadoop for writing files. You can try setting the
Hadoop property mapreduce.output.basename:

https://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkContext.html#hadoopConfiguration--
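
For reference, a minimal sketch of what setting that property from a Spark
job would look like. The property name comes from this message; the app name
and everything else are illustrative, and note Eric's later report that this
did not work for his DataFrame writes:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jobA").getOrCreate()
// Ask Hadoop output formats to name files "jobA-..." instead of "part-...";
// whether the DataFrame writer honors this depends on format and version.
spark.sparkContext.hadoopConfiguration.set("mapreduce.output.basename", "jobA")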




Re: Naming files while saving a Dataframe

2021-07-17 Thread Eric Beabes
Mich - You're suggesting changing the "Path". The problem is that we've an
EXTERNAL table created on top of this path, so the "Path" CANNOT change. If
it could, this would be easy to solve. My question is about changing the
"Filename".

As Ayan pointed out, Spark doesn't seem to allow "prefixes" for the
filenames!
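
For context, a sketch of the kind of table definition that pins the path;
the schema and location are purely illustrative, and a SparkSession named
spark (with Hive support) is assumed:

// An EXTERNAL table is bound to a fixed LOCATION, so the directory the
// jobs write to cannot move without redefining the table.
spark.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS test.events (id BIGINT, payload STRING)
  PARTITIONED BY (partitionDate STRING)
  STORED AS PARQUET
  LOCATION '/data/events'
""")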


Re: Naming files while saving a Dataframe

2021-07-17 Thread Mich Talebzadeh
Using this

df.write.mode("overwrite").format("parquet").saveAsTable("test.ABCD")

That will create a parquet table in the database test, which is essentially
a Hive warehouse directory with data files of the form

/user/hive/warehouse/test.db/abcd/000000_0



Re: Naming files while saving a Dataframe

2021-07-17 Thread ayan guha
Hi Eric - yes, that may be the best way to resolve this. I have not seen any
specific way to set the names of the actual files written by Spark. Finally,
make sure you optimize the number of files written.

-- 
Best Regards,
Ayan Guha


Re: Naming files while saving a Dataframe

2021-07-17 Thread Eric Beabes
I am not sure if you've understood the question. Here's how we're saving
the DataFrame:

df
  .coalesce(numFiles)
  .write
  .partitionBy(partitionDate)
  .mode("overwrite")
  .format("parquet")
  .save(someDirectory)


Now where would I add a 'prefix' in this one?
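
There is no writer option for this. One workaround, sketched here rather
than taken from the thread, is to rename the part files after the save using
the Hadoop FileSystem API; prefix, someDirectory and the spark session are
assumed names:

import org.apache.hadoop.fs.{FileSystem, Path}

// Post-write rename: list files recursively, since partitionBy puts the
// part files inside partition subdirectories, and prepend a job prefix.
val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)
val prefix = "jobA"
val files = fs.listFiles(new Path(someDirectory), true) // true = recursive
while (files.hasNext) {
  val p = files.next().getPath
  if (p.getName.startsWith("part-"))
    fs.rename(p, new Path(p.getParent, s"$prefix-${p.getName}"))
}

The usual caveat: the rename happens after the commit, so it is not atomic
and readers may briefly see a mix of renamed and unrenamed files.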


On Sat, Jul 17, 2021 at 10:54 AM Mich Talebzadeh 
wrote:

> try it and see if it works
>
> fullyQualifiedTableName = appName+'_'+tableName
>
> On Sat, 17 Jul 2021 at 18:02, Eric Beabes 
> wrote:
>
>> I don't think Spark allows adding a 'prefix' to the file name, does it?
>> If it does, please tell me how. Thanks.


Re: Naming files while saving a Dataframe

2021-07-17 Thread Mich Talebzadeh
Jobs have names in Spark. You could prefix the file names with the job name
when writing to the directory, I guess

val sparkConf = new SparkConf().setAppName(sparkAppName)
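
A sketch of turning that name into a usable prefix at runtime; sparkAppName
is illustrative, and the prefix would feed a rename step like the one
sketched under Eric's save example above:

import org.apache.spark.SparkContext

val sc = new SparkContext(sparkConf)
// The job's name is recoverable from the running context and can
// serve as a per-job file-name prefix.
val prefix = sc.appName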




   view my Linkedin profile




*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Sat, 17 Jul 2021 at 17:40, Eric Beabes  wrote:

> Reason we've two jobs writing to the same directory is that the data is
> partitioned by 'day' (mmdd) but the job runs hourly. Maybe the only way
> to do this is to create an hourly partition (/mmdd/hh). Is that the
> only way to solve this?
>
> On Fri, Jul 16, 2021 at 5:45 PM ayan guha  wrote:
>
>> IMHO - this is a bad idea esp in failure scenarios.
>>
>> How about creating a subfolder each for the jobs?
>>
>> On Sat, 17 Jul 2021 at 9:11 am, Eric Beabes 
>> wrote:
>>
>>> We've two (or more) jobs that write data into the same directory via a
>>> Dataframe.save method. We need to be able to figure out which job wrote
>>> which file. Maybe provide a 'prefix' to the file names. I was wondering if
>>> there's any 'option' that allows us to do this. Googling didn't come up
>>> with any solution so thought of asking the Spark experts on this mailing
>>> list.
>>>
>>> Thanks in advance.
>>>
>> --
>> Best Regards,
>> Ayan Guha
>>
>


Re: Naming files while saving a Dataframe

2021-07-17 Thread Eric Beabes
The reason we've two jobs writing to the same directory is that the data is
partitioned by 'day' (yyyymmdd) but the job runs hourly. Maybe the only way
to do this is to create an hourly partition (/yyyymmdd/hh). Is that the
only way to solve this?
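
A sketch of that hourly layout, with illustrative column names:

// Partitioning by day and hour gives each hourly run its own directory,
// e.g. .../day=20210717/hour=09/part-...
df.write
  .mode("overwrite")
  .partitionBy("day", "hour")
  .format("parquet")
  .save(someDirectory)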



Re: Naming files while saving a Dataframe

2021-07-16 Thread ayan guha
IMHO - this is a bad idea, especially in failure scenarios.

How about creating a subfolder for each of the jobs?
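
A sketch of the subfolder idea, with illustrative names; which job wrote a
file then falls out of the path itself:

// Each job writes under its own subdirectory of the shared base path.
dfJobA.write.mode("overwrite").format("parquet").save(s"$baseDir/jobA")
dfJobB.write.mode("overwrite").format("parquet").save(s"$baseDir/jobB")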

On Sat, 17 Jul 2021 at 9:11 am, Eric Beabes 
wrote:

> We've two (or more) jobs that write data into the same directory via a
> Dataframe.save method. We need to be able to figure out which job wrote
> which file. Maybe provide a 'prefix' to the file names. I was wondering if
> there's any 'option' that allows us to do this. Googling didn't come up
> with any solution so thought of asking the Spark experts on this mailing
> list.
>
> Thanks in advance.
>
-- 
Best Regards,
Ayan Guha


Naming files while saving a Dataframe

2021-07-16 Thread Eric Beabes
We've two (or more) jobs that write data into the same directory via the
Dataframe.save method. We need to be able to figure out which job wrote
which file, maybe by providing a 'prefix' for the file names. I was wondering
if there's any 'option' that allows us to do this. Googling didn't come up
with any solution, so I thought of asking the Spark experts on this mailing
list.

Thanks in advance.