Spark relies heavily on Hadoop for writing files. You can try setting the Hadoop property mapreduce.output.basename:
https://spark.apache.org/docs/latest/api/java/org/apache/spark/SparkContext.html#hadoopConfiguration--
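For illustration, a minimal sketch of setting that property before a write (assuming a SparkSession named spark; the app name is illustrative). Note that Spark SQL's built-in file sources generate their own part-file names, so whether this property takes effect for DataFrame writes is worth verifying on your Spark version; it is primarily honored by Hadoop OutputFormat-based saves such as saveAsNewAPIHadoopFile:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("hourly-loader").getOrCreate()

    // mapreduce.output.basename replaces the default "part" prefix used by
    // Hadoop OutputFormat writers. Set it before the job writes its output.
    spark.sparkContext.hadoopConfiguration
      .set("mapreduce.output.basename", spark.sparkContext.appName)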
On 18.07.2021 at 01:15, Eric Beabes <mailinglist...@gmail.com> wrote:

    Mich - You're suggesting changing the "Path". The problem is that we have
    an EXTERNAL table created on top of this path, so the "Path" CANNOT
    change. If it could, this problem would be easy to solve. My question is
    about changing the "Filename".

    As Ayan pointed out, Spark doesn't seem to allow "prefixes" for the
    filenames!

On Sat, Jul 17, 2021 at 1:58 PM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

    Use this:

        df.write.mode("overwrite").format("parquet").saveAsTable("test.ABCD")

    That will create a parquet table in the database test, which is
    essentially a Hive partition in the format:

        /user/hive/warehouse/test.db/abcd/000000_0

On Sat, 17 Jul 2021 at 20:45, Eric Beabes <mailinglist...@gmail.com> wrote:

    I am not sure you've understood the question. Here's how we're saving the
    DataFrame:

        df
          .coalesce(numFiles)
          .write
          .partitionBy(partitionDate)
          .mode("overwrite")
          .format("parquet")
          .save(someDirectory)

    Now where would I add a 'prefix' in this one?

On Sat, Jul 17, 2021 at 10:54 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

    Try this and see if it works:

        fullyQualifiedTableName = appName + '_' + tableName

On Sat, 17 Jul 2021 at 18:02, Eric Beabes <mailinglist...@gmail.com> wrote:

    I don't think Spark allows adding a 'prefix' to the file name, does it?
    If it does, please tell me how. Thanks.

On Sat, Jul 17, 2021 at 9:47 AM Mich Talebzadeh <mich.talebza...@gmail.com> wrote:

    Jobs have names in Spark. You can prefix the name to the file name when
    writing to the directory, I guess:

        val sparkConf = new SparkConf().setAppName(sparkAppName)
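Since the DataFrame writer does not expose a filename-prefix option (the sticking point in this thread), one workaround sometimes used, though not proposed by anyone above, is to rename the part files after the write via the Hadoop FileSystem API. A minimal sketch, assuming a SparkSession named spark and a hypothetical jobPrefix; someDirectory stands in for the output path from Eric's snippet:

    import org.apache.hadoop.fs.Path
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()
    val jobPrefix = spark.sparkContext.appName  // hypothetical: any unique job tag works
    val someDirectory = "/data/output"          // illustrative stand-in for the real path

    // After df...save(someDirectory) completes, rename every part file.
    // listFiles(..., recursive = true) descends into the partition
    // subdirectories created by partitionBy().
    val outPath = new Path(someDirectory)
    val fs = outPath.getFileSystem(spark.sparkContext.hadoopConfiguration)
    val files = fs.listFiles(outPath, true)
    while (files.hasNext) {
      val p = files.next().getPath
      if (p.getName.startsWith("part-")) {
        fs.rename(p, new Path(p.getParent, s"$jobPrefix-${p.getName}"))
      }
    }

Note the renames are not atomic, which echoes Ayan's failure-scenario concern below: a job that dies halfway through leaves a mix of renamed and unrenamed files.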
On Sat, 17 Jul 2021 at 17:40, Eric Beabes <mailinglist...@gmail.com> wrote:

    The reason we have two jobs writing to the same directory is that the
    data is partitioned by 'day' (yyyymmdd) but the job runs hourly. Maybe
    the only way to do this is to create an hourly partition (/yyyymmdd/hh).
    Is that the only way to solve this?

On Fri, Jul 16, 2021 at 5:45 PM ayan guha <guha.a...@gmail.com> wrote:

    IMHO - this is a bad idea, especially in failure scenarios.

    How about creating a subfolder for each of the jobs?

On Sat, 17 Jul 2021 at 9:11 am, Eric Beabes <mailinglist...@gmail.com> wrote:

    We have two (or more) jobs that write data into the same directory via
    the DataFrame.save method. We need to be able to figure out which job
    wrote which file, perhaps by providing a 'prefix' for the file names. I
    was wondering if there's an 'option' that allows us to do this. Googling
    didn't turn up any solution, so I thought I'd ask the Spark experts on
    this mailing list.

    Thanks in advance.

--
Best Regards,
Ayan Guha
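For completeness, a minimal sketch of the two directory-level alternatives raised above: an extra hour partition, and Ayan's subfolder per job. This assumes the data carries a yyyymmdd column (as in the thread); the source path, the event_ts timestamp column, and the output paths are illustrative, not from the original jobs:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, date_format}

    val spark = SparkSession.builder().getOrCreate()
    val df = spark.read.parquet("/data/incoming")  // illustrative source

    // Alternative 1: partition by day AND hour, so each hourly run owns its
    // own yyyymmdd=.../hh=... directory. With
    // spark.sql.sources.partitionOverwriteMode=dynamic (Spark 2.3+),
    // "overwrite" replaces only the partitions actually being written.
    df.withColumn("hh", date_format(col("event_ts"), "HH"))
      .write
      .partitionBy("yyyymmdd", "hh")
      .mode("overwrite")
      .format("parquet")
      .save("/data/output")

    // Alternative 2: a subfolder per job, as Ayan suggests. This changes the
    // path, so it only works if the external table (or its partitions) can
    // be pointed at the subfolders.
    df.write
      .mode("overwrite")
      .format("parquet")
      .save(s"/data/output/${spark.sparkContext.appName}")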