We are talking about DELETE/UPDATE/MERGE operations. These operations are supported only in SQL; there is no DataFrame API support for them.* Write options are therefore not applicable, so SQLConf is the only mechanism available to me for overriding the table property. For reference, we currently support setting the distribution mode via write option, SQLConf, and table property. It seems to me that https://github.com/apache/iceberg/pull/6838/ is a precedent for what I'd like to do.
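
To make the precedence concrete, here is a minimal Python sketch of the lookup order described later in this thread for SparkConfParser (write option if set, else session conf if set, else table property), using distribution mode as the example. It is not the actual Iceberg code: `resolve_setting` is a hypothetical helper, plain dicts stand in for the three sources, and the key names are written as I understand them and should be treated as assumptions.

```python
from typing import Mapping, Optional

# Key names are assumptions for illustration; check the Iceberg docs for the real ones.
WRITE_OPTION_KEY = "distribution-mode"
SESSION_CONF_KEY = "spark.sql.iceberg.distribution-mode"
TABLE_PROPERTY_KEY = "write.distribution-mode"


def resolve_setting(
    write_options: Mapping[str, str],
    session_conf: Mapping[str, str],
    table_properties: Mapping[str, str],
    default: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical helper: return the first value that is set, checking
    the write option, then the session conf, then the table property."""
    for source, key in (
        (write_options, WRITE_OPTION_KEY),
        (session_conf, SESSION_CONF_KEY),
        (table_properties, TABLE_PROPERTY_KEY),
    ):
        value = source.get(key)
        if value is not None:
            return value
    return default


table_props = {TABLE_PROPERTY_KEY: "hash"}
session = {SESSION_CONF_KEY: "range"}

# No write option is given, so the session conf overrides the table property.
print(resolve_setting({}, session, table_props))
```

For DELETE/UPDATE/MERGE the first source is always empty, since there is no DataFrame API through which to pass a write option, which is why the session conf is the only remaining override in this scheme.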

* It would be of interest to support performing DELETE/UPDATE/MERGE from DataFrames, but that is a whole other topic.

On Wed, Jul 26, 2023 at 12:04 PM Ryan Blue <b...@tabular.io> wrote:

> I think we should aim to have the same behavior across properties that are
> set in SQL conf, table config, and write options. Having SQL conf override
> table config for this doesn't make sense to me. If the need is to override
> table configuration, then write options are the right way to do it.
>
> On Wed, Jul 26, 2023 at 10:10 AM Wing Yew Poon <wyp...@cloudera.com.invalid>
> wrote:
>
>> I was on vacation.
>> Currently, write modes (copy-on-write/merge-on-read) can only be set as
>> table properties, and default to copy-on-write. We have a customer who
>> wants to use copy-on-write for certain Spark jobs that write to some
>> Iceberg table and merge-on-read for other Spark jobs writing to the same
>> table, because of the write characteristics of those jobs. This seems like
>> a use case that should be supported. The only way they can do this
>> currently is to toggle the table property as needed before doing the
>> writes. This is not a sustainable workaround.
>> Hence, I think it would be useful to be able to configure the write mode
>> as a SQLConf. I also disagree that the table property should always win. If
>> this is the case, there is no way to override it. The existing behavior in
>> SparkConfParser is to use the option if set, else use the session conf if
>> set, else use the table property. This applies across the board.
>> - Wing Yew
>>
>> On Sun, Jul 16, 2023 at 4:48 PM Ryan Blue <b...@tabular.io> wrote:
>>
>>> Yes, I agree that there is value for administrators from having some
>>> things exposed as Spark SQL configuration. That gets much harder when you
>>> want to use the SQLConf for table-level settings, though. For example, the
>>> target split size is something that was an engine setting in the Hadoop
>>> world, even though it makes no sense to use the same setting across vastly
>>> different tables --- think about joining a fact table with a dimension
>>> table.
>>>
>>> Settings like write mode are table-level settings. It matters what is
>>> downstream of the table. You may want to set a *default* write mode, but
>>> the table-level setting should always win. Currently, there are limits to
>>> overriding the write mode in SQL. That's why we should add hints. For
>>> anything beyond that, I think we need to discuss what you're trying to do.
>>> If it's to override a table-level setting with a SQL global, then we should
>>> understand the use case better.
>>>
>>> On Fri, Jul 14, 2023 at 6:09 PM Wing Yew Poon
>>> <wyp...@cloudera.com.invalid> wrote:
>>>
>>>> Also, in the case of write mode (I mean write.delete.mode,
>>>> write.update.mode, write.merge.mode), these cannot be set as options
>>>> currently; they are only settable as table properties.
>>>>
>>>> On Fri, Jul 14, 2023 at 5:58 PM Wing Yew Poon <wyp...@cloudera.com>
>>>> wrote:
>>>>
>>>>> I think that different use cases benefit from or even require
>>>>> different solutions. I think enabling options in Spark SQL is helpful, but
>>>>> allowing some configurations to be done in SQLConf is also helpful.
>>>>> For Cheng Pan's use case (to disable locality), I think providing a
>>>>> conf (which can be added to spark-defaults.conf by a cluster admin) is
>>>>> useful.
>>>>> For my customer's use case
>>>>> (https://github.com/apache/iceberg/pull/7790), being able to set the
>>>>> write mode per Spark job (where right now it can only be set as a table
>>>>> property) is useful. Allowing this to be done in the SQL with an
>>>>> option/hint could also work, but as I understand it, Szehon's PR
>>>>> (https://github.com/apache/spark/pull/416830) is only applicable to
>>>>> reads, not writes.
>>>>>
>>>>> - Wing Yew
>>>>>
>>>>> On Thu, Jul 13, 2023 at 1:04 AM Cheng Pan <pan3...@gmail.com> wrote:
>>>>>
>>>>>> Ryan, I understand that options should be job-specific, and that
>>>>>> introducing an OPTIONS hint lets Spark SQL achieve capabilities
>>>>>> similar to those of the DataFrame API.
>>>>>>
>>>>>> My point is, some of the Iceberg options should not be job-specific.
>>>>>>
>>>>>> For example, Iceberg has an option “locality” that can only be set
>>>>>> at the job level, whereas Spark has a configuration
>>>>>> “spark.shuffle.reduceLocality.enabled” that can be set at the
>>>>>> cluster level. This gap blocks Spark administrators from migrating to
>>>>>> Iceberg, because they cannot disable locality at the cluster level.
>>>>>>
>>>>>> So, what is the principle in Iceberg for classifying a
>>>>>> configuration as SQLConf or OPTION?
>>>>>>
>>>>>> Thanks,
>>>>>> Cheng Pan
>>>>>>
>>>>>> > On Jul 5, 2023, at 16:26, Cheng Pan <pan3...@gmail.com> wrote:
>>>>>> >
>>>>>> > I would argue that the SQLConf way is more in line with Spark
>>>>>> > user/administrator habits.
>>>>>> >
>>>>>> > It’s common practice for Spark administrators to set configurations
>>>>>> > in spark-defaults.conf at the cluster level, and when users have
>>>>>> > issues with their Spark SQL/jobs, the first question they ask is
>>>>>> > usually: can it be fixed by adding a Spark configuration?
>>>>>> >
>>>>>> > The OPTIONS way adds learning effort for Spark users, and
>>>>>> > how can Spark administrators set them at the cluster level?
>>>>>> >
>>>>>> > Thanks,
>>>>>> > Cheng Pan
>>>>>> >
>>>>>> >> On Jun 17, 2023, at 04:01, Wing Yew Poon
>>>>>> >> <wyp...@cloudera.com.INVALID> wrote:
>>>>>> >>
>>>>>> >> Hi,
>>>>>> >> I recently put up a PR,
>>>>>> >> https://github.com/apache/iceberg/pull/7790, to allow the write mode
>>>>>> >> (copy-on-write/merge-on-read) to be specified in SQLConf. The use case is
>>>>>> >> explained in the PR.
>>>>>> >> Cheng Pan has an open PR,
>>>>>> >> https://github.com/apache/iceberg/pull/7733, to allow locality to be
>>>>>> >> specified in SQLConf.
>>>>>> >> In the recent past, https://github.com/apache/iceberg/pull/6838/
>>>>>> >> was a PR to allow the write distribution mode to be specified in SQLConf.
>>>>>> >> This was merged.
>>>>>> >> Cheng Pan asks if there is any guidance on when we should allow
>>>>>> >> configs to be specified in SQLConf.
>>>>>> >> Thanks,
>>>>>> >> Wing Yew
>>>>>> >>
>>>>>> >> ps. The above open PRs could use reviews by committers.
>>>>>> >>
>>>>>> >
>>>>>>
>>>
>>> --
>>> Ryan Blue
>>> Tabular
>>
>
> --
> Ryan Blue
> Tabular