Re: Supports Dynamic Table Options for Spark SQL

Ryan Blue Tue, 16 Nov 2021 09:49:50 -0800

Mich, time travel will use the newly added VERSION AS OF or TIMESTAMP AS OF
syntax.


On Tue, Nov 16, 2021 at 12:40 AM Mich Talebzadeh <[email protected]>
wrote:

> As I stated before, hints are designed to direct the optimizer to choose
> a certain query execution plan based on the specific criteria.
>
>
> -- time travel
> SELECT * FROM t /*+ OPTIONS('snapshot-id'='10963874102873L') */
>
>
> The alternative would be to specify time travel by creating a snapshot
> based on CURRENT_DATE() range which encapsulates time travel for
> 'snapshot-id'='10963874102873L'
>
>
> CREATE SNAPSHOT t_snap
>
>   START WITH CURRENT_DATE() - 30
>
>   NEXT CURRENT_DATE()
>
>   AS SELECT * FROM t
>
>
> SELECT * FROM t_snap
>
>
> HTH
>
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Tue, 16 Nov 2021 at 04:26, Hyukjin Kwon <[email protected]> wrote:
>
>> My biggest concern with the syntax in hints is that Spark SQL's options
>> can change results (e.g., CSV's header options) whereas hints are generally
>> not designed to affect the external results if I am not mistaken. This is
>> counterintuitive.
>> I left the comment in the PR but what's the real benefit over leveraging:
>> SET conf and RESET conf? we can extract options from runtime session
>> configurations e.g., SessionConfigSupport.
>>
>> On Tue, 16 Nov 2021 at 04:30, Nicholas Chammas <
>> [email protected]> wrote:
>>
>>> Side note about time travel: There is a PR
>>> <https://github.com/apache/spark/pull/34497> to add VERSION/TIMESTAMP
>>> AS OF syntax to Spark SQL.
>>>
>>> On Mon, Nov 15, 2021 at 2:23 PM Ryan Blue <[email protected]> wrote:
>>>
>>>> I want to note that I wouldn't recommend time traveling this way by
>>>> using the hint for `snapshot-id`. Instead, we want to add the standard SQL
>>>> syntax for that in a separate PR. This is useful for other options that
>>>> help a table scan perform better, like specifying the target split size.
>>>>
>>>> You're right that this isn't a typical optimizer hint, but I'm not sure
>>>> what other syntax is possible for this use case. How else would we send
>>>> custom properties through to the scan?
>>>>
>>>> On Mon, Nov 15, 2021 at 9:25 AM Mich Talebzadeh <
>>>> [email protected]> wrote:
>>>>
>>>>> I am looking at the hint and it appears to me (I stand corrected), it
>>>>> is a single table hint as below:
>>>>>
>>>>> -- time travel
>>>>> SELECT * FROM t /*+ OPTIONS('snapshot-id'='10963874102873L') */
>>>>>
>>>>> My assumption is that any view on this table will also benefit from
>>>>> this hint. This is not a hint to optimizer in a classical sense. Only a
>>>>> snapshot hint. Normally, a hint is an instruction to the optimizer.
>>>>> When writing SQL, one may know information about the data unknown to the
>>>>> optimizer. Hints enable one to make decisions normally made by the
>>>>> optimizer, sometimes causing the optimizer to select a plan that it sees 
>>>>> as
>>>>> higher cost.
>>>>>
>>>>>
>>>>> So far as this case is concerned, it looks OK and I concur it should
>>>>> be extended to write as well.
>>>>>
>>>>>
>>>>> HTH
>>>>>
>>>>>
>>>>>    view my Linkedin profile
>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>
>>>>>
>>>>>
>>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>>>>> any loss, damage or destruction of data or any other property which may
>>>>> arise from relying on this email's technical content is explicitly
>>>>> disclaimed. The author will in no case be liable for any monetary damages
>>>>> arising from such loss, damage or destruction.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, 15 Nov 2021 at 17:02, Russell Spitzer <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> I think since we probably will end up using this same syntax on
>>>>>> write, this makes a lot of sense. Unless there is another good way to
>>>>>> express a similar concept during a write operation I think going forward
>>>>>> with this would be ok.
>>>>>>
>>>>>> On Mon, Nov 15, 2021 at 10:44 AM Ryan Blue <[email protected]> wrote:
>>>>>>
>>>>>>> The proposed feature is to be able to pass options through SQL like
>>>>>>> you would when using the DataFrameReader API, so it would work for
>>>>>>> all sources that support read options. Read options are part of the DSv2
>>>>>>> API, there just isn’t a way to pass options when using SQL. The PR also 
>>>>>>> has
>>>>>>> a non-Iceberg example, which is being able to customize some JDBC source
>>>>>>> behaviors per query (e.g., fetchSize), rather than globally in the 
>>>>>>> table’s
>>>>>>> options.
>>>>>>>
>>>>>>> The proposed syntax is odd, but I think that's an artifact of Spark
>>>>>>> introducing read options that aren't a normal part of SQL. Seems 
>>>>>>> reasonable
>>>>>>> to me to pass them through a hint.
>>>>>>>
>>>>>>> On Mon, Nov 15, 2021 at 2:18 AM Mich Talebzadeh <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Interesting.
>>>>>>>>
>>>>>>>> What is this going to add on top of support for Apache Iceberg
>>>>>>>> <https://www.dremio.com/data-lake/apache-iceberg/>. Will it be in
>>>>>>>> line with support for Hive ACID tables or Delta Lake?
>>>>>>>>
>>>>>>>> HTH
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>    view my Linkedin profile
>>>>>>>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *Disclaimer:* Use it at your own risk. Any and all responsibility
>>>>>>>> for any loss, damage or destruction of data or any other property 
>>>>>>>> which may
>>>>>>>> arise from relying on this email's technical content is explicitly
>>>>>>>> disclaimed. The author will in no case be liable for any monetary 
>>>>>>>> damages
>>>>>>>> arising from such loss, damage or destruction.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, 15 Nov 2021 at 01:56, Zhun Wang <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi dev,
>>>>>>>>>
>>>>>>>>> We are discussing Support Dynamic Table Options for Spark SQL (
>>>>>>>>> https://github.com/apache/spark/pull/34072). It is currently not
>>>>>>>>> sure if the syntax makes sense, and would like to know if there is 
>>>>>>>>> other
>>>>>>>>> feedback or opinion on this.
>>>>>>>>>
>>>>>>>>> I would appreciate any feedback on this.
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Ryan Blue
>>>>>>> Tabular
>>>>>>>
>>>>>>
>>>>
>>>> --
>>>> Ryan Blue
>>>> Tabular
>>>>
>>>

-- 
Ryan Blue
Tabular

Re: Supports Dynamic Table Options for Spark SQL

Reply via email to