Thank you for starting the official discussion, Rui.

'Unneeded API' doesn't sound like a good frame for this discussion,
because it completely ignores existing users and their code.
Technically, the reasons mentioned above don't point to any specific
existing bugs or future maintenance-cost savings. Instead, the
deprecation already imposes costs on the community (your PR, the
future migration guide, and communication with users, e.g. Q&A) and
on the users themselves, who must actually migrate to the new APIs
and validate the change. Given that, for now, the goal of this
proposal looks purely educational: advertising the new APIs to
Apache Spark 3.4+ users.

Can we be more conservative about deprecation in Apache Spark, and
allow users to use both APIs freely without any concern about
uncertain future support? I simply want to avoid the situation where
a purely educational deprecation itself becomes `Unneeded
Deprecation` in the community.

Dongjoon.

On Thu, Jul 7, 2022 at 2:26 PM Rui Wang <amaliu...@apache.org> wrote:
>
> I want to highlight in case I missed this in the original email:
>
> The 4 APIs will not be deleted. They will just be marked with
> deprecation annotations, and we encourage users to use their
> alternatives.
>
>
> -Rui
>
> On Thu, Jul 7, 2022 at 2:23 PM Rui Wang <amaliu...@apache.org> wrote:
>>
>> Hi Community,
>>
>> Proposal:
>> I want to discuss a proposal to deprecate the following Catalog APIs:
>> def listColumns(dbName: String, tableName: String): Dataset[Column]
>> def getTable(dbName: String, tableName: String): Table
>> def getFunction(dbName: String, functionName: String): Function
>> def tableExists(dbName: String, tableName: String): Boolean
>>
>>
>> Context:
>> We have been adding support for table identifiers with catalog names
>> (aka the 3-layer namespace) to the Catalog API in
>> https://issues.apache.org/jira/browse/SPARK-39235.
>> The basic idea is: if an API accepts
>> 1. only tableName: String, we allow it to accept "a.b.c", which goes
>> through the analyzer and is treated as catalog name a, namespace b,
>> and table name c.
>> 2. only dbName: String, we allow it to accept "a.b", which goes
>> through the analyzer and is treated as catalog name a and namespace b.
>> Meanwhile, we still maintain backward compatibility for such APIs so
>> that past behavior remains the same. E.g. if you pass only a table
>> name, it is still resolved against the session catalog.
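>>
>> To make this concrete, here is a rough usage sketch (a minimal
>> sketch with illustrative names; it assumes a catalog registered as
>> "my_cat", e.g. via the spark.sql.catalog.my_cat configuration):
>>
>> import org.apache.spark.sql.SparkSession
>>
>> val spark = SparkSession.builder().getOrCreate()
>>
>> // New 3-layer form: the analyzer parses "my_cat.my_db.my_table"
>> // into catalog name, namespace, and table name.
>> spark.catalog.tableExists("my_cat.my_db.my_table")
>>
>> // Old 1-part/2-part forms keep their behavior and are still
>> // resolved against the session catalog.
>> spark.catalog.tableExists("my_db.my_table")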
>>
>> With this effort ongoing, the above 4 APIs are not fully compatible
>> with the 3-layer namespace.
>>
>> Take tableExists(dbName: String, tableName: String) as an example:
>> it takes two parameters but leaves no room for an extra catalog
>> name. And if we wanted to reuse the two parameters, which one would
>> take more than one name part?
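>>
>> To illustrate (hypothetical calls; neither reading works today):
>>
>> // Which parameter should absorb the catalog name? The signature
>> // cannot tell these two readings apart:
>> spark.catalog.tableExists("my_cat.my_db", "my_table")
>> spark.catalog.tableExists("my_cat", "my_db.my_table")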
>>
>>
>> How?
>> So how do we improve the above 4 APIs? There are two options:
>> a. Expand the four APIs to accept catalog names. For example,
>> tableExists(catalogName: String, dbName: String, tableName: String).
>> b. Mark those APIs as `deprecated`.
>>
>> I am proposing to follow option b, which deprecates the APIs.
>>
>> Why?
>> 1. Reduce unneeded APIs. The existing APIs can already support the
>> same behavior given SPARK-39235. For example, tableExists(dbName,
>> tableName) can be replaced with tableExists("dbName.tableName");
>> see the sketch after this list.
>> 2. Reduce incomplete APIs. The APIs proposed for deprecation do not
>> support the 3-layer namespace today, and it is hard to make them do
>> so (where would the 3 name parts go?).
>> 3. Deprecation nudges users to migrate their API usage.
>> 4. There is existing precedent: we deprecated the createExternalTable
>> API when adding the createTable API:
>> https://github.com/apache/spark/blob/7dcb4bafd02dd43213d3cc4a936c170bda56ddc5/sql/core/src/main/scala/org/apache/spark/sql/catalog/Catalog.scala#L220
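>>
>> As a migration sketch (illustrative names; the single-string
>> overloads already exist today):
>>
>> // Before: two-argument forms proposed for deprecation.
>> spark.catalog.tableExists("my_db", "my_table")
>> val cols = spark.catalog.listColumns("my_db", "my_table")
>>
>> // After: single-string overloads, which per SPARK-39235 also
>> // accept catalog-qualified 3-part names.
>> spark.catalog.tableExists("my_db.my_table")
>> val cols2 = spark.catalog.listColumns("my_db.my_table")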
>>
>>
>> What do you think?
>>
>> Thanks,
>> Rui Wang
>>
>>

