[ 
https://issues.apache.org/jira/browse/SPARK-41918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655554#comment-17655554
 ] 

Rui Wang commented on SPARK-41918:
----------------------------------

I ran some tests locally and found the following:

If I rename a field, the code that accesses that field must of course be 
updated.

In terms of backwards compatibility, a client that uses the old field name can 
talk to a server that uses the new field name without a problem.

Forwards compatibility also works fine.


So I think I understand it better now: renaming a field only requires 
recompiling the code. After that, the binaries should work as before.
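
To illustrate why a rename is wire-compatible, here is a minimal sketch that 
hand-encodes a single string field the way the protobuf binary format does 
(tag = field number plus wire type, then length, then payload). It does not 
use the protobuf library. The helper names and the new field name `column` 
are assumptions for illustration only, not the actual Spark Connect change.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint starting at pos; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one string field: the tag carries the field NUMBER, never the name."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | 2  # wire type 2 = length-delimited
    return encode_varint(tag) + encode_varint(len(payload)) + payload


def decode_string_fields(data: bytes) -> dict[int, str]:
    """Decode a message made only of string fields into {field number: value}."""
    fields: dict[int, str] = {}
    pos = 0
    while pos < len(data):
        tag, pos = decode_varint(data, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if wire_type != 2:
            raise ValueError("this sketch only handles length-delimited fields")
        length, pos = decode_varint(data, pos)
        fields[field_number] = data[pos:pos + length].decode("utf-8")
        pos += length
    return fields


# Two versions of the same message: field number 1 keeps its number,
# only the name differs ("column" is a hypothetical new name).
OLD_SCHEMA = {1: "col_name"}
NEW_SCHEMA = {1: "column"}

# An "old" client serializes field 1; the name never reaches the wire.
wire = encode_string_field(1, "a.*")

# Each side maps the field number back to its own name.
decoded = decode_string_fields(wire)
old_view = {OLD_SCHEMA[number]: value for number, value in decoded.items()}
new_view = {NEW_SCHEMA[number]: value for number, value in decoded.items()}
```

Both sides read the same value because only the field number is serialized. 
This covers the binary wire format only; the protobuf JSON mapping does use 
field names, so a rename is not transparent there.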


> Refine the naming in proto messages
> -----------------------------------
>
>                 Key: SPARK-41918
>                 URL: https://issues.apache.org/jira/browse/SPARK-41918
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Connect
>    Affects Versions: 3.4.0
>            Reporter: Ruifeng Zheng
>            Priority: Major
>
> Normally, we name the fields after the corresponding LogicalPlan or 
> DataFrame API, but the naming is not consistent across the protos. For 
> example, the column name:
> {code:java}
>   message UnresolvedRegex {
>     // (Required) The column name used to extract column with regex.
>     string col_name = 1;
>   }
> {code}
> {code:java}
>   message Alias {
>     // (Required) The expression that alias will be added on.
>     Expression expr = 1;
>     // (Required) a list of name parts for the alias.
>     //
>     // Scalar columns only has one name that presents.
>     repeated string name = 2;
>     // (Optional) Alias metadata expressed as a JSON map.
>     optional string metadata = 3;
>   }
> {code}
> {code:java}
> // Relation of type [[Deduplicate]] which have duplicate rows removed, could consider either only
> // the subset of columns or all the columns.
> message Deduplicate {
>   // (Required) Input relation for a Deduplicate.
>   Relation input = 1;
>   // (Optional) Deduplicate based on a list of column names.
>   //
>   // This field does not co-use with `all_columns_as_keys`.
>   repeated string column_names = 2;
>   // (Optional) Deduplicate based on all the columns of the input relation.
>   //
>   // This field does not co-use with `column_names`.
>   optional bool all_columns_as_keys = 3;
> }
> {code}
> {code:java}
> // Computes basic statistics for numeric and string columns, including count, mean, stddev, min,
> // and max. If no columns are given, this function computes statistics for all numerical or
> // string columns.
> message StatDescribe {
>   // (Required) The input relation.
>   Relation input = 1;
>   // (Optional) Columns to compute statistics on.
>   repeated string cols = 2;
> }
> {code}
> We should probably unify the naming:
> single column -> `column`
> multiple columns -> `columns`


