Thanks for the idea! I think the existing pattern "combined = StructType(a.fields + b.fields)" is still good because 1) merging a and b this way is not that cumbersome; 2) it makes the intent explicit: merge two structs' fields to construct a new struct; 3) it leaves room for more complicated operations when merging fields, for example removing duplicate fields with the same name, or keeping a.fields but dropping the fields that also appear in b.
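The duplicate-name handling mentioned above could be sketched roughly like this. To keep the sketch free of a Spark dependency, plain (name, type) tuples stand in for real StructField objects; the helper name and field names are hypothetical:

```python
# Sketch of "merge but drop duplicate names": keep every field from a,
# then append only those fields from b whose names are not already taken.
# Plain (name, dtype) tuples stand in for pyspark StructField objects.

def merge_fields(a_fields, b_fields):
    seen = {name for name, _ in a_fields}
    merged = list(a_fields)
    for name, dtype in b_fields:
        if name not in seen:
            merged.append((name, dtype))
            seen.add(name)
    return merged

a = [("id", "string"), ("ts", "long")]
b = [("ts", "long"), ("value", "int")]
print(merge_fields(a, b))
# [('id', 'string'), ('ts', 'long'), ('value', 'int')]
```

With real StructTypes the same loop would compare field.name values and build a new StructType from the merged list.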
My concerns with overloading "+" are: 1) it's ambiguous what the plus is doing; 2) if "+" is defined as concatenation of the fields, then it's limited to only doing concatenation. What about other operations, like extracting fields from a based on b? Maybe overloading "-"? In that case the list of operators will keep growing.

-Rui

On Tue, Aug 9, 2022 at 1:10 PM Tim <bosse...@posteo.de> wrote:
> Hi all,
>
> this is my first message to the Spark mailing list, so please bear with
> me if I don't fully meet your communication standards.
> I just wanted to discuss one aspect that I've stumbled across several
> times over the past few weeks.
> When working with Spark, I often run into the problem of having to merge
> two (or more) existing StructTypes into a new one to define a schema.
> Usually this looks similar (in Python) to the following simplified
> example:
>
> a = StructType([StructField("field_a", StringType())])
> b = StructType([StructField("field_b", IntegerType())])
>
> combined = StructType(a.fields + b.fields)
>
> My idea, which I would like to discuss, is to shorten the above example
> in Python as follows by supporting Python's add operator for
> StructTypes:
>
> combined = a + b
>
> What do you think of this idea? Are there any reasons why this is not
> yet part of StructType's functionality?
> If you support this idea, I could create a first PR for further and
> deeper discussion.
>
> Best
> Tim
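For reference, the proposed operator would amount to something like the following sketch. The __add__ method shown here is not part of pyspark; a minimal stand-in class is used instead of the real StructType so the semantics can be illustrated without a Spark installation:

```python
# Hypothetical sketch of the proposed "+" semantics: plain concatenation
# of the two field lists, returning a new struct. A minimal stand-in
# class is used here in place of the real pyspark StructType.

class StructType:
    def __init__(self, fields):
        self.fields = list(fields)

    def __add__(self, other):
        # Proposed semantics: concatenate the field lists, duplicates
        # and all, exactly like StructType(a.fields + b.fields) today.
        return StructType(self.fields + other.fields)

a = StructType([("field_a", "string")])
b = StructType([("field_b", "int")])
combined = a + b
print(combined.fields)
# [('field_a', 'string'), ('field_b', 'int')]
```

This illustrates Rui's point: once "+" is pinned to concatenation, any other merge policy (deduplication, subtraction) would need its own operator or a plain method.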