[ https://issues.apache.org/jira/browse/SPARK-24988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-24988.
----------------------------------
    Resolution: Won't Fix

> Add a castBySchema method which casts all the values of a DataFrame based on the DataTypes of a StructType
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-24988
>                 URL: https://issues.apache.org/jira/browse/SPARK-24988
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: mahmoud mehdi
>            Priority: Minor
>
> The main goal of this issue is to extend the DataFrame API with a method that casts all the values of a DataFrame based on the DataTypes of a StructType.
> This feature can be useful when we have a large DataFrame and need to perform multiple casts. In that case, we would not have to cast each column independently; all we have to do is pass a StructType with the desired types to castBySchema (in real-world use, this schema is generally provided by the client, as it was in my case).
> I'll explain the new feature via an example. Let's create a DataFrame of strings:
> {code:java}
> val df = Seq(("test1", "0"), ("test2", "1")).toDF("name", "id")
> {code}
> Suppose we want to cast the second column's values to integers; all we have to do is the following:
> {code:java}
> val schema = StructType(Seq(
>   StructField("name", StringType, true),
>   StructField("id", IntegerType, true)))
> {code}
> {code:java}
> df.castBySchema(schema)
> {code}
> I made sure that castBySchema also works with nested StructTypes by adding several tests.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
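Since the issue was resolved as Won't Fix, the proposed castBySchema method is not part of Spark. The behavior it describes can, however, be approximated with the existing DataFrame API by selecting each field and casting it to the type declared in the target schema. The sketch below is one hypothetical way to do this as an implicit extension; the object and class names are illustrative, not from the issue, and it assumes a local SparkSession.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

object CastBySchemaSketch {

  // Hypothetical extension adding a castBySchema method to DataFrame.
  // It selects every field named in the schema and casts it to the
  // schema's declared DataType.
  implicit class CastBySchemaOps(df: DataFrame) {
    def castBySchema(schema: StructType): DataFrame =
      df.select(schema.fields.map(f => col(f.name).cast(f.dataType)): _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("castBySchemaSketch")
      .getOrCreate()
    import spark.implicits._

    // The DataFrame of strings from the issue description.
    val df = Seq(("test1", "0"), ("test2", "1")).toDF("name", "id")

    // Target schema: keep "name" as a string, cast "id" to integer.
    val schema = StructType(Seq(
      StructField("name", StringType, true),
      StructField("id", IntegerType, true)))

    df.castBySchema(schema).printSchema()
    spark.stop()
  }
}
```

Note that a column-level cast to a StructType handles simple nested cases, but a fully general solution for deeply nested StructTypes, as tested in the reporter's patch, would likely need to recurse into struct fields rather than rely on a single top-level select.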