[
https://issues.apache.org/jira/browse/SPARK-15804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Charlie Evans updated SPARK-15804:
----------------------------------
Description:
Adding metadata with col().as(_, metadata) and then saving the resulting
dataframe does not persist the metadata. No error is thrown. The schema
contains the metadata before saving, but after saving and reloading the
dataframe the metadata is gone. This was working fine with 1.6.1.
{code}
// Build a small test dataframe (assumes a Spark 2.0 session available as `spark`)
case class TestRow(a: String, b: Int)
val rows = TestRow("a", 0) :: TestRow("b", 1) :: TestRow("c", 2) :: Nil
val df = spark.createDataFrame(rows)

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.MetadataBuilder

// Attach custom metadata to column "b"
val md = new MetadataBuilder().putString("key", "value").build()
val dfWithMeta = df.select(col("a"), col("b").as("b", md))
println(dfWithMeta.schema.json)   // metadata for "b" is present here

// Round-trip through parquet
dfWithMeta.write.parquet("dfWithMeta")
val dfWithMeta2 = spark.read.parquet("dfWithMeta")
println(dfWithMeta2.schema.json)  // metadata for "b" is missing here
{code}
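A possible workaround, sketched below under the assumption that the original Metadata object (md) is still in scope, is to re-attach the metadata to the reloaded dataframe; this only restores it in the in-memory schema, not in the parquet files themselves.
{code}
// Workaround sketch (hypothetical, assumes `spark` and `md` from the snippet above):
// re-attach the metadata after reading the dataframe back from parquet.
import org.apache.spark.sql.functions.col

val reloaded = spark.read.parquet("dfWithMeta")
val restored = reloaded.select(col("a"), col("b").as("b", md))
println(restored.schema.json)  // metadata for "b" is present again
{code}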
was:
Adding metadata with col().as(_, metadata) and then saving the resulting
dataframe does not persist the metadata. No error is thrown. The schema
contains the metadata before saving, but after saving and reloading the
dataframe the metadata is gone.
{code}
// Build a small test dataframe (assumes a Spark 2.0 session available as `spark`)
case class TestRow(a: String, b: Int)
val rows = TestRow("a", 0) :: TestRow("b", 1) :: TestRow("c", 2) :: Nil
val df = spark.createDataFrame(rows)

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.MetadataBuilder

// Attach custom metadata to column "b"
val md = new MetadataBuilder().putString("key", "value").build()
val dfWithMeta = df.select(col("a"), col("b").as("b", md))
println(dfWithMeta.schema.json)   // metadata for "b" is present here

// Round-trip through parquet
dfWithMeta.write.parquet("dfWithMeta")
val dfWithMeta2 = spark.read.parquet("dfWithMeta")
println(dfWithMeta2.schema.json)  // metadata for "b" is missing here
{code}
> Manually added metadata not saving with parquet
> -----------------------------------------------
>
> Key: SPARK-15804
> URL: https://issues.apache.org/jira/browse/SPARK-15804
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.0.0
> Reporter: Charlie Evans
>
> Adding metadata with col().as(_, metadata) and then saving the resulting
> dataframe does not persist the metadata. No error is thrown. The schema
> contains the metadata before saving, but after saving and reloading the
> dataframe the metadata is gone. This was working fine with 1.6.1.
> {code}
> // Build a small test dataframe (assumes a Spark 2.0 session available as `spark`)
> case class TestRow(a: String, b: Int)
> val rows = TestRow("a", 0) :: TestRow("b", 1) :: TestRow("c", 2) :: Nil
> val df = spark.createDataFrame(rows)
>
> import org.apache.spark.sql.functions.col
> import org.apache.spark.sql.types.MetadataBuilder
>
> // Attach custom metadata to column "b"
> val md = new MetadataBuilder().putString("key", "value").build()
> val dfWithMeta = df.select(col("a"), col("b").as("b", md))
> println(dfWithMeta.schema.json)   // metadata for "b" is present here
>
> // Round-trip through parquet
> dfWithMeta.write.parquet("dfWithMeta")
> val dfWithMeta2 = spark.read.parquet("dfWithMeta")
> println(dfWithMeta2.schema.json)  // metadata for "b" is missing here
> {code}