Re: printSchema showing incorrect datatype?

2017-01-25 Thread Koert Kuipers
should we change "def schema" to show the materialized schema?

On Wed, Jan 25, 2017 at 1:04 PM, Michael Armbrust wrote:
> Encoders are just an object based view on a Dataset. Until you actually
> materialize an object, they are not used and thus will not change the
>

Re: printSchema showing incorrect datatype?

2017-01-25 Thread Michael Armbrust
Encoders are just an object-based view on a Dataset. Until you actually materialize an object, they are not used and thus will not change the schema of the DataFrame.

On Tue, Jan 24, 2017 at 8:28 AM, Koert Kuipers wrote:
> scala> val x = Seq("a", "b").toDF("x")
> x:
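
A minimal sketch of the point above, written as a script for spark-shell or a similar Scala script runner with Spark on the classpath. It is an illustration, not code from the thread: it uses Int to Long instead of the thread's String to Array[Byte] on the assumption that this upcast is accepted across Spark versions, and the names `df`, `viewed`, and `materialized` are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, LongType}

// Local session for illustration only; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("schema-vs-encoder")
  .getOrCreate()
import spark.implicits._

val df = Seq(1, 2).toDF("x")

// as[Long] only attaches an encoder; the logical plan is unchanged,
// so the reported schema is still the original integer column.
val viewed = df.as[Long]

// map forces objects to be deserialized and re-serialized through the
// encoder, which adds a new plan node whose output is a long column.
val materialized = viewed.map(v => v)

val viewedType = viewed.schema.head.dataType
val materializedType = materialized.schema.head.dataType

viewed.printSchema()
materialized.printSchema()

spark.stop()
```

Comparing `viewedType` with `materializedType` shows the same effect as in the original question: the encoder's type appears in the schema only once a typed operation such as `map` has been added to the plan.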

Re: printSchema showing incorrect datatype?

2017-01-24 Thread Takeshi Yamamuro
Hi,

AFAIK `Dataset#printSchema` just prints the output schema of the logical plan that the Dataset has. The logical plans in your example are as follows:

---
scala> x.as[Array[Byte]].explain(true)
== Analyzed Logical Plan ==
x: string
Project [value#1 AS x#3]
+- LocalRelation [value#1]

printSchema showing incorrect datatype?

2017-01-24 Thread Koert Kuipers
scala> val x = Seq("a", "b").toDF("x")
x: org.apache.spark.sql.DataFrame = [x: string]

scala> x.as[Array[Byte]].printSchema
root
 |-- x: string (nullable = true)

scala> x.as[Array[Byte]].map(x => x).printSchema
root
 |-- value: binary (nullable = true)

why does the first schema show string