I believe in your case, the “magic” happens in TableReader.fillObject
<https://github.com/apache/spark/blob/4fa2fda88fc7beebb579ba808e400113b512533b/core/src/main/scala/org/apache/spark/storage/BlockManager.scala#L706-L712>.
Here we unwrap the field value according to the object inspector of that
field. It seems that somehow a FloatObjectInspector is specified for the
total_price field. I don’t think CSVSerde is responsible for this, since it
sets all field object inspectors to javaStringObjectInspector (here
<https://github.com/ogrodnek/csv-serde/blob/f315c1ae4b21a8288eb939e7c10f3b29c1a854ef/src/main/java/com/bizo/hive/serde/csv/CSVSerde.java#L59-L61>
).
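To make the mismatch concrete, here is a toy model (hypothetical classes, not the real Hive API) of how the object inspector chosen for a field drives unwrapping: if a float-style inspector is applied to a column whose SerDe actually emits strings, the value is reinterpreted as a float instead of staying a string.

```java
// Toy sketch of Hive-style object inspectors (illustrative only):
// the inspector assigned to a field decides how its raw value is unwrapped.
interface Inspector {
    Object unwrap(Object raw);
}

class StringInspector implements Inspector {
    public Object unwrap(Object raw) {
        return raw.toString();                   // field stays a string
    }
}

class FloatInspector implements Inspector {
    public Object unwrap(Object raw) {
        return Float.parseFloat(raw.toString()); // field is forced to a float
    }
}

public class InspectorDemo {
    public static void main(String[] args) {
        Object raw = "12.34"; // what a CSV SerDe would emit for total_price
        System.out.println(new StringInspector().unwrap(raw)); // a String
        System.out.println(new FloatInspector().unwrap(raw));  // a Float
    }
}
```

So if CSVSerde really registers javaStringObjectInspector for every field, something upstream must be substituting a float inspector for total_price.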

Which version of Spark SQL are you using? If you are using a snapshot
version, please provide the exact Git commit hash. Thanks!


On Tue, Aug 26, 2014 at 8:29 AM, chutium <teng....@gmail.com> wrote:

> oops, I tried it on a managed table, and the column types are not changed
>
> so it is most likely due to the SerDe lib CSVSerDe
> (
> https://github.com/ogrodnek/csv-serde/blob/master/src/main/java/com/bizo/hive/serde/csv/CSVSerde.java#L123
> )
> or maybe CSVReader from opencsv?...
>
> but if the columns are defined as string, no matter what type is returned
> from the custom SerDe or CSVReader, they should be cast to string at the
> end, right?
>
> why not use the schema from the Hive metadata directly?
>
>
>
> --
> View this message in context:
> http://apache-spark-developers-list.1001551.n3.nabble.com/HiveContext-schemaRDD-printSchema-get-different-dataTypes-feature-or-a-bug-really-strange-and-surpri-tp8035p8039.html
> Sent from the Apache Spark Developers List mailing list archive at
> Nabble.com.
>
>
