[
https://issues.apache.org/jira/browse/SPARK-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064570#comment-14064570
]
Cheng Hao commented on SPARK-2523:
----------------------------------
sbt/sbt hive/console
{code:title=prepare.scala|borderStyle=solid}
hql("CREATE TABLE add_part_test (key STRING, value STRING) PARTITIONED BY (ds STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe' STORED AS RCFILE").collect
hql("FROM src INSERT INTO TABLE add_part_test PARTITION (ds='2010-01-01') SELECT 100, 100 LIMIT 1").collect
hql("ALTER TABLE add_part_test SET SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'").collect
hql("FROM src INSERT INTO TABLE add_part_test PARTITION (ds='2010-01-02') SELECT 200, 200 LIMIT 1").collect
hql("SELECT * FROM add_part_test").collect.mkString("\n")
{code}
{panel:title=Output (Without this PR)}
14/07/17 12:12:02 WARN scheduler.TaskSetManager: Loss was due to java.lang.ClassCastException
java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyString
	at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyStringObjectInspector.getPrimitiveJavaObject(LazyStringObjectInspector.java:52)
	at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyStringObjectInspector.getPrimitiveJavaObject(LazyStringObjectInspector.java:28)
	at org.apache.spark.sql.hive.HiveInspectors$class.unwrapData(hiveUdfs.scala:287)
	at org.apache.spark.sql.hive.execution.HiveTableScan.unwrapData(HiveTableScan.scala:48)
	at org.apache.spark.sql.hive.execution.HiveTableScan$$anonfun$attributeFunctions$1$$anonfun$apply$3.apply(HiveTableScan.scala:101)
	at org.apache.spark.sql.hive.execution.HiveTableScan$$anonfun$attributeFunctions$1$$anonfun$apply$3.apply(HiveTableScan.scala:99)
	at org.apache.spark.sql.hive.execution.HiveTableScan$$anonfun$12$$anonfun$apply$5.apply(HiveTableScan.scala:203)
	at org.apache.spark.sql.hive.execution.HiveTableScan$$anonfun$12$$anonfun$apply$5.apply(HiveTableScan.scala:200)
{panel}
And
{panel:title=Output (With this PR)}
[100,100,2010-01-01]
[200,200,2010-01-02]
{panel}
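The ClassCastException above can be reproduced in miniature without Hive. The following is a minimal sketch (all names are hypothetical stand-ins, not the real Hive/Spark classes): each partition's data deserializes into whatever class its own SerDe produces, so an inspector that assumes the table-level SerDe's class fails on any partition written with a different SerDe. Unwrapping per partition avoids the cast:

```scala
// Hypothetical stand-ins for the SerDe output classes in the stack trace.
trait Deserializer { def deserialize(raw: String): Any }
case class LazyString(s: String)
case class Text(s: String)
object LazySerDe extends Deserializer { def deserialize(raw: String): Any = LazyString(raw) }
object TextSerDe extends Deserializer { def deserialize(raw: String): Any = Text(raw) }

// Each partition records the SerDe that wrote its data.
case class Partition(ds: String, serde: Deserializer, value: String)

val parts = Seq(
  Partition("2010-01-01", LazySerDe, "100"),
  Partition("2010-01-02", TextSerDe, "200"))

// Deserialize each row with its own partition's SerDe, as the stored data requires.
val rows = parts.map(p => p.serde.deserialize(p.value))

// Buggy table-level unwrap: assumes every row is a LazyString.
// rows.map(_.asInstanceOf[LazyString].s)  // ClassCastException on the Text row

// Fixed per-partition unwrap: match on the class each SerDe actually produced.
val values = rows.map {
  case LazyString(s) => s
  case Text(s)       => s
}
println(values.mkString(","))
```

This mirrors the fix's intent: the scan must consult each partition's own SerDe metadata instead of reusing the table-level ObjectInspector.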
{code:title=analysis.sql|borderStyle=solid}
hql("DESCRIBE EXTENDED add_part_test PARTITION (ds='2010-01-01')").collect.mkString("\n")
hql("DESCRIBE EXTENDED add_part_test PARTITION (ds='2010-01-02')").collect.mkString("\n")
{code}
You will probably see different SerDe entries for the two partitions, like:
{panel:title=Output}
Detailed Partition Information Partition(values: 2010-01-01, ... serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe...)
Detailed Partition Information Partition(values: 2010-01-02, ... serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe...)
{panel}
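The DESCRIBE output makes sense once you note that a partition snapshots the table's SerDe at creation time, so ALTER TABLE ... SET SERDE only affects partitions created afterwards. A minimal model of that metadata behavior (hypothetical names, not the Hive metastore API):

```scala
// Hypothetical model: each partition captures the table's serializationLib
// at the moment it is created.
case class PartitionMeta(ds: String, serializationLib: String)

class TableMeta(var serializationLib: String) {
  var partitions = Vector.empty[PartitionMeta]
  def addPartition(ds: String): Unit =
    partitions :+= PartitionMeta(ds, serializationLib) // snapshot current SerDe
}

val t = new TableMeta("org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe")
t.addPartition("2010-01-01")
// ALTER TABLE add_part_test SET SERDE '...ColumnarSerDe'
t.serializationLib = "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
t.addPartition("2010-01-02")

// The two partitions now legitimately report different serializationLib values,
// which is why the scan cannot assume one ObjectInspector for the whole table.
println(t.partitions.map(p => s"${p.ds} -> ${p.serializationLib}").mkString("\n"))
```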
> Potential Bugs if SerDe is not the identical among partitions and table
> -----------------------------------------------------------------------
>
> Key: SPARK-2523
> URL: https://issues.apache.org/jira/browse/SPARK-2523
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Cheng Hao
>
> In HiveTableScan.scala, a single ObjectInspector is created for all of the
> partitioned records, which can cause a ClassCastException if the
> ObjectInspector is not identical across the table and its partitions.
--
This message was sent by Atlassian JIRA
(v6.2#6252)