[ https://issues.apache.org/jira/browse/SPARK-2523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064570#comment-14064570 ]

Cheng Hao commented on SPARK-2523:
----------------------------------

sbt/sbt hive/console
{code:title=prepare.scala|borderStyle=solid}
hql("CREATE TABLE add_part_test (key STRING, value STRING) PARTITIONED BY (ds STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe' STORED AS RCFILE").collect
hql("from src insert into table add_part_test PARTITION (ds='2010-01-01') select 100, 100 limit 1").collect
hql("ALTER TABLE add_part_test SET SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe'").collect
hql("from src insert into table add_part_test PARTITION (ds='2010-01-02') select 200, 200 limit 1").collect
hql("select * from add_part_test").collect.mkString("\n")
{code}
{panel:title=Output (Without this PR)}
14/07/17 12:12:02 WARN scheduler.TaskSetManager: Loss was due to java.lang.ClassCastException
java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.serde2.lazy.LazyString
        at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyStringObjectInspector.getPrimitiveJavaObject(LazyStringObjectInspector.java:52)
        at org.apache.hadoop.hive.serde2.lazy.objectinspector.primitive.LazyStringObjectInspector.getPrimitiveJavaObject(LazyStringObjectInspector.java:28)
        at org.apache.spark.sql.hive.HiveInspectors$class.unwrapData(hiveUdfs.scala:287)
        at org.apache.spark.sql.hive.execution.HiveTableScan.unwrapData(HiveTableScan.scala:48)
        at org.apache.spark.sql.hive.execution.HiveTableScan$$anonfun$attributeFunctions$1$$anonfun$apply$3.apply(HiveTableScan.scala:101)
        at org.apache.spark.sql.hive.execution.HiveTableScan$$anonfun$attributeFunctions$1$$anonfun$apply$3.apply(HiveTableScan.scala:99)
        at org.apache.spark.sql.hive.execution.HiveTableScan$$anonfun$12$$anonfun$apply$5.apply(HiveTableScan.scala:203)
        at org.apache.spark.sql.hive.execution.HiveTableScan$$anonfun$12$$anonfun$apply$5.apply(HiveTableScan.scala:200)
{panel}
And with the PR applied:
{panel:title=Output (With this PR)}
[100,100,2010-01-01]
[200,200,2010-01-02]
{panel}
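The failure above can be reproduced in miniature without any Hive dependencies. The sketch below uses made-up stand-ins (`LazyStr`, `SerDe`, `lazyInspector` are illustrative names, not Hive classes): one "SerDe" yields a raw `String` (playing the role of `Text`), the other a lazy wrapper (playing the role of `LazyString`). An inspector built for the lazy output blindly casts its input, just as `LazyStringObjectInspector.getPrimitiveJavaObject` does, so feeding it a row deserialized by the other SerDe throws the same `ClassCastException`:

```scala
// Simplified stand-in for LazyString: a lazy wrapper over raw bytes.
final case class LazyStr(bytes: Array[Byte]) { def value: String = new String(bytes) }

trait SerDe { def deserialize(raw: Array[Byte]): Any }
// Yields a plain String, like Text-producing SerDes.
object PlainSerDe extends SerDe { def deserialize(raw: Array[Byte]): Any = new String(raw) }
// Yields the lazy wrapper, like LazyBinaryColumnarSerDe's inspectors expect.
object LazySerDe extends SerDe { def deserialize(raw: Array[Byte]): Any = LazyStr(raw) }

// Inspector built for LazySerDe output: it casts unconditionally,
// mirroring LazyStringObjectInspector.getPrimitiveJavaObject.
def lazyInspector(o: Any): String = o.asInstanceOf[LazyStr].value

val row = "100".getBytes

// The matching SerDe/inspector pair works fine.
val ok = lazyInspector(LazySerDe.deserialize(row))

// Reusing the same inspector on a partition written with the other SerDe
// throws ClassCastException, as in the stack trace above.
val mismatch =
  try { lazyInspector(PlainSerDe.deserialize(row)); false }
  catch { case _: ClassCastException => true }
```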

{code:title=analysis.sql|borderStyle=solid}
hql("DESCRIBE EXTENDED add_part_test partition (ds='2010-01-01')").collect.mkString("\n")
hql("DESCRIBE EXTENDED add_part_test partition (ds='2010-01-02')").collect.mkString("\n")
{code}
You will probably see that the two partitions report different SerDes, like:

{panel:title=Output}
Detailed Partition Information  Partition(values: 2010-01-01, ... serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe...)

Detailed Partition Information  Partition(values: 2010-01-02, ... serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe...)
{panel}

> Potential Bugs if SerDe is not the identical among partitions and table
> -----------------------------------------------------------------------
>
>                 Key: SPARK-2523
>                 URL: https://issues.apache.org/jira/browse/SPARK-2523
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Cheng Hao
>
> In HiveTableScan.scala, a single ObjectInspector is created and reused for the 
> records of every partition, which can cause a ClassCastException when the 
> object inspectors are not identical between the table and its partitions.
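The direction of the fix can be sketched in a few lines of self-contained Scala. This is only an illustration of the idea, not Spark's actual code: `Partition`, `serdes`, and `scan` are hypothetical names, and the decoder functions are trivial stand-ins for real deserializers. The point is that the deserializer is resolved from each partition's own SerDe name inside the per-partition loop, rather than once from the table-level metadata:

```scala
// Hypothetical model: each partition records which SerDe wrote its data.
case class Partition(ds: String, serdeName: String, rows: Seq[Array[Byte]])

// Stand-in decoders keyed by SerDe name (real code would instantiate
// the named SerDe class and build a matching ObjectInspector).
val serdes: Map[String, Array[Byte] => String] = Map(
  "LazyBinaryColumnarSerDe" -> (b => new String(b)),
  "ColumnarSerDe"           -> (b => new String(b).trim)
)

// Resolve the decoder per partition, not per table, so partitions with
// different SerDes can coexist in one scan.
def scan(parts: Seq[Partition]): Seq[String] =
  parts.flatMap { p =>
    val deserialize = serdes(p.serdeName)
    p.rows.map(deserialize)
  }
```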



--
This message was sent by Atlassian JIRA
(v6.2#6252)
