[
https://issues.apache.org/jira/browse/SPARK-6628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16011490#comment-16011490
]
Weiqing Yang commented on SPARK-6628:
-------------------------------------
We ran into this issue too.
The major issue is:
{code}
org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat
{code}
cannot be cast to
{code}
org.apache.hadoop.hive.ql.io.HiveOutputFormat
{code}
The reason is:
{code}
public interface HiveOutputFormat<K, V> extends OutputFormat<K, V> {...}
public class HiveHBaseTableOutputFormat extends TableOutputFormat<ImmutableBytesWritable>
    implements OutputFormat<ImmutableBytesWritable, Object> {...}
{code}
From the two snippets above, we can see that HiveOutputFormat extends
OutputFormat and HiveHBaseTableOutputFormat implements OutputFormat, but
neither is a subtype of the other, so one cannot be cast to the other.
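To make the type relationship concrete, here is a minimal, self-contained sketch. It uses stand-in types defined inside the example (they are not the real Hadoop/Hive classes, and the names HBaseLikeOutputFormat, HiveLikeOutputFormat and castFails are made up for illustration). It only shows that a class implementing the base interface cannot be cast to a sibling sub-interface it does not implement.

```java
// Stand-in types only: these are NOT the real Hadoop/Hive classes.
class SiblingCastDemo {
    interface OutputFormat<K, V> {}

    // Mirrors HiveOutputFormat: a sub-interface of OutputFormat.
    interface HiveOutputFormat<K, V> extends OutputFormat<K, V> {}

    // Mirrors HiveHBaseTableOutputFormat: implements OutputFormat directly,
    // but never HiveOutputFormat.
    static class HBaseLikeOutputFormat implements OutputFormat<String, Object> {}

    // A format that does implement the Hive-specific interface, for contrast.
    static class HiveLikeOutputFormat implements HiveOutputFormat<String, Object> {}

    /** Returns true if casting the given format to HiveOutputFormat throws. */
    static boolean castFails(OutputFormat<?, ?> format) {
        try {
            HiveOutputFormat<?, ?> hive = (HiveOutputFormat<?, ?>) format;
            return hive == null; // false when the cast succeeds on a non-null value
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(castFails(new HBaseLikeOutputFormat())); // true
        System.out.println(castFails(new HiveLikeOutputFormat()));  // false
    }
}
```

The first call fails at runtime for the same reason as the reported exception: sharing a common supertype does not make two types castable to each other.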
Spark initializes the output format in SparkHiveWriterContainer in Spark 1.6,
2.0, and 2.1 (or in HiveFileFormat in Spark 2.2/master):
{code}
@transient private lazy val outputFormat =
  jobConf.value.getOutputFormat.asInstanceOf[HiveOutputFormat[AnyRef, Writable]]
{code}
Note: the output format is cast to {color:red}HiveOutputFormat{color} here.
However, when users write data into HBase, the output format is
HiveHBaseTableOutputFormat, which is not an instance of HiveOutputFormat.
I am going to submit a PR for this.
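Independently of that PR (this is not its actual change), the sketch below shows what a defensive check before the cast could look like. It again uses stand-in types defined inside the example, and the helper name asHiveOutputFormat is hypothetical. How Spark should handle the non-Hive case is left open here.

```java
import java.util.Optional;

// Stand-in types only: these are NOT the real Hadoop/Hive classes.
class GuardedCastDemo {
    interface OutputFormat<K, V> {}
    interface HiveOutputFormat<K, V> extends OutputFormat<K, V> {}
    static class HBaseLikeOutputFormat implements OutputFormat<String, Object> {}
    static class HiveLikeOutputFormat implements HiveOutputFormat<String, Object> {}

    /**
     * Hypothetical helper: returns the format as a HiveOutputFormat when it
     * is one, and an empty Optional otherwise, instead of casting blindly.
     */
    static Optional<HiveOutputFormat<?, ?>> asHiveOutputFormat(OutputFormat<?, ?> format) {
        if (format instanceof HiveOutputFormat) {
            HiveOutputFormat<?, ?> hive = (HiveOutputFormat<?, ?>) format;
            return Optional.of(hive);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // HBase-like format: no Hive-specific writer is available.
        System.out.println(asHiveOutputFormat(new HBaseLikeOutputFormat()).isPresent());
        // Hive-like format: the cast is safe.
        System.out.println(asHiveOutputFormat(new HiveLikeOutputFormat()).isPresent());
    }
}
```

With a check like this, the caller has to decide explicitly what to do for formats such as HiveHBaseTableOutputFormat, rather than failing inside a lazy val on the executor.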
> ClassCastException occurs when executing sql statement "insert into" on hbase
> table
> -----------------------------------------------------------------------------------
>
> Key: SPARK-6628
> URL: https://issues.apache.org/jira/browse/SPARK-6628
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: meiyoula
>
> Error: org.apache.spark.SparkException: Job aborted due to stage failure:
> Task 1 in stage 3.0 failed 4 times, most recent failure: Lost task 1.3 in
> stage 3.0 (TID 12, vm-17): java.lang.ClassCastException:
> org.apache.hadoop.hive.hbase.HiveHBaseTableOutputFormat cannot be cast to
> org.apache.hadoop.hive.ql.io.HiveOutputFormat
> at
> org.apache.spark.sql.hive.SparkHiveWriterContainer.outputFormat$lzycompute(hiveWriterContainers.scala:72)
> at
> org.apache.spark.sql.hive.SparkHiveWriterContainer.outputFormat(hiveWriterContainers.scala:71)
> at
> org.apache.spark.sql.hive.SparkHiveWriterContainer.getOutputName(hiveWriterContainers.scala:91)
> at
> org.apache.spark.sql.hive.SparkHiveWriterContainer.initWriters(hiveWriterContainers.scala:115)
> at
> org.apache.spark.sql.hive.SparkHiveWriterContainer.executorSideSetup(hiveWriterContainers.scala:84)
> at
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToFile$1(InsertIntoHiveTable.scala:112)
> at
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:93)
> at
> org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:93)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
> at org.apache.spark.scheduler.Task.run(Task.scala:56)
> at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:197)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)