mathias kluba created SPARK-10153:
-------------------------------------
Summary: Unable to query Avro data from Flume using SparkSQL
Key: SPARK-10153
URL: https://issues.apache.org/jira/browse/SPARK-10153
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 1.4.1, 1.5.0
Reporter: mathias kluba
I use Flume's Avro event serializer.
The schema is:
{code}
{
  "type": "record",
  "name": "Event",
  "fields": [
    {
      "name": "headers",
      "type": {"type": "map", "values": "string"}
    },
    {
      "name": "body",
      "type": "bytes"
    }
  ]
}
{code}
I'm using HDP 2.2 with Hive 0.14 (on Tez) and I can query the data
correctly there.
But with Spark SQL, I have issues.
I tested with 1.4.1 and 1.5.0 (latest snapshot) and I get a different error
message for each version.
With 1.4.1 I get:
{code:sql}
select body from mytable limit 10;
{code}
{code}
conversion of string to map<string,string>not supported yet
{code}
This is related to the headers column, which is a map<string,string>, but I don't
understand why Spark SQL is trying to convert it to a string. Maybe to display it as a
single column? Even if I select only columns other than headers, I still get this error.
With 1.5.0 I get:
{code:sql}
select body from mytable limit 10;
{code}
{code}
java.lang.RuntimeException: java.lang.ClassCastException: java.lang.String
cannot be cast to [B
{code}
It's clearly a different error; it seems that 1.5.0 fixes the bug with the
headers column.
But Spark SQL still seems to read the body column as a String, even though its
type is bytes (a byte array) in the Avro schema.
When I do the cast manually, it works:
{code:sql}
select cast(body as String) from mytable limit 10;
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)