mathias kluba created SPARK-10153:
-------------------------------------

             Summary: Unable to query Avro data from Flume using SparkSQL
                 Key: SPARK-10153
                 URL: https://issues.apache.org/jira/browse/SPARK-10153
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 1.4.1, 1.5.0
            Reporter: mathias kluba


I use the Avro event serializer of Flume.
The schema is:
{code}
{
"type":"record",
"name":"Event",
"fields":[
  {
    "name":"headers",
    "type":{"type":"map","values":"string"}
  },
  {
    "name":"body",
    "type":"bytes"
  }
]}
{code}

I'm using HDP 2.2 with Hive 0.14 (on Tez), and I'm able to query the data 
correctly.
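
For context, the Hive table sits over the Flume-written Avro files with DDL 
along these lines (a sketch of the typical AvroSerDe setup, not the verbatim 
DDL; the table name matches the queries below, the location is an assumption):
{code:sql}
CREATE EXTERNAL TABLE mytable
STORED AS AVRO
LOCATION '/flume/events'   -- assumed path, for illustration only
TBLPROPERTIES ('avro.schema.literal'='
{
  "type":"record",
  "name":"Event",
  "fields":[
    {"name":"headers","type":{"type":"map","values":"string"}},
    {"name":"body","type":"bytes"}
  ]
}');
{code}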

But with Spark SQL, I have issues.
I tested with 1.4.1 and 1.5.0 (latest snapshot), and I get a different error 
message for a different issue in each version.

With 1.4.1 I get:
{code:sql}
select body from mytable limit 10;
{code}
{code}
 conversion of string to map<string,string>not supported yet
{code}

It's related to the headers column, which is a map<string,string>, but I don't 
understand why it's trying to convert it to a string. Maybe to display it as a 
single column? Even if I do a select that doesn't include the headers column, 
I still get this error.
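
For reference, the kind of query one would expect to work against the headers 
column uses ordinary HiveQL map indexing (the 'timestamp' key here is just an 
illustrative example, not necessarily a key present in the data):
{code:sql}
select headers['timestamp'], body from mytable limit 10;
{code}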

With 1.5.0 I get:
{code:sql}
select body from mytable limit 10;
{code}
{code}
java.lang.RuntimeException: java.lang.ClassCastException: java.lang.String 
cannot be cast to [B
{code}

It's clearly not the same error; it seems that 1.5.0 fixes the bug with the 
headers. But there still seems to be a bug where Spark SQL tries to cast the 
body to String, even though the column type is a byte array (from the Avro 
schema).
When I do the cast manually, it works:
{code:sql}
select cast(body as String) from mytable limit 10;
{code}
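
For reference, describing the table should show the standard Avro-to-Hive type 
mapping (expected output sketched below as comments; this is the mapping the 
AvroSerDe documents, not captured output):
{code:sql}
describe mytable;
-- expected, per the standard Avro-to-Hive mapping:
-- headers    map<string,string>
-- body       binary
{code}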



