Do you mean using "alias" instead of "as"? Unfortunately, that didn't help:

> val arrayCol = functions.array(df("a"), df("b")).alias("arrayCol")

still throws the error.

Surprisingly, doing the same thing inside a select works:
> df.select(functions.array(df("a"), df("b")).as("arrayCol")).show()

+--------+
|arrayCol|
+--------+
|  [0, 1]|
|  [1, 2]|
|  [2, 3]|
|  [3, 4]|
|  [4, 5]|
|  [5, 6]|
|  [6, 7]|
|  [7, 8]|
|  [8, 9]|
| [9, 10]|
+--------+
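
Digging into the stack trace quoted below, the exception is raised from Column.toString via ScalaRunTime.replStringOf, i.e. it fires when the shell tries to pretty-print the Column value, not when the alias is built. So keeping the aliased Column out of the REPL's result printing should sidestep it. Rough sketch only (untested; it assumes the same spark-shell session and imports as the reproducer below, and buildArrayCol is just a made-up helper name):

// The Column never becomes a top-level REPL result, so Column.toString
// (where the trace starts) is never invoked on it.
def buildArrayCol(in: org.apache.spark.sql.DataFrame): org.apache.spark.sql.Column =
  functions.array(in("a"), in("b")).as("arrayCol")

df.select(buildArrayCol(df)).show()   // same output as the table above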



On Tue, Feb 9, 2016 at 4:52 PM Ted Yu <yuzhih...@gmail.com> wrote:

> How about changing the last line to:
>
> scala> val df2 = df.select(functions.array(df("a"),
> df("b")).alias("arrayCol"))
> df2: org.apache.spark.sql.DataFrame = [arrayCol: array<int>]
>
> scala> df2.show()
> +--------+
> |arrayCol|
> +--------+
> |  [0, 1]|
> |  [1, 2]|
> |  [2, 3]|
> |  [3, 4]|
> |  [4, 5]|
> |  [5, 6]|
> |  [6, 7]|
> |  [7, 8]|
> |  [8, 9]|
> | [9, 10]|
> +--------+
>
> FYI
>
> On Tue, Feb 9, 2016 at 1:38 PM, Rakesh Chalasani <vnit.rak...@gmail.com>
> wrote:
>
>> Sorry, I didn't realize the mail didn't show the code. I'm using Spark
>> release 1.6.0.
>>
>> Below is an example to reproduce it.
>>
>> import org.apache.spark.sql.SQLContext
>> val sqlContext = new SQLContext(sparkContext)
>> import sqlContext.implicits._
>> import org.apache.spark.sql.functions
>>
>> case class Test(a:Int, b:Int)
>> val data = sparkContext.parallelize(Array.range(0, 10).map(x => Test(x,
>> x+1)))
>> val df = data.toDF()
>> val arrayCol = functions.array(df("a"), df("b")).as("arrayCol")
>>
>> This throws the following exception:
>> java.lang.UnsupportedOperationException
>>         at org.apache.spark.sql.catalyst.expressions.PrettyAttribute.nullable(namedExpressions.scala:289)
>>         at org.apache.spark.sql.catalyst.expressions.CreateArray$$anonfun$dataType$3.apply(complexTypeCreator.scala:40)
>>         at org.apache.spark.sql.catalyst.expressions.CreateArray$$anonfun$dataType$3.apply(complexTypeCreator.scala:40)
>>         at scala.collection.IndexedSeqOptimized$$anonfun$exists$1.apply(IndexedSeqOptimized.scala:40)
>>         at scala.collection.IndexedSeqOptimized$$anonfun$exists$1.apply(IndexedSeqOptimized.scala:40)
>>         at scala.collection.IndexedSeqOptimized$class.segmentLength(IndexedSeqOptimized.scala:189)
>>         at scala.collection.mutable.ArrayBuffer.segmentLength(ArrayBuffer.scala:47)
>>         at scala.collection.GenSeqLike$class.prefixLength(GenSeqLike.scala:92)
>>         at scala.collection.AbstractSeq.prefixLength(Seq.scala:40)
>>         at scala.collection.IndexedSeqOptimized$class.exists(IndexedSeqOptimized.scala:40)
>>         at scala.collection.mutable.ArrayBuffer.exists(ArrayBuffer.scala:47)
>>         at org.apache.spark.sql.catalyst.expressions.CreateArray.dataType(complexTypeCreator.scala:40)
>>         at org.apache.spark.sql.catalyst.expressions.Alias.dataType(namedExpressions.scala:136)
>>         at org.apache.spark.sql.catalyst.expressions.NamedExpression$class.typeSuffix(namedExpressions.scala:84)
>>         at org.apache.spark.sql.catalyst.expressions.Alias.typeSuffix(namedExpressions.scala:120)
>>         at org.apache.spark.sql.catalyst.expressions.Alias.toString(namedExpressions.scala:155)
>>         at org.apache.spark.sql.catalyst.expressions.Expression.prettyString(Expression.scala:207)
>>         at org.apache.spark.sql.Column.toString(Column.scala:138)
>>         at java.lang.String.valueOf(String.java:2994)
>>         at scala.runtime.ScalaRunTime$.stringOf(ScalaRunTime.scala:331)
>>         at scala.runtime.ScalaRunTime$.replStringOf(ScalaRunTime.scala:337)
>>         at .<init>(<console>:20)
>>         at .<clinit>(<console>)
>>         at $print(<console>)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>         at java.lang.reflect.Method.invoke(Method.java:497)
>>         at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>>         at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
>>         at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>>         at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>>         at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>>
>> On Tue, Feb 9, 2016 at 4:23 PM Ted Yu <yuzhih...@gmail.com> wrote:
>>
>>> Do you mind pastebin'ning code snippet and exception one more time - I
>>> couldn't see them in your original email.
>>>
>>> Which Spark release are you using ?
>>>
>>> On Tue, Feb 9, 2016 at 11:55 AM, rakeshchalasani <vnit.rak...@gmail.com>
>>> wrote:
>>>
>>>> Hi All:
>>>>
>>>> I am getting an "UnsupportedOperationException" when trying to alias an
>>>> array column. The issue seems to be in the "CreateArray" expression's
>>>> dataType, which checks the nullability of its children, while aliasing
>>>> creates a PrettyAttribute that does not implement nullability.
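
That interaction can be sketched in isolation; the toy snippet below is not Spark source, just a minimal model of the shape the stack trace shows (a dataType-style check calling exists over children, one of which refuses to answer nullable):

trait Expr { def nullable: Boolean }
case class Concrete(nullable: Boolean) extends Expr    // ordinary attribute: knows its nullability
case object PrettyLike extends Expr {                   // PrettyAttribute-like stand-in: does not implement it
  def nullable: Boolean = throw new UnsupportedOperationException
}
// CreateArray.dataType-style check: containsNull = children.exists(_.nullable)
def containsNull(children: Seq[Expr]): Boolean = children.exists(_.nullable)

containsNull(Seq(Concrete(false), PrettyLike))          // throws UnsupportedOperationException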
>>>>
>>>> Below is an example to reproduce it.
>>>>
>>>>
>>>>
>>>> this throws the following exception:
>>>>
>>>>
>>>>
>>>>
>>>
>
