Stephen Boesch created SPARK-2686:
-------------------------------------
Summary: Add Length support to Spark SQL and HQL and Strlen support to SQL
Key: SPARK-2686
URL: https://issues.apache.org/jira/browse/SPARK-2686
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 1.0.0, 0.9.1, 0.9.2, 1.1.0, 1.1.1
Environment: all
Reporter: Stephen Boesch
Priority: Minor
Fix For: 1.1.1
Parsing and evaluation support has been added for the LEN(GTH) (that is, LEN
and LENGTH) and STRLEN functions.
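A minimal sketch of what "LEN(GTH)" means at the name-resolution level, assuming both spellings resolve case-insensitively to one length function. The object LengthAliases and its methods are hypothetical and are not Spark's SqlParser. The null handling (NULL in, NULL out) is also an assumption. Measuring non-string values by their string form is inferred from the SQL example, where length(key) on an Int column returns the digit count. STRLEN is deliberately left out because its exact semantics are not described here.

```scala
// Hypothetical sketch only: not Spark's SqlParser or Catalyst code.
object LengthAliases {
  // The two spellings covered by "LEN(GTH)"; matched case-insensitively.
  private val lengthNames = Set("len", "length")

  def isLengthFunction(name: String): Boolean =
    lengthNames.contains(name.toLowerCase(java.util.Locale.ROOT))

  // Assumed semantics: NULL propagates; any other value is measured by the
  // length of its string form (e.g. the Int 100 -> "100" -> 3).
  def evalLength(value: Any): Any =
    if (value == null) null else value.toString.length

  def main(args: Array[String]): Unit = {
    println(isLengthFunction("LEN"))    // true
    println(isLengthFunction("Length")) // true
    println(isLengthFunction("substr")) // false
    println(evalLength(100))            // 3
    println(evalLength(null))           // null
  }
}
```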
Examples:
SQL:
import org.apache.spark.sql._
case class TestData(key: Int, value: String)
val sqlc = new SQLContext(sc)
import sqlc._
val testData: SchemaRDD = sqlc.sparkContext.parallelize(
(1 to 100).map(i => TestData(i, i.toString)))
testData.registerAsTable("testData")
sqlc.sql("select length(key) as key_len from testData order by key_len desc limit 5").collect
res12: Array[org.apache.spark.sql.Row] = Array([3], [2], [2], [2], [2])
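The expected rows above can be checked without Spark. This plain-Scala sketch mirrors the query step by step, assuming (as the output suggests) that length(key) on an Int key is the length of its decimal string. The object name ExpectedKeyLengths is illustrative only.

```scala
// Plain-Scala reproduction of the SQL example's result; no Spark involved.
object ExpectedKeyLengths {
  def topLengths(n: Int): Seq[Int] =
    (1 to 100)
      .map(_.toString.length)        // 1..9 -> 1, 10..99 -> 2, 100 -> 3
      .sorted(Ordering[Int].reverse) // "order by key_len desc"
      .take(n)                       // "limit 5"

  def main(args: Array[String]): Unit =
    println(topLengths(5).mkString(", ")) // 3, 2, 2, 2, 2
}
```

Only the key 100 has three digits, so a single 3 is followed by 2s, which matches res12.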
HQL:
val hc = new org.apache.spark.sql.hive.HiveContext(sc)
import hc._
hc.hql("select length(grp) from simplex").collect
res14: Array[org.apache.spark.sql.Row] = Array([6], [6], [6], [6])
As for the codebase changes, they were deliberately modeled on those made on
July 17 to add SUBSTR(ING):
SQLParser, Optimizer, Expression, stringOperations, and HiveQL were the main
classes changed. The affected test suites are ConstantFolding and
ExpressionEvaluation.
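To illustrate why a new Expression touches both the evaluation and the constant-folding tests, here is a toy expression tree with a Length node and a folding rule. This is a hypothetical simplification, not Catalyst's Expression hierarchy or its ConstantFolding rule. Expr, Literal, Column, Length, and ConstantFoldingSketch are all illustrative names. NULL propagation in Length is an assumption.

```scala
// Toy expression tree (hypothetical; far simpler than Catalyst's).
sealed trait Expr {
  def foldable: Boolean
  def eval(row: Map[String, Any]): Any
}

final case class Literal(value: Any) extends Expr {
  def foldable: Boolean = true
  def eval(row: Map[String, Any]): Any = value
}

final case class Column(name: String) extends Expr {
  def foldable: Boolean = false // value depends on the input row
  def eval(row: Map[String, Any]): Any = row(name)
}

final case class Length(child: Expr) extends Expr {
  def foldable: Boolean = child.foldable // constant iff its input is constant
  def eval(row: Map[String, Any]): Any = {
    val v = child.eval(row)
    if (v == null) null else v.toString.length // assumed: NULL propagates
  }
}

object ConstantFoldingSketch {
  // Replace any foldable subtree by a Literal holding its precomputed value,
  // in the spirit of an optimizer constant-folding rule.
  def fold(e: Expr): Expr = e match {
    case lit: Literal    => lit
    case x if x.foldable => Literal(x.eval(Map.empty))
    case Length(child)   => Length(fold(child))
    case other           => other
  }

  def main(args: Array[String]): Unit = {
    println(fold(Length(Literal("abcdef"))))                     // Literal(6)
    println(fold(Length(Column("grp"))))                         // Length(Column(grp))
    println(Length(Column("grp")).eval(Map("grp" -> "hello")))   // 5
  }
}
```

An evaluation test checks eval directly on a row, while a folding test checks that a length over a constant is rewritten to a literal before execution.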
In addition, some ad hoc testing was done, as shown in the examples above.