[
https://issues.apache.org/jira/browse/SPARK-9213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Reynold Xin updated SPARK-9213:
-------------------------------
Description:
I'm creating an umbrella ticket to improve regular expression performance for
string expressions. Right now our use of regular expressions is inefficient for
two reasons:
1. Java regex in general is slow.
2. We have to convert everything from UTF8 encoded bytes into Java String, and
then run regex on it, and then convert it back.
There are libraries in Java that provide regex support directly on UTF8 encoded
bytes. One prominent example is joni, used in JRuby.
Note: all regex functions are in
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
was:
I'm creating an umbrella ticket to improve regular expression performance for
string expressions. Right now our use of regular expressions is inefficient for
two reasons:
1. Java regex in general is slow.
2. We have to convert everything from UTF8 encoded bytes into Java String, and
then run regex on it, and then convert it back.
There are libraries in Java that provide regex support directly on UTF8 encoded
bytes. One prominent example is joni, used in JRuby.
> Improve regular expression performance (via joni)
> -------------------------------------------------
>
> Key: SPARK-9213
> URL: https://issues.apache.org/jira/browse/SPARK-9213
> Project: Spark
> Issue Type: Umbrella
> Components: SQL
> Reporter: Reynold Xin
>
> I'm creating an umbrella ticket to improve regular expression performance for
> string expressions. Right now our use of regular expressions is inefficient
> for two reasons:
> 1. Java regex in general is slow.
> 2. We have to convert everything from UTF8 encoded bytes into Java String,
> and then run regex on it, and then convert it back.
> There are libraries in Java that provide regex support directly on UTF8
> encoded bytes. One prominent example is joni, used in JRuby.
> Note: all regex functions are in
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala
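
For illustration only, the round trip described in point 2 can be sketched as
below. This is not Spark's actual code: the class name RegexRoundTrip and the
helper regexpReplaceUtf8 are hypothetical, and only standard JDK calls are
used. It shows the decode, match, and re-encode steps that a regex engine
operating directly on UTF8 encoded bytes would presumably avoid.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

public class RegexRoundTrip {
    // Hypothetical helper (not a Spark API): applies a regex replacement to
    // UTF-8 encoded bytes by going through java.lang.String.
    static byte[] regexpReplaceUtf8(byte[] input, Pattern pattern, String replacement) {
        // Step 1: decode the UTF-8 bytes into a Java String (allocates and copies).
        String decoded = new String(input, StandardCharsets.UTF_8);
        // Step 2: run java.util.regex on the decoded String.
        String replaced = pattern.matcher(decoded).replaceAll(replacement);
        // Step 3: encode the result back into UTF-8 bytes (allocates and copies again).
        return replaced.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("[0-9]+");
        byte[] in = "spark-9213".getBytes(StandardCharsets.UTF_8);
        byte[] out = regexpReplaceUtf8(in, digits, "N");
        System.out.println(new String(out, StandardCharsets.UTF_8));
    }
}
```

Steps 1 and 3 are pure conversion overhead paid on every row, independent of
how fast the regex engine itself is.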