[ 
https://issues.apache.org/jira/browse/SPARK-8628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santiago M. Mola updated SPARK-8628:
------------------------------------
    Description: 
SPARK-5009 introduced the following code in AbstractSparkSQLParser:

{code}
def parse(input: String): LogicalPlan = {
  // Initialize the Keywords.
  lexical.initialize(reservedWords)
  phrase(start)(new lexical.Scanner(input)) match {
    case Success(plan, _) => plan
    case failureOrError => sys.error(failureOrError.toString)
  }
}
{code}

The corresponding initialize method in SqlLexical is not thread-safe:

{code}
  /* This is a work around to support the lazy setting */
  def initialize(keywords: Seq[String]): Unit = {
    reserved.clear()
    reserved ++= keywords
  }
{code}

I'm hitting this when parsing multiple SQL queries concurrently. Each call to 
parse first clears the shared reserved keyword list and then refills it. If 
another thread is parsing between those two steps, the lexer sees an empty or 
partial list, treats keywords as identifiers, and the query fails to parse.
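
As a rough illustration of one way to avoid the clear-then-refill window (not 
necessarily how this should be fixed in Spark), the keyword set could be kept 
immutable and swapped in with a single volatile write. The class name 
SafeLexical and its isKeyword method are hypothetical stand-ins and are not 
part of SqlLexical:

```scala
// Hypothetical, simplified stand-in for SqlLexical (not Spark code).
// It only models the reserved-keyword bookkeeping, not the lexer itself.
class SafeLexical {
  // Immutable set published through a volatile field: a reader sees either
  // the previous complete set or the new complete set, never a cleared one.
  @volatile private var reserved: Set[String] = Set.empty

  // Replaces the keyword set in one step instead of clear() followed by ++=.
  def initialize(keywords: Seq[String]): Unit = {
    reserved = keywords.toSet
  }

  // Assumed helper for illustration: true if the token is a reserved keyword.
  def isKeyword(token: String): Boolean = reserved.contains(token)
}
```

Wrapping the body of parse in a synchronized block would be a simpler 
alternative, at the cost of serializing all parsing on that parser instance.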


> Race condition in AbstractSparkSQLParser.parse
> ----------------------------------------------
>
>                 Key: SPARK-8628
>                 URL: https://issues.apache.org/jira/browse/SPARK-8628
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.3.0, 1.3.1, 1.4.0
>            Reporter: Santiago M. Mola
>            Priority: Critical
>              Labels: regression


