[ 
https://issues.apache.org/jira/browse/SPARK-11080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-11080:
-------------------------------
    Description: 
In the current implementation of named expressions' ExprIds, we rely on a
per-JVM AtomicLong to ensure that expression ids are unique within a JVM.
However, these expression ids are not globally unique, so ids can collide if
new expression ids are created inside tasks rather than on the driver.

There are currently a few cases where tasks allocate expression ids. These
happen to be safe because those expressions are never compared to expressions
created on the driver. To guard against the introduction of invalid
comparisons between driver-created and executor-created expression ids, this
patch extends ExprId to incorporate a UUID that identifies the JVM that
created the id, which prevents collisions.
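
The idea described above can be illustrated with a minimal sketch. This is not
the patch itself: Spark's Catalyst code is written in Scala, and the names
ExprIdSketch and Allocator below are hypothetical, chosen only for this
example. Each Allocator instance stands in for one JVM (driver or executor)
with its own AtomicLong counter and its own random UUID.

```java
import java.util.Objects;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

public final class ExprIdSketch {

    /** Hypothetical stand-in for ExprId: a counter value plus the id of the JVM that issued it. */
    static final class ExprId {
        final long id;
        final UUID jvmId;

        ExprId(long id, UUID jvmId) {
            this.id = id;
            this.jvmId = jvmId;
        }

        @Override
        public boolean equals(Object other) {
            if (!(other instanceof ExprId)) {
                return false;
            }
            ExprId that = (ExprId) other;
            // Two ids are equal only if both the counter value and the issuing JVM match.
            return id == that.id && jvmId.equals(that.jvmId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(id, jvmId);
        }
    }

    /** Simulates the allocator state of a single JVM (assumption: one counter and one UUID per JVM). */
    static final class Allocator {
        private final AtomicLong curId = new AtomicLong();
        private final UUID jvmId = UUID.randomUUID();

        ExprId newExprId() {
            return new ExprId(curId.getAndIncrement(), jvmId);
        }
    }

    public static void main(String[] args) {
        Allocator driver = new Allocator();
        Allocator executor = new Allocator();

        ExprId fromDriver = driver.newExprId();
        ExprId fromExecutor = executor.newExprId();

        // Both counters start at zero, so the raw long values coincide.
        System.out.println(fromDriver.id == fromExecutor.id);
        // With the JVM id included, the two ids no longer compare as equal.
        System.out.println(fromDriver.equals(fromExecutor));
    }
}
```

With a bare long id, the two values above would be indistinguishable. Including
the JVM's UUID in equality makes a driver-created id and an executor-created id
compare as different even when their counter values coincide.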

  was:
My understanding of {{NamedExpression.newExprId}} is that it is only intended 
to be called on the driver. If it is called on executors, then this may lead to 
scenarios where the same expression id is re-used in two different 
NamedExpressions.

More generally, I think that calling {{NamedExpression.newExprId}} within tasks 
may be an indicator of potential attribute binding bugs. Therefore, I think 
that we should prevent {{NamedExpression.newExprId}} from being called inside 
of tasks by throwing an exception when such calls occur. 


> Incorporate per-JVM id into ExprId to prevent unsafe cross-JVM comparisons
> --------------------------------------------------------------------------
>
>                 Key: SPARK-11080
>                 URL: https://issues.apache.org/jira/browse/SPARK-11080
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>             Fix For: 1.6.0
>


