I may be mistaken, but if I remember correctly Spark behaves differently when 
the frame is bounded in the past and when it is not. Specifically, I seem to 
recall a fix which ensured that when there is no lower bound, the aggregation 
is done incrementally, one row at a time, instead of recomputing the whole 
range for each window. So I believe the Python API should be configured 
exactly the same as in Scala/Java so that the optimization would take place.
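To illustrate the distinction described above (a hypothetical sketch in plain Python, not Spark's actual window-frame code): with an unbounded lower bound a running aggregate can be carried forward row by row, while a bounded frame has to be re-aggregated for every row.

```python
def running_sums_unbounded(values):
    """UNBOUNDED PRECEDING to CURRENT ROW: one pass with a carried total, O(n)."""
    out, total = [], 0
    for v in values:
        total += v
        out.append(total)
    return out

def running_sums_bounded(values, preceding):
    """N PRECEDING to CURRENT ROW: re-aggregate each frame, O(n * frame size)."""
    return [sum(values[max(0, i - preceding):i + 1]) for i in range(len(values))]
```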
Assaf.

From: rxin [via Apache Spark Developers List] 
Sent: Wednesday, November 30, 2016 8:35 PM
To: Mendelson, Assaf
Subject: Re: [SPARK-17845] [SQL][PYTHON] More self-evident window function 
frame boundary API

Yes, I'd define unboundedPreceding as -sys.maxsize, but any value less than 
min(-sys.maxsize, _JAVA_MIN_LONG) should also be considered unboundedPreceding. 
We need to be careful with long overflow when transferring data over to Java.
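A minimal sketch of that normalization, mirroring the proposal above rather than actual PySpark source (the name `to_java_boundary` is illustrative; `_JAVA_MIN_LONG` follows the constant named in the thread):

```python
import sys

_JAVA_MIN_LONG = -(1 << 63)  # Java's Long.MIN_VALUE

def to_java_boundary(start):
    """Normalize a Python lower frame boundary before sending it to the JVM.

    -sys.maxsize, and anything below it, is treated as unbounded preceding
    and clamped to Long.MIN_VALUE, so an arbitrary-precision Python int can
    never overflow the Java long on the way over.
    """
    if start <= -sys.maxsize:
        return _JAVA_MIN_LONG
    return start
```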


On Wed, Nov 30, 2016 at 10:04 AM, Maciej Szymkiewicz <[hidden email]> wrote:

It is platform-specific, so theoretically it can be larger, but 2**63 - 1 is 
the standard on 64-bit platforms and 2**31 - 1 on 32-bit platforms. I can 
submit a patch, but I am not sure how to proceed. Personally I would set

unboundedPreceding = -sys.maxsize

unboundedFollowing = sys.maxsize

to keep backwards compatibility.
On 11/30/2016 06:52 PM, Reynold Xin wrote:
Ah OK, for some reason when I did the pull request sys.maxsize was much larger 
than 2^63. Do you want to submit a patch to fix this?


On Wed, Nov 30, 2016 at 9:48 AM, Maciej Szymkiewicz <[hidden email]> wrote:

The problem is that -(1 << 63) is -(sys.maxsize + 1), so the code which used 
to work before is now off by one.
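The off-by-one can be checked directly in a 64-bit CPython shell (this assumes a 64-bit build, where sys.maxsize is 2**63 - 1):

```python
import sys

# sys.maxsize is one short of the magnitude of Java's Long.MIN_VALUE,
# so the old -(1 << 63) sentinel lies just outside the -sys.maxsize range.
assert sys.maxsize == (1 << 63) - 1
assert -(1 << 63) == -(sys.maxsize + 1)
```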
On 11/30/2016 06:43 PM, Reynold Xin wrote:
Can you give a repro? Anything less than -(1 << 63) is considered negative 
infinity (i.e. unbounded preceding).

On Wed, Nov 30, 2016 at 8:27 AM, Maciej Szymkiewicz <[hidden email]> wrote:
Hi,

I've been looking at SPARK-17845 and I am curious whether there is any
reason to make it a breaking change. In Spark 2.0 and below we could use:

    Window().partitionBy("foo").orderBy("bar").rowsBetween(-sys.maxsize, sys.maxsize)

In 2.1.0 this code will silently produce incorrect results (ROWS BETWEEN
-1 PRECEDING AND UNBOUNDED FOLLOWING). Couldn't we make
Window.unboundedPreceding equal to -sys.maxsize to ensure backward
compatibility?

--

Maciej Szymkiewicz





--

Maciej Szymkiewicz









--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/SPARK-17845-SQL-PYTHON-More-self-evident-window-function-frame-boundary-API-tp20064p20074.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
