Hi Bjorn,

Thanks for testing 3.2.1 RC1! DataFrame.to_pandas_on_spark is deprecated in 3.3.0, not in 3.2.1. That's why you didn't get any warnings.
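[Editor's note: PySpark's deprecated aliases surface through Python's standard warnings machinery; the sketch below mimics that mechanism with a hypothetical stand-in class rather than the real pyspark.sql.DataFrame, so it runs without Spark. The warning text is an assumption modeled on the linked commit, not a quote of the actual 3.3.0 source.]

```python
import warnings

class FakeDataFrame:
    """Hypothetical stand-in for pyspark.sql.DataFrame (not the real API)."""

    def pandas_api(self):
        # The replacement method introduced alongside the deprecation.
        return "pandas-on-Spark DataFrame"

    def to_pandas_on_spark(self):
        # In 3.3.0 the old alias warns before delegating; in 3.2.1 no such
        # warning exists yet, which is why none appeared in the test above.
        warnings.warn(
            "DataFrame.to_pandas_on_spark is deprecated. "
            "Use DataFrame.pandas_api instead.",
            FutureWarning,
        )
        return self.pandas_api()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = FakeDataFrame().to_pandas_on_spark()

print(result)
print(caught[0].category.__name__)
```

Against a real 3.2.1 install the `caught` list would stay empty, matching what Bjørn observed below.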
Huaxin

On Sat, Jan 15, 2022 at 4:12 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

> Hi, Bjorn.
>
> It seems that you are confused about my announcement. The test coverage
> announcement is about the `master` branch, which is for the upcoming Apache
> Spark 3.3.0. Apache Spark 3.3 will start to support Java 17, not old
> release branches like Apache Spark 3.2.x/3.1.x/3.0.x.
>
> > 1. If I change the Java version to 17, I get an error which I did not
> > copy. But have you built this with Java 11 or Java 17? I have noticed that
> > we test using Java 17, so I was hoping to update Java to version 17.
>
> The Apache Spark community is still actively developing, stabilizing, and
> optimizing Spark on Java 17. For the details, please see the following:
>
> SPARK-33772: Build and Run Spark on Java 17
> SPARK-35781: Support Spark on Apple Silicon on macOS natively on Java 17
> SPARK-37593: Optimize HeapMemoryAllocator to avoid memory waste when using G1GC
>
> In short, please don't expect Java 17 with Spark 3.2.x and older versions.
>
> Thanks,
> Dongjoon.
>
> On Sat, Jan 15, 2022 at 11:19 AM Bjørn Jørgensen <bjornjorgen...@gmail.com> wrote:
>
>> Two things.
>>
>> I changed the Dockerfile from jupyter/docker-stacks to
>> https://github.com/bjornjorgensen/docker-stacks/blob/master/pyspark-notebook/Dockerfile
>> and then built, tagged, and pushed it.
>> I start it with docker-compose like this:
>>
>> version: '2.1'
>> services:
>>   jupyter:
>>     image: bjornjorgensen/spark-notebook:spark-3.2.1RC-1
>>     restart: 'no'
>>     volumes:
>>       - ./notebooks:/home/jovyan/notebooks
>>     ports:
>>       - "8881:8888"
>>       - "8181:8080"
>>       - "7077:7077"
>>       - "4040:4040"
>>     environment:
>>       NB_UID: ${UID}
>>       NB_GID: ${GID}
>>
>> 1. If I change the Java version to 17, I get an error which I did not
>> copy. But have you built this with Java 11 or Java 17? I have noticed that
>> we test using Java 17, so I was hoping to update Java to version 17.
>>
>> 2.
>> In a notebook I start Spark with:
>>
>> from pyspark import pandas as ps
>> import re
>> import numpy as np
>> import os
>> #import pandas as pd
>>
>> from pyspark import SparkContext, SparkConf
>> from pyspark.sql import SparkSession
>> from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
>> from pyspark.sql.types import StructType, StructField, StringType, IntegerType
>>
>> os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"
>>
>> def get_spark_session(app_name: str, conf: SparkConf):
>>     conf.setMaster('local[*]')
>>     conf \
>>         .set('spark.driver.memory', '64g') \
>>         .set("fs.s3a.access.key", "minio") \
>>         .set("fs.s3a.secret.key", "KEY") \
>>         .set("fs.s3a.endpoint", "http://192.168.1.127:9000") \
>>         .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
>>         .set("spark.hadoop.fs.s3a.path.style.access", "true") \
>>         .set("spark.sql.repl.eagerEval.enabled", "True") \
>>         .set("spark.sql.adaptive.enabled", "True") \
>>         .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
>>         .set("spark.sql.repl.eagerEval.maxNumRows", "10000")
>>
>>     return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()
>>
>> spark = get_spark_session("Falk", SparkConf())
>>
>> Then I run this code:
>>
>> f06 = spark.read.option("multiline", "true").json("/home/jovyan/notebooks/falk/data/norm_test/f06.json")
>>
>> pf06 = f06.to_pandas_on_spark()
>>
>> pf06.info()
>>
>> And I did not get any errors or warnings. But according to
>> https://github.com/apache/spark/commit/bc7d55fc1046a55df61fdb380629699e9959fcc6
>> (Spark)DataFrame.to_pandas_on_spark is deprecated.
>>
>> So I was supposed to get some info telling me to change to pandas_api, which I did
>> not get.
>>
>> On Fri, Jan 14, 2022 at 07:04, huaxin gao <huaxin.ga...@gmail.com> wrote:
>>
>>> The two regressions have been fixed. I will cut RC2 tomorrow late
>>> afternoon.
>>> Thanks,
>>> Huaxin
>>>
>>> On Wed, Jan 12, 2022 at 9:11 AM huaxin gao <huaxin.ga...@gmail.com> wrote:
>>>
>>>> Thank you all for testing and voting!
>>>>
>>>> I will -1 this RC because
>>>> https://issues.apache.org/jira/browse/SPARK-37855 and
>>>> https://issues.apache.org/jira/browse/SPARK-37859 are regressions.
>>>> These are not blockers, but I think it's better to fix them in 3.2.1. I will
>>>> prepare for RC2.
>>>>
>>>> Thanks,
>>>> Huaxin
>>>>
>>>> On Wed, Jan 12, 2022 at 2:03 AM Kent Yao <y...@apache.org> wrote:
>>>>
>>>>> +1 (non-binding).
>>>>>
>>>>> Chao Sun <sunc...@apache.org> wrote on Wed, Jan 12, 2022 at 16:10:
>>>>>
>>>>>> +1 (non-binding). Thanks Huaxin for driving the release!
>>>>>>
>>>>>> On Tue, Jan 11, 2022 at 11:56 PM Ruifeng Zheng <ruife...@foxmail.com> wrote:
>>>>>>
>>>>>>> +1 (non-binding)
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Ruifeng Zheng
>>>>>>>
>>>>>>> ------------------ Original ------------------
>>>>>>> *From:* "Cheng Su" <chen...@fb.com.INVALID>
>>>>>>> *Date:* Wed, Jan 12, 2022 02:54 PM
>>>>>>> *To:* "Qian Sun" <qian.sun2...@gmail.com>; "huaxin gao" <huaxin.ga...@gmail.com>
>>>>>>> *Cc:* "dev" <dev@spark.apache.org>
>>>>>>> *Subject:* Re: [VOTE] Release Spark 3.2.1 (RC1)
>>>>>>>
>>>>>>> +1 (non-binding). Checked commit history and ran some local tests.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Cheng Su
>>>>>>>
>>>>>>> *From:* Qian Sun <qian.sun2...@gmail.com>
>>>>>>> *Date:* Tuesday, January 11, 2022 at 7:55 PM
>>>>>>> *To:* huaxin gao <huaxin.ga...@gmail.com>
>>>>>>> *Cc:* dev <dev@spark.apache.org>
>>>>>>> *Subject:* Re: [VOTE] Release Spark 3.2.1 (RC1)
>>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> Looks good. All integration tests passed.
>>>>>>>
>>>>>>> Qian
>>>>>>>
>>>>>>> On Jan 11, 2022 at 2:09 AM, huaxin gao <huaxin.ga...@gmail.com> wrote:
>>>>>>>
>>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>>> version 3.2.1.
>>>>>>> The vote is open until Jan. 13th at 12 PM PST (8 PM UTC) and passes
>>>>>>> if a majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>>>>
>>>>>>> [ ] +1 Release this package as Apache Spark 3.2.1
>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>
>>>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>>>
>>>>>>> There are currently no issues targeting 3.2.1 (try: project = SPARK AND
>>>>>>> "Target Version/s" = "3.2.1" AND status in (Open, Reopened, "In Progress"))
>>>>>>>
>>>>>>> The tag to be voted on is v3.2.1-rc1 (commit
>>>>>>> 2b0ee226f8dd17b278ad11139e62464433191653):
>>>>>>> https://github.com/apache/spark/tree/v3.2.1-rc1
>>>>>>>
>>>>>>> The release files, including signatures, digests, etc. can be found at:
>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc1-bin/
>>>>>>>
>>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>>
>>>>>>> The staging repository for this release can be found at:
>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1395/
>>>>>>>
>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc1-docs/
>>>>>>>
>>>>>>> The list of bug fixes going into 3.2.1 can be found at the following URL:
>>>>>>> https://s.apache.org/7tzik
>>>>>>>
>>>>>>> This release is using the release script of the tag v3.2.1-rc1.
>>>>>>>
>>>>>>> FAQ
>>>>>>>
>>>>>>> =========================
>>>>>>> How can I help test this release?
>>>>>>> =========================
>>>>>>>
>>>>>>> If you are a Spark user, you can help us test this release by taking
>>>>>>> an existing Spark workload and running it on this release candidate,
>>>>>>> then reporting any regressions.
>>>>>>> If you're working in PySpark, you can set up a virtual env, install
>>>>>>> the current RC, and see if anything important breaks. In the Java/Scala
>>>>>>> world, you can add the staging repository to your project's resolvers and
>>>>>>> test with the RC (make sure to clean up the artifact cache before/after so
>>>>>>> you don't end up building with an out-of-date RC going forward).
>>>>>>>
>>>>>>> ===========================================
>>>>>>> What should happen to JIRA tickets still targeting 3.2.1?
>>>>>>> ===========================================
>>>>>>>
>>>>>>> The current list of open tickets targeted at 3.2.1 can be found at:
>>>>>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>>>>>> Version/s" = 3.2.1
>>>>>>>
>>>>>>> Committers should look at those and triage. Extremely important bug
>>>>>>> fixes, documentation, and API tweaks that impact compatibility should
>>>>>>> be worked on immediately. Everything else, please retarget to an
>>>>>>> appropriate release.
>>>>>>>
>>>>>>> ==================
>>>>>>> But my bug isn't fixed?
>>>>>>> ==================
>>>>>>>
>>>>>>> In order to make timely releases, we will typically not hold the
>>>>>>> release unless the bug in question is a regression from the previous
>>>>>>> release. That being said, if there is something which is a regression
>>>>>>> that has not been correctly targeted, please ping me or a committer to
>>>>>>> help target the issue.
>>
>> --
>> Bjørn Jørgensen
>> Vestre Aspehaug 4, 6010 Ålesund
>> Norge
>>
>> +47 480 94 297
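[Editor's note: the clean-virtual-env RC testing pattern mentioned in the FAQ above can be sketched with Python's stdlib venv module, as below. The pip-install step is shown only as a comment, since the staged tarball name must be taken from the v3.2.1-rc1-bin directory listed in the vote email; everything else here is an illustration of the pattern, not an official procedure.]

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Create a throwaway env so the RC never touches the system Python.
    env_dir = Path(tmp) / "spark-321-rc1"
    venv.EnvBuilder(with_pip=False).create(env_dir)  # use with_pip=True to install the RC

    bin_name = "Scripts" if sys.platform == "win32" else "bin"
    env_python = env_dir / bin_name / "python"

    # The install step would go roughly like this (tarball URL is illustrative):
    # subprocess.run([str(env_python), "-m", "pip", "install",
    #                 "<staged pyspark .tar.gz from v3.2.1-rc1-bin>"], check=True)

    # Sanity-check: the env's interpreter reports the env as its prefix,
    # so any subsequent `import pyspark` resolves inside the throwaway env.
    out = subprocess.run(
        [str(env_python), "-c", "import sys; print(sys.prefix)"],
        capture_output=True, text=True, check=True,
    )
    print("spark-321-rc1" in out.stdout)
```

Deleting the temporary directory afterwards (done automatically by the context manager) gives the artifact-cache cleanup the FAQ asks for on the Python side.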