Hi Bjorn,

Thanks for testing 3.2.1 RC1! DataFrame.to_pandas_on_spark is deprecated in 3.3.0, not in 3.2.1. That's why you didn't get any warnings.
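[Editor's note: PySpark's deprecated aliases surface through Python's standard warnings machinery; the sketch below mimics that mechanism with a hypothetical stand-in class rather than the real pyspark.sql.DataFrame, so it runs without Spark. The warning text is an assumption modeled on the linked commit, not a quote of the actual 3.3.0 source.]

```python
import warnings

class FakeDataFrame:
    """Hypothetical stand-in for pyspark.sql.DataFrame (not the real API)."""

    def pandas_api(self):
        # The replacement method introduced alongside the deprecation.
        return "pandas-on-Spark DataFrame"

    def to_pandas_on_spark(self):
        # In 3.3.0 the old alias warns before delegating; in 3.2.1 no such
        # warning exists yet, which is why none appeared in the test above.
        warnings.warn(
            "DataFrame.to_pandas_on_spark is deprecated. "
            "Use DataFrame.pandas_api instead.",
            FutureWarning,
        )
        return self.pandas_api()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = FakeDataFrame().to_pandas_on_spark()

print(result)
print(caught[0].category.__name__)
```

Against a real 3.2.1 install the `caught` list would stay empty, matching what Bjørn observed below.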
Huaxin

On Sat, Jan 15, 2022 at 4:12 PM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:

> Hi, Bjorn.
>
> It seems that you are confused about my announcement. The test coverage
> announcement is about the `master` branch, which is for the upcoming Apache
> Spark 3.3.0. Apache Spark 3.3 will start to support Java 17, not old
> release branches like Apache Spark 3.2.x/3.1.x/3.0.x.
>
> > 1. If I change the Java version to 17, I get an error which I did not
> > copy. But have you built this with Java 11 or Java 17? I have noticed that
> > we test using Java 17, so I was hoping to update Java to version 17.
>
> The Apache Spark community is still actively developing, stabilizing, and
> optimizing Spark on Java 17. For the details, please see the following:
>
> SPARK-33772: Build and Run Spark on Java 17
> SPARK-35781: Support Spark on Apple Silicon on macOS natively on Java 17
> SPARK-37593: Optimize HeapMemoryAllocator to avoid memory waste when using G1GC
>
> In short, please don't expect Java 17 with Spark 3.2.x and older versions.
>
> Thanks,
> Dongjoon.
>
> On Sat, Jan 15, 2022 at 11:19 AM Bjørn Jørgensen <bjornjorgen...@gmail.com> wrote:
>
>> Two things.
>>
>> I changed the Dockerfile from jupyter/docker-stacks to
>> https://github.com/bjornjorgensen/docker-stacks/blob/master/pyspark-notebook/Dockerfile
>> and then built, tagged, and pushed it.
>> I start it with docker-compose like this:
>>
>> version: '2.1'
>> services:
>>   jupyter:
>>     image: bjornjorgensen/spark-notebook:spark-3.2.1RC-1
>>     restart: 'no'
>>     volumes:
>>       - ./notebooks:/home/jovyan/notebooks
>>     ports:
>>       - "8881:8888"
>>       - "8181:8080"
>>       - "7077:7077"
>>       - "4040:4040"
>>     environment:
>>       NB_UID: ${UID}
>>       NB_GID: ${GID}
>>
>> 1. If I change the Java version to 17, I get an error which I did not
>> copy. But have you built this with Java 11 or Java 17? I have noticed that
>> we test using Java 17, so I was hoping to update Java to version 17.
>>
>> 2.
>> In a notebook I start Spark with:
>>
>> from pyspark import pandas as ps
>> import re
>> import numpy as np
>> import os
>> #import pandas as pd
>>
>> from pyspark import SparkContext, SparkConf
>> from pyspark.sql import SparkSession
>> from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
>> from pyspark.sql.types import StructType, StructField, StringType, IntegerType
>>
>> os.environ["PYARROW_IGNORE_TIMEZONE"] = "1"
>>
>> def get_spark_session(app_name: str, conf: SparkConf):
>>     conf.setMaster('local[*]')
>>     conf \
>>         .set('spark.driver.memory', '64g') \
>>         .set("fs.s3a.access.key", "minio") \
>>         .set("fs.s3a.secret.key", "KEY") \
>>         .set("fs.s3a.endpoint", "http://192.168.1.127:9000") \
>>         .set("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
>>         .set("spark.hadoop.fs.s3a.path.style.access", "true") \
>>         .set("spark.sql.repl.eagerEval.enabled", "True") \
>>         .set("spark.sql.adaptive.enabled", "True") \
>>         .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer") \
>>         .set("spark.sql.repl.eagerEval.maxNumRows", "10000")
>>
>>     return SparkSession.builder.appName(app_name).config(conf=conf).getOrCreate()
>>
>> spark = get_spark_session("Falk", SparkConf())
>>
>> Then I run this code:
>>
>> f06 = spark.read.option("multiline", "true").json("/home/jovyan/notebooks/falk/data/norm_test/f06.json")
>>
>> pf06 = f06.to_pandas_on_spark()
>>
>> pf06.info()
>>
>> And I did not get any errors or warnings. But according to
>> https://github.com/apache/spark/commit/bc7d55fc1046a55df61fdb380629699e9959fcc6
>> (Spark)DataFrame.to_pandas_on_spark is deprecated.
>>
>> So I was supposed to get some info telling me to change to pandas_api, which I did
>> not get.
>>
>> On Fri, Jan 14, 2022 at 07:04, huaxin gao <huaxin.ga...@gmail.com> wrote:
>>
>>> The two regressions have been fixed. I will cut RC2 tomorrow late
>>> afternoon.
>>> Thanks,
>>> Huaxin
>>>
>>> On Wed, Jan 12, 2022 at 9:11 AM huaxin gao <huaxin.ga...@gmail.com> wrote:
>>>
>>>> Thank you all for testing and voting!
>>>>
>>>> I will -1 this RC because
>>>> https://issues.apache.org/jira/browse/SPARK-37855 and
>>>> https://issues.apache.org/jira/browse/SPARK-37859 are regressions.
>>>> These are not blockers, but I think it's better to fix them in 3.2.1. I will
>>>> prepare for RC2.
>>>>
>>>> Thanks,
>>>> Huaxin
>>>>
>>>> On Wed, Jan 12, 2022 at 2:03 AM Kent Yao <y...@apache.org> wrote:
>>>>
>>>>> +1 (non-binding).
>>>>>
>>>>> Chao Sun <sunc...@apache.org> wrote on Wed, Jan 12, 2022 at 16:10:
>>>>>
>>>>>> +1 (non-binding). Thanks Huaxin for driving the release!
>>>>>>
>>>>>> On Tue, Jan 11, 2022 at 11:56 PM Ruifeng Zheng <ruife...@foxmail.com> wrote:
>>>>>>
>>>>>>> +1 (non-binding)
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Ruifeng Zheng
>>>>>>>
>>>>>>> ------------------ Original ------------------
>>>>>>> *From:* "Cheng Su" <chen...@fb.com.INVALID>
>>>>>>> *Date:* Wed, Jan 12, 2022 02:54 PM
>>>>>>> *To:* "Qian Sun" <qian.sun2...@gmail.com>; "huaxin gao" <huaxin.ga...@gmail.com>
>>>>>>> *Cc:* "dev" <dev@spark.apache.org>
>>>>>>> *Subject:* Re: [VOTE] Release Spark 3.2.1 (RC1)
>>>>>>>
>>>>>>> +1 (non-binding). Checked commit history and ran some local tests.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Cheng Su
>>>>>>>
>>>>>>> *From:* Qian Sun <qian.sun2...@gmail.com>
>>>>>>> *Date:* Tuesday, January 11, 2022 at 7:55 PM
>>>>>>> *To:* huaxin gao <huaxin.ga...@gmail.com>
>>>>>>> *Cc:* dev <dev@spark.apache.org>
>>>>>>> *Subject:* Re: [VOTE] Release Spark 3.2.1 (RC1)
>>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> Looks good. All integration tests passed.
>>>>>>>
>>>>>>> Qian
>>>>>>>
>>>>>>> On Jan 11, 2022 at 2:09 AM, huaxin gao <huaxin.ga...@gmail.com> wrote:
>>>>>>>
>>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>>> version 3.2.1.
>>>>>>> The vote is open until Jan. 13th at 12 PM PST (8 PM UTC) and passes
>>>>>>> if a majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>>>>>
>>>>>>> [ ] +1 Release this package as Apache Spark 3.2.1
>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>
>>>>>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>>>>>
>>>>>>> There are currently no issues targeting 3.2.1 (try: project = SPARK AND
>>>>>>> "Target Version/s" = "3.2.1" AND status in (Open, Reopened, "In Progress"))
>>>>>>>
>>>>>>> The tag to be voted on is v3.2.1-rc1 (commit
>>>>>>> 2b0ee226f8dd17b278ad11139e62464433191653):
>>>>>>> https://github.com/apache/spark/tree/v3.2.1-rc1
>>>>>>>
>>>>>>> The release files, including signatures, digests, etc. can be found at:
>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc1-bin/
>>>>>>>
>>>>>>> Signatures used for Spark RCs can be found in this file:
>>>>>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>>>>>
>>>>>>> The staging repository for this release can be found at:
>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1395/
>>>>>>>
>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>> https://dist.apache.org/repos/dist/dev/spark/v3.2.1-rc1-docs/
>>>>>>>
>>>>>>> The list of bug fixes going into 3.2.1 can be found at the following URL:
>>>>>>> https://s.apache.org/7tzik
>>>>>>>
>>>>>>> This release is using the release script of the tag v3.2.1-rc1.
>>>>>>>
>>>>>>> FAQ
>>>>>>>
>>>>>>> =========================
>>>>>>> How can I help test this release?
>>>>>>> =========================
>>>>>>>
>>>>>>> If you are a Spark user, you can help us test this release by taking
>>>>>>> an existing Spark workload and running it on this release candidate,
>>>>>>> then reporting any regressions.
>>>>>>> If you're working in PySpark, you can set up a virtual env, install
>>>>>>> the current RC, and see if anything important breaks. In the Java/Scala
>>>>>>> world, you can add the staging repository to your project's resolvers and
>>>>>>> test with the RC (make sure to clean up the artifact cache before/after so
>>>>>>> you don't end up building with an out-of-date RC going forward).
>>>>>>>
>>>>>>> ===========================================
>>>>>>> What should happen to JIRA tickets still targeting 3.2.1?
>>>>>>> ===========================================
>>>>>>>
>>>>>>> The current list of open tickets targeted at 3.2.1 can be found at:
>>>>>>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>>>>>>> Version/s" = 3.2.1
>>>>>>>
>>>>>>> Committers should look at those and triage. Extremely important bug
>>>>>>> fixes, documentation, and API tweaks that impact compatibility should
>>>>>>> be worked on immediately. Everything else, please retarget to an
>>>>>>> appropriate release.
>>>>>>>
>>>>>>> ==================
>>>>>>> But my bug isn't fixed?
>>>>>>> ==================
>>>>>>>
>>>>>>> In order to make timely releases, we will typically not hold the
>>>>>>> release unless the bug in question is a regression from the previous
>>>>>>> release. That being said, if there is something which is a regression
>>>>>>> that has not been correctly targeted, please ping me or a committer to
>>>>>>> help target the issue.
>>
>> --
>> Bjørn Jørgensen
>> Vestre Aspehaug 4, 6010 Ålesund
>> Norge
>>
>> +47 480 94 297
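[Editor's note: the clean-virtual-env RC testing pattern mentioned in the FAQ above can be sketched with Python's stdlib venv module, as below. The pip-install step is shown only as a comment, since the staged tarball name must be taken from the v3.2.1-rc1-bin directory listed in the vote email; everything else here is an illustration of the pattern, not an official procedure.]

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Create a throwaway env so the RC never touches the system Python.
    env_dir = Path(tmp) / "spark-321-rc1"
    venv.EnvBuilder(with_pip=False).create(env_dir)  # use with_pip=True to install the RC

    bin_name = "Scripts" if sys.platform == "win32" else "bin"
    env_python = env_dir / bin_name / "python"

    # The install step would go roughly like this (tarball URL is illustrative):
    # subprocess.run([str(env_python), "-m", "pip", "install",
    #                 "<staged pyspark .tar.gz from v3.2.1-rc1-bin>"], check=True)

    # Sanity-check: the env's interpreter reports the env as its prefix,
    # so any subsequent `import pyspark` resolves inside the throwaway env.
    out = subprocess.run(
        [str(env_python), "-c", "import sys; print(sys.prefix)"],
        capture_output=True, text=True, check=True,
    )
    print("spark-321-rc1" in out.stdout)
```

Deleting the temporary directory afterwards (done automatically by the context manager) gives the artifact-cache cleanup the FAQ asks for on the Python side.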