Actually, I opened - https://issues.apache.org/jira/browse/SPARK-21093.
2017-06-14 17:08 GMT+09:00 Hyukjin Kwon <gurwls...@gmail.com>:

> For a shorter reproducer:
>
> df <- createDataFrame(list(list(1L, 1, "1", 0.1)), c("a", "b", "c", "d"))
> collect(gapply(df, "a", function(key, x) { x }, schema(df)))
>
> Running the line below multiple times (5~7) occasionally throws an error:
>
> collect(gapply(df, "a", function(key, x) { x }, schema(df)))
>
> I will leave it here and can explain more once a JIRA is opened.
> This does not look like a regression anyway.
>
>
> 2017-06-14 16:22 GMT+09:00 Hyukjin Kwon <gurwls...@gmail.com>:
>
>> Per https://github.com/apache/spark/tree/v2.1.1,
>>
>> 1. CentOS 7.2.1511 / R 3.3.3 - this test hangs.
>>
>> I messed it up a bit while downgrading R to 3.3.3 (it was an actual
>> machine, not a VM), so it took me a while to retry this. I rebuilt it
>> and checked that the R version is 3.3.3, at least. I hope this one can
>> be double-checked.
>>
>> Here is the self-reproducer:
>>
>> irisDF <- suppressWarnings(createDataFrame(iris))
>> schema <- structType(structField("Sepal_Length", "double"),
>>                      structField("Avg", "double"))
>> df4 <- gapply(
>>   cols = "Sepal_Length",
>>   irisDF,
>>   function(key, x) {
>>     y <- data.frame(key, mean(x$Sepal_Width), stringsAsFactors = FALSE)
>>   },
>>   schema)
>> collect(df4)
>>
>>
>> 2017-06-14 16:07 GMT+09:00 Felix Cheung <felixcheun...@hotmail.com>:
>>
>>> Thanks! Will try to set up RHEL/CentOS to test it out
>>>
>>> _____________________________
>>> From: Nick Pentreath <nick.pentre...@gmail.com>
>>> Sent: Tuesday, June 13, 2017 11:38 PM
>>> Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
>>> To: Felix Cheung <felixcheun...@hotmail.com>, Hyukjin Kwon <gurwls...@gmail.com>, dev <dev@spark.apache.org>
>>> Cc: Sean Owen <so...@cloudera.com>
>>>
>>> Hi - yeah, sorry for the slow response. I was on RHEL and OpenJDK, but
>>> will have to report back later with the versions, as I am AFK.
>>> Not totally sure of the R version either, but again, will revert ASAP.
>>>
>>> On Wed, 14 Jun 2017 at 05:09, Felix Cheung <felixcheun...@hotmail.com> wrote:
>>>
>>>> Thanks
>>>> This was with an external package and unrelated
>>>>
>>>> >> macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning
>>>> >> (https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)
>>>>
>>>> As for CentOS - would it be possible to test against R older than
>>>> 3.4.0? This is the same error reported by Nick below.
>>>>
>>>> _____________________________
>>>> From: Hyukjin Kwon <gurwls...@gmail.com>
>>>> Sent: Tuesday, June 13, 2017 8:02 PM
>>>> Subject: Re: [VOTE] Apache Spark 2.2.0 (RC4)
>>>> To: dev <dev@spark.apache.org>
>>>> Cc: Sean Owen <so...@cloudera.com>, Nick Pentreath <nick.pentre...@gmail.com>, Felix Cheung <felixcheun...@hotmail.com>
>>>>
>>>>
>>>> For the test failure on R, I checked:
>>>>
>>>> Per https://github.com/apache/spark/tree/v2.2.0-rc4,
>>>>
>>>> 1. Windows Server 2012 R2 / R 3.3.1 - passed
>>>>    (https://ci.appveyor.com/project/spark-test/spark/build/755-r-test-v2.2.0-rc4)
>>>> 2. macOS Sierra 10.12.3 / R 3.4.0 - passed
>>>> 3. macOS Sierra 10.12.3 / R 3.2.3 - passed with a warning
>>>>    (https://gist.github.com/HyukjinKwon/85cbcfb245825852df20ed6a9ecfd845)
>>>> 4. CentOS 7.2.1511 / R 3.4.0 - reproduced
>>>>    (https://gist.github.com/HyukjinKwon/2a736b9f80318618cc147ac2bb1a987d)
>>>>
>>>> Per https://github.com/apache/spark/tree/v2.1.1,
>>>>
>>>> 1. CentOS 7.2.1511 / R 3.4.0 - reproduced
>>>>    (https://gist.github.com/HyukjinKwon/6064b0d10bab8fc1dc6212452d83b301)
>>>>
>>>> Given my tests and observations, this appears to fail only on
>>>> CentOS 7.2.1511 / R 3.4.0.
>>>>
>>>> It also fails on Spark 2.1.1, so it does not sound like a regression,
>>>> although it is a bug that should be fixed (whether in Spark or R).
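[Editor's note: for readers unfamiliar with `gapply`'s contract, the reproducers above rely on this behavior: partition a data frame by a key column, apply a function to each (key, group) pair, and bind the results back together. The sketch below illustrates those semantics in plain Python; the function and data names are invented for illustration and are not the SparkR API.]

```python
# Plain-Python sketch of gapply-like grouped apply: split rows by a key
# column, call func(key, group_rows) per group, concatenate the results.
from collections import defaultdict

def gapply_sketch(rows, key_col, func):
    """rows: list of dicts; func(key, group_rows) -> list of dicts."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_col]].append(row)
    out = []
    for key, group in groups.items():
        out.extend(func(key, group))
    return out

# A tiny iris-like stand-in for the reproducer's data.
iris_like = [
    {"Sepal_Length": 5.1, "Sepal_Width": 3.5},
    {"Sepal_Length": 5.1, "Sepal_Width": 3.0},
    {"Sepal_Length": 4.9, "Sepal_Width": 3.1},
]

def mean_width(key, group):
    # Mirrors the reproducer: one output row per key with the group mean.
    avg = sum(r["Sepal_Width"] for r in group) / len(group)
    return [{"Sepal_Length": key, "Avg": avg}]

result = gapply_sketch(iris_like, "Sepal_Length", mean_width)
print(result)
# → [{'Sepal_Length': 5.1, 'Avg': 3.25}, {'Sepal_Length': 4.9, 'Avg': 3.1}]
```

In SparkR the per-group function additionally runs in a worker-side R process and results must match a declared schema, which is where the platform-specific failures above come in.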
>>>>
>>>> 2017-06-14 8:28 GMT+09:00 Xiao Li <gatorsm...@gmail.com>:
>>>>
>>>>> -1
>>>>>
>>>>> Spark 2.2 is unable to read a partitioned table created by Spark 2.1
>>>>> or earlier.
>>>>>
>>>>> Opened a JIRA: https://issues.apache.org/jira/browse/SPARK-21085
>>>>>
>>>>> Will fix it soon.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Xiao Li
>>>>>
>>>>>
>>>>> 2017-06-13 9:39 GMT-07:00 Joseph Bradley <jos...@databricks.com>:
>>>>>
>>>>>> Re: the QA JIRAs:
>>>>>> Thanks for discussing them. I still feel they are very helpful; in
>>>>>> particular, I notice I no longer have to spend a solid 2-3 weeks of
>>>>>> time QAing (unlike in earlier Spark releases). One other point not
>>>>>> mentioned above: I think they serve as a very helpful
>>>>>> reminder/training for the community on rigor in development. Since
>>>>>> we instituted QA JIRAs, contributors have been a lot better about
>>>>>> adding in docs early, rather than waiting until the end of the cycle
>>>>>> (though I know this is drawing conclusions from correlations).
>>>>>>
>>>>>> I would vote in favor of the RC... but I'll wait to see about the
>>>>>> reported failures.
>>>>>>
>>>>>> On Fri, Jun 9, 2017 at 3:30 PM, Sean Owen <so...@cloudera.com> wrote:
>>>>>>
>>>>>>> Different errors than in
>>>>>>> https://issues.apache.org/jira/browse/SPARK-20520, but that is also
>>>>>>> reporting R test failures.
>>>>>>>
>>>>>>> I went back and tried to run the R tests and they passed, at least
>>>>>>> on Ubuntu 17 / R 3.3.
>>>>>>>
>>>>>>> On Fri, Jun 9, 2017 at 9:12 AM Nick Pentreath <nick.pentre...@gmail.com> wrote:
>>>>>>>
>>>>>>>> All Scala and Python tests pass. ML QA and doc issues are resolved
>>>>>>>> (as well as R, it seems).
>>>>>>>>
>>>>>>>> However, I'm seeing the following test failure on R consistently:
>>>>>>>> https://gist.github.com/MLnick/5f26152f97ae8473f807c6895817cf72
>>>>>>>>
>>>>>>>> On Thu, 8 Jun 2017 at 08:48 Denny Lee <denny.g....@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> +1 non-binding
>>>>>>>>>
>>>>>>>>> Tested on macOS Sierra and Ubuntu 16.04; the test suite includes
>>>>>>>>> various test cases covering Spark SQL, ML, GraphFrames, and
>>>>>>>>> Structured Streaming.
>>>>>>>>>
>>>>>>>>> On Wed, Jun 7, 2017 at 9:40 PM vaquar khan <vaquar.k...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> +1 non-binding
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> vaquar khan
>>>>>>>>>>
>>>>>>>>>> On Jun 7, 2017 4:32 PM, "Ricardo Almeida" <ricardo.alme...@actnowib.com> wrote:
>>>>>>>>>>
>>>>>>>>>> +1 (non-binding)
>>>>>>>>>>
>>>>>>>>>> Built and tested with -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn
>>>>>>>>>> -Phive -Phive-thriftserver -Pscala-2.11 on
>>>>>>>>>>
>>>>>>>>>> - Ubuntu 17.04, Java 8 (OpenJDK 1.8.0_111)
>>>>>>>>>> - macOS 10.12.5, Java 8 (build 1.8.0_131)
>>>>>>>>>>
>>>>>>>>>> On 5 June 2017 at 21:14, Michael Armbrust <mich...@databricks.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Please vote on releasing the following candidate as Apache Spark
>>>>>>>>>>> version 2.2.0. The vote is open until Thurs, June 8th, 2017 at
>>>>>>>>>>> 12:00 PST and passes if a majority of at least 3 +1 PMC votes
>>>>>>>>>>> are cast.
>>>>>>>>>>>
>>>>>>>>>>> [ ] +1 Release this package as Apache Spark 2.2.0
>>>>>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>>>>>
>>>>>>>>>>> To learn more about Apache Spark, please see
>>>>>>>>>>> http://spark.apache.org/
>>>>>>>>>>>
>>>>>>>>>>> The tag to be voted on is v2.2.0-rc4
>>>>>>>>>>> <https://github.com/apache/spark/tree/v2.2.0-rc4>
>>>>>>>>>>> (377cfa8ac7ff7a8a6a6d273182e18ea7dc25ce7e)
>>>>>>>>>>>
>>>>>>>>>>> List of JIRA tickets resolved can be found with this filter
>>>>>>>>>>> <https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.2.0>.
>>>>>>>>>>>
>>>>>>>>>>> The release files, including signatures, digests, etc. can be
>>>>>>>>>>> found at:
>>>>>>>>>>> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-bin/
>>>>>>>>>>>
>>>>>>>>>>> Release artifacts are signed with the following key:
>>>>>>>>>>> https://people.apache.org/keys/committer/pwendell.asc
>>>>>>>>>>>
>>>>>>>>>>> The staging repository for this release can be found at:
>>>>>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1241/
>>>>>>>>>>>
>>>>>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>>>>>> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc4-docs/
>>>>>>>>>>>
>>>>>>>>>>> *FAQ*
>>>>>>>>>>>
>>>>>>>>>>> *How can I help test this release?*
>>>>>>>>>>>
>>>>>>>>>>> If you are a Spark user, you can help us test this release by
>>>>>>>>>>> taking an existing Spark workload and running it on this release
>>>>>>>>>>> candidate, then reporting any regressions.
>>>>>>>>>>>
>>>>>>>>>>> *What should happen to JIRA tickets still targeting 2.2.0?*
>>>>>>>>>>>
>>>>>>>>>>> Committers should look at those and triage. Extremely important
>>>>>>>>>>> bug fixes, documentation, and API tweaks that impact
>>>>>>>>>>> compatibility should be worked on immediately. Everything else,
>>>>>>>>>>> please retarget to 2.3.0 or 2.2.1.
>>>>>>>>>>>
>>>>>>>>>>> *But my bug isn't fixed!??!*
>>>>>>>>>>>
>>>>>>>>>>> In order to make timely releases, we will typically not hold the
>>>>>>>>>>> release unless the bug in question is a regression from 2.1.1.
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Joseph Bradley
>>>>>> Software Engineer - Machine Learning
>>>>>> Databricks, Inc.
>>>>>> http://databricks.com
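[Editor's note: for anyone re-running this kind of release verification, the "signatures, digests" step in the vote email boils down to checking the published digest and gpg signature against the downloaded artifact. The sketch below covers only the digest half; the artifact name follows the rc4 naming in this thread, and a stand-in file is fabricated locally since the 2.2.0-rc4 staging area is long gone, so this is illustrative rather than an exact replay. The published digest file format for that release may differ from plain `sha512sum` output.]

```shell
# Work in a scratch directory so nothing is left behind.
cd "$(mktemp -d)"

# Stand-in for the downloaded release artifact (name assumed from the thread).
printf 'release bits' > spark-2.2.0-bin-hadoop2.7.tgz

# In a real verification the .sha512 file is downloaded alongside the
# artifact; here we generate it locally to keep the sketch self-contained.
sha512sum spark-2.2.0-bin-hadoop2.7.tgz > spark-2.2.0-bin-hadoop2.7.tgz.sha512

# The actual check: recompute the digest and compare with the published one.
# Prints "spark-2.2.0-bin-hadoop2.7.tgz: OK" on success.
sha512sum -c spark-2.2.0-bin-hadoop2.7.tgz.sha512
```

The signature half is analogous: import the key from pwendell.asc and run `gpg --verify` on the downloaded `.asc` file against the artifact.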