Re: Using bundler for Jekyll?

2021-02-12 Thread Sean Owen
Seems fine to me. How about just regenerating the whole site once with the latest version and requiring that? On Fri, Feb 12, 2021 at 7:09 AM attilapiros wrote: > I run into the same problem today and tried to find the version where the > diff is minimal, so I wrote a script: > > ``` > #!/bin/zs

Re: Apache Spark 3.0.2 Release ?

2021-02-12 Thread Sean Owen
Sounds like a fine time to me, sure. On Fri, Feb 12, 2021 at 1:39 PM Dongjoon Hyun wrote: > Hi, All. > > As of today, `branch-3.0` has 307 patches (including 25 correctness > patches) since v3.0.1 tag (released on September 8th, 2020). > > Since we stabilized branch-3.0 during 3.1.x preparation

Re: [VOTE] Release Spark 3.0.2 (RC1)

2021-02-17 Thread Sean Owen
I think I'm +1 on this, in that I don't see any more test failures than I usually do, and I think they're due to my local env, but is anyone seeing these failures? - includes jars passed in through --jars *** FAILED *** Process returned with exit code 1. See the log4j logs for more detail. (Spark

Re: [VOTE] Release Spark 3.0.2 (RC1)

2021-02-17 Thread Sean Owen
M Dongjoon Hyun wrote: > I didn't see them. Could you describe your environment: OS, Java, > Maven/SBT, profiles? > > On Wed, Feb 17, 2021 at 6:26 PM Sean Owen wrote: > >> I think I'm +1 on this, in that I don't see any more test failures than I >> usually

Re: [DISCUSS] assignee practice on committers+ (possible issue on preemption)

2021-02-18 Thread Sean Owen
I think it's OK to raise particular instances. It's hard for me to evaluate further in the abstract. I don't think we use Assignee much at all, except to kinda give credit when something is done. No piece of code or work can be solely owned by one person; this is just ASF policy. I think we've se

Re: Auto-closing PRs or How to get reviewers' attention

2021-02-18 Thread Sean Owen
Holden is absolutely correct - pinging relevant individuals is probably your best bet. I skim the 40-50 PRs that have activity each day and look into a few that look like I would know something about by the title, but, easy to miss something I could weigh in on. There is no way to force people to

Re: [DISCUSS] assignee practice on committers+ (possible issue on preemption)

2021-02-18 Thread Sean Owen
hnically it's > simply possible that someone can file JIRA issues on his/her backlog which > can be done in a couple of months or so with assigning to him/herself, > which effectively blocks others from working or proposing the same. I > consider this as preemptive which sounds bad and eve

Re: Java Code Style

2021-02-20 Thread Sean Owen
Do you just mean you want to adjust the code style rules? Yes you can do that in IJ, just a matter of finding the indent rule to adjust. The Spark style is pretty normal stuff, though not 100% consistent. I prefer the first style in this case. Sometimes it's a matter of judgment when to differ from

Re: [VOTE] Release Spark 3.1.1 (RC3)

2021-02-22 Thread Sean Owen
+1 LGTM, same results as last time. Does anyone see the error below? It is probably env-specific as the Jenkins jobs don't hit this. Just checking. SPARK-29604 external listeners should be initialized with Spark classloader *** FAILED *** java.lang.RuntimeException: [download failed: tomcat#jas

Re: Auto-closing PRs or How to get reviewers' attention

2021-02-23 Thread Sean Owen
ue, Feb 23, 2021 at 4:06 AM Enrico Minack wrote: > Am 18.02.21 um 16:34 schrieb Sean Owen: > > One other aspect is that a committer is taking some degree of > > responsibility for merging a change, so the ask is more than just a > > few minutes of eyeballing. If it bre

Re: K8s integration test failure ("credentials Jenkins is using is probably wrong...")

2021-02-23 Thread Sean Owen
Shane would you know? May be a problem with a single worker. On Tue, Feb 23, 2021 at 8:46 AM Phillip Henry wrote: > > Hi, > > Silly question: the Jenkins build for my PR is failing but it seems > outside of my control. What must I do to remedy this? > > I've submitted > > https://github.com/apac

Re: Apache Spark 3.2 Expectation

2021-02-25 Thread Sean Owen
I'd roughly expect 3.2 in, say, July of this year, given the usual cadence. No reason it couldn't be a little sooner or later. There is already some good stuff in 3.2 and will be a good minor release in 5-6 months. On Thu, Feb 25, 2021 at 10:57 AM Dongjoon Hyun wrote: > Hi, All. > > Since we hav

Re: Apache Spark 2.4.8 (and EOL of 2.4)

2021-03-03 Thread Sean Owen
For reference, 2.3.x was maintained from February 2018 (2.3.0) to Sep 2019 (2.3.4), or about 19 months. The 2.4 branch should probably be maintained longer than that, as the final 2.x branch. 2.4.0 was released in Nov 2018. A final release in, say, April 2021 would be about 30 months. That feels ab
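
The month counts in this message are plain calendar arithmetic. A minimal Python sketch to check them, assuming a hypothetical helper `months_between` (not part of any Spark tooling) that counts whole calendar months between two (year, month) pairs:

```python
def months_between(start, end):
    """Whole calendar months from (year, month) `start` to (year, month) `end`."""
    (start_year, start_month), (end_year, end_month) = start, end
    return (end_year - start_year) * 12 + (end_month - start_month)

# 2.3.0 (Feb 2018) through 2.3.4 (Sep 2019)
branch_23 = months_between((2018, 2), (2019, 9))
# 2.4.0 (Nov 2018) through a final release in, say, Apr 2021
branch_24 = months_between((2018, 11), (2021, 4))
print(branch_23, branch_24)  # prints: 19 29
```

By this count the 2.3 line spans 19 months, and a 2.4 line ending in April 2021 would span 29, in line with the "about 30" estimate.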

Re: Apache Spark 2.4.8 (and EOL of 2.4)

2021-03-03 Thread Sean Owen
. > > Ya, exactly, we can release 2.4.8 as a normal release first and use 2.4.9 > as the EOL release. > > Since 2.4.7 was released almost 6 months ago, 2.4.8 is a little late in > terms of the cadence. > > Bests, > Dongjoon. > > > On Wed, Mar 3, 2021 at 10:55 AM Sea

Re: [DISCUSS] Support pandas API layer on PySpark

2021-03-14 Thread Sean Owen
I like koalas a lot. Playing devil's advocate, why not just let it continue to live as an add on? Usually the argument is it'll be maintained better in Spark but it's well maintained. It adds some overhead to maintaining Spark conversely. On the upside it makes it a little more discoverable. Are th

Re: [VOTE] Release Spark 2.4.8 (RC1)

2021-04-07 Thread Sean Owen
Looks good to me testing on Java 8, Hadoop 2.7, Ubuntu, with about all profiles enabled. I still get an odd failure in the Hive versions suite, but I keep seeing that in my env and think it's something odd about my setup. +1

Re: [VOTE] Release Spark 2.4.8 (RC2)

2021-04-12 Thread Sean Owen
+1 same result as last RC for me. On Mon, Apr 12, 2021, 12:53 AM Liang-Chi Hsieh wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.4.8. > > The vote is open until Apr 15th at 9AM PST and passes if a majority +1 PMC > votes are cast, with a minimum of 3 +1 vote

Re: mvn auto-downloading on fresh clone

2021-04-21 Thread Sean Owen
I agree, it looks like the automatic redirector has changed behavior. It still sends you to an HTML page for the mirror, but previously that link would cause it to redirect straight to the download. While the script can fallback to archive.apache.org, it doesn't because the HTML downloads successfu

Re: [VOTE] Release Spark 2.4.8 (RC3)

2021-04-28 Thread Sean Owen
+1 from me too, same result as last time. On Wed, Apr 28, 2021 at 11:33 AM Liang-Chi Hsieh wrote: > Please vote on releasing the following candidate as Apache Spark version > 2.4.8. > > The vote is open until May 4th at 9AM PST and passes if a majority +1 PMC > votes are cast, with a minimum of

Re: Should we add built in support for bouncy castle EC w/Kube

2021-04-29 Thread Sean Owen
I recall that Bouncy Castle has some crypto export implications. If it's in the distro then I think we'd have to update https://www.apache.org/licenses/exports/ to reflect that Bouncy Castle is again included in the product. But that's doable. Just have to recall how one updates that. On Thu, Apr

Re: [apache/spark-website] Update contributing to include code of conduct section (#335)

2021-05-04 Thread Sean Owen
Just FYI - proposed update to the CoC for the project. Looks reasonable to simply adopt the ASF code of conduct, per the PR. On Tue, May 4, 2021 at 2:02 AM Jungtaek Lim wrote: > I think the rationalization is great, but why not going through dev@ > mailing list? Many contributors are subscribing

Re: [VOTE] Release Spark 2.4.8 (RC4)

2021-05-10 Thread Sean Owen
It looks like the repository is "open" - it doesn't publish until "closed" after all artifacts are uploaded. Is that it? Otherwise +1 from me. On Mon, May 10, 2021 at 1:10 AM Liang-Chi Hsieh wrote: > Yea, I don't know why it happens. > > I remember RC1 also has the same issue. But RC2 and RC3 do

Re: [VOTE] Release Spark 2.4.8 (RC4)

2021-05-11 Thread Sean Owen
Hm, yes I see it at http://pool.sks-keyservers.net/pks/lookup?search=0x653c2301fea493ee&fingerprint=on&op=index but not on keyserver.ubuntu.com for some reason. What happens if you try to close it again, perhaps even manually in the UI there? I don't want to click it unless it messes up the workflo

Re: [VOTE] Release Spark 2.4.8 (RC4)

2021-05-11 Thread Sean Owen
be leave as it > is? > > > Sean Owen-2 wrote > > Hm, yes I see it at > > > http://pool.sks-keyservers.net/pks/lookup?search=0x653c2301fea493ee&fingerprint=on&op=index > > but not on keyserver.ubuntu.com for some reason. > > What happens if you try to cl

Re: Resolves too old JIRAs as incomplete

2021-05-19 Thread Sean Owen
I agree. Such old JIRAs are 99% obsolete. If anyone objects to a particular issue being closed, they can comment and we can reopen. It's a very reversible thing. There is value in keeping JIRA up to date with reality. On Wed, May 19, 2021 at 8:47 PM Takeshi Yamamuro wrote: > Hi, dev, > > As you

Re: [VOTE] Release Spark 3.1.2 (RC1)

2021-05-25 Thread Sean Owen
+1 same result as in previous tests On Mon, May 24, 2021 at 1:14 AM Dongjoon Hyun wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.1.2. > > The vote is open until May 27th 1AM (PST) and passes if a majority +1 PMC > votes are cast, with a minimum of 3 +1 vote

How to think about SparkPullRequestBuilder-K8s?

2021-06-11 Thread Sean Owen
I find that somewhat often, the K8S PR builders will fail on a PR: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/ ... when the PR seems totally unrelated to K8S. I've kind of learned to ignore them in that case but that seems wrong. Are they just kind of flaky? am I imagin

Re: [VOTE] Release Spark 3.0.3 (RC1)

2021-06-17 Thread Sean Owen
+1 same result as ever. Signatures are OK, tags look good, tests pass. On Thu, Jun 17, 2021 at 5:11 AM Yi Wu wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.0.3. > > The vote is open until Jun 21th 3AM (PST) and passes if a majority +1 PMC > votes are cast,

Re: [DISCUSS] Rename hadoop-3.2/hadoop-2.7 profile to hadoop-3/hadoop-2?

2021-06-24 Thread Sean Owen
The downside here is that it would break downstream builds that set hadoop-3.2 if it's now called hadoop-3. That's not a huge deal. We can retain dummy profiles under the old names that do nothing, but that would be a quieter 'break'. I suppose this naming is only of importance to developers, who m

Re: Removing references to Master

2021-07-09 Thread Sean Owen
We maybe don't need to litigate this one again. I do think this point of view is legitimate, as is the point of view that 'master' is inextricably linked to 'master/slave' as an unfortunate term of art; it did not originate in reference to mastery of a skill but of another entity. Even if one viewe

Re: Spark 3: Resource Discovery

2021-07-17 Thread Sean Owen
At the moment this is really about discovering GPUs, so that the scheduler can schedule tasks that need to allocate whole GPUs. On Sat, Jul 17, 2021 at 5:14 PM ayan guha wrote: > Hi > > As I was going through Spark 3 config params, I noticed following group of > params. I could not understand wh

Re: TreeNode.exists?

2021-08-11 Thread Sean Owen
If this is repeated a bunch of places in the code, sure, a utility method could be good. I think .find(x).isDefined is even not optimal - .exists(x) is a little easier and may be slightly faster? If you find a chance for refactoring, sure open a minor PR. On Wed, Aug 11, 2021 at 9:42 AM Jacek Lask
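
For readers less familiar with the Scala idiom, the same contrast can be sketched in Python. This is only an analogy: `find_first` and `exists` below are hypothetical helpers, not Spark's `TreeNode` API. Both forms answer "does any element match?", but the second asks the yes/no question directly instead of fetching the element and then discarding it.

```python
def find_first(items, predicate):
    """Return the first matching item, or None (loosely like Scala's find)."""
    return next((item for item in items if predicate(item)), None)

def exists(items, predicate):
    """Return True if any item matches (loosely like Scala's exists)."""
    return any(predicate(item) for item in items)

nodes = ["Project", "Filter", "Join"]

def is_join(name):
    return name == "Join"

# Roundabout: find the element, then only check whether it was found.
found_roundabout = find_first(nodes, is_join) is not None
# Direct: ask the yes/no question.
found_direct = exists(nodes, is_join)
```

Whether the direct form is measurably faster is not established here; the clearer intent is the main gain.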

Re: Access to Apache GitHub

2021-08-15 Thread Sean Owen
No, we can't give write access to Apache repos of course, not to anyone but committers. People contribute by opening pull requests. On Sun, Aug 15, 2021 at 10:11 AM Mich Talebzadeh wrote: > > Hi, > > > With reference to recent threads/discussions on creating ready-made > docker images for spar

Re: Nabble archive is down

2021-08-17 Thread Sean Owen
If the links are down and not evidently coming back, yeah let's change any website links. Probably best to depend on ASF resources foremost, but, the ASF archive isn't searchable: https://mail-archives.apache.org/mod_mbox/spark-user/ What about things like https://www.mail-archive.com/user@spark.a

Re: Nabble archive is down

2021-08-17 Thread Sean Owen
Oh duh, right, much better idea! On Tue, Aug 17, 2021 at 2:56 PM Micah Kornfield wrote: > https://lists.apache.org/list.html?u...@spark.apache.org should be > searchable (although the UI is a little clunky). > > On Tue, Aug 17, 2021 at 12:52 PM Sean Owen wrote: > >> If t

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-22 Thread Sean Owen
So far, I've tested Java 8 + Scala 2.12, Scala 2.13 and the results look good per usual. Good to see Scala 2.13 artifacts!! Unless I've forgotten something we're OK for Scala 2.13 now, and Java 11 (and, IIRC, Java 14 works fine minus some very minor corners of the project's deps) I think we're goi

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-22 Thread Sean Owen
f(ADAMContext.scala:2686) > at > org.bdgenomics.adam.ds.ADAMContext.loadVariants(ADAMContext.scala:3608) > at > org.bdgenomics.adam.ds.variant.VariantDatasetSuite.$anonfun$new$1(VariantDatasetSuite.scala:128) > at > org.bdgenomics.utils.misc.SparkFunSuite.$anonfun$sparkTest$1(S

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-24 Thread Sean Owen
I think we'll need this revert: https://github.com/apache/spark/pull/33819 Between that and a few other minor but important issues I think I'd say -1 myself and ask for another RC. On Tue, Aug 24, 2021 at 1:01 PM Jacek Laskowski wrote: > Hi Yi Wu, > > Looks like the issue has got resolution: Wo

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-26 Thread Sean Owen
Did you run ./dev/change-scala-version.sh 2.13 ? that's required first to update POMs. It works fine for me. On Thu, Aug 26, 2021 at 8:33 PM Stephen Coy wrote: > Hi all, > > Being adventurous I have built the RC1 code with: > > -Pyarn -Phadoop-3.2 -Pyarn -Phadoop-cloud -Phive-thriftserver -Phiv

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-26 Thread Sean Owen
ala-2.13 > > > org.scala-lang.modules > > scala-parallel-collections_${scala.binary.version} > > > > > which means this dependency will be missing for unit tests that create > SparkSessions from library code only, a technique inspired by Spark’s

Re: [VOTE] Release Spark 3.2.0 (RC1)

2021-08-27 Thread Sean Owen
y wrote: > Hi Sean, > > I think that maybe the https://www.mojohaus.org/flatten-maven-plugin/ will > help you out here. > > Cheers, > > Steve C > > On 27 Aug 2021, at 12:29 pm, Sean Owen wrote: > > OK right, you would have seen a different error otherwise. >

Re: Question using multiple partition for Window cumulative functions when partition is not specified.

2021-08-30 Thread Sean Owen
You just have 1 partition here because the input is so small. You can always repartition this further for parallelism. Is the issue that you're not partitioning the window itself, maybe? On Mon, Aug 30, 2021 at 12:59 AM Haejoon Lee wrote: > Hi all, > > I noticed that Spark uses only one partitio

Re: [VOTE] Release Spark 3.2.0 (RC2)

2021-09-01 Thread Sean Owen
This RC looks OK to me too, understanding we may need to have RC3 for the outstanding issues though. The issue with the Scala 2.13 POM is still there; I wasn't able to figure it out (anyone?), though it may not affect 'normal' usage (and is work-around-able in other uses, it seems), so may be suff

Re: [SQL][AQE] Advice needed: a trivial code change with a huge reading impact?

2021-09-08 Thread Sean Owen
That does seem pointless. The body could just be .flatten()-ed to achieve the same result. Maybe it was just written that way for symmetry with the block above. You could open a PR to change it. On Wed, Sep 8, 2021 at 4:31 AM Jacek Laskowski wrote: > Hi Spark Devs, > > I'm curious what your take

Re: Adding Spark 4 to JIRA for targetted versions

2021-09-13 Thread Sean Owen
Sure, doesn't hurt to have a placeholder. On Mon, Sep 13, 2021, 5:32 PM Holden Karau wrote: > Hi Folks, > > I'm going through the Spark 3.2 tickets just to make sure we're not missing > anything important and I was wondering what folks' thoughts are on adding > Spark 4 so we can target API breakin

Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-20 Thread Sean Owen
+1 from me, same results as the last RC from my side. The Scala 2.13 POM issue was resolved and the 2.13 build appears to be OK. On Sat, Sep 18, 2021 at 10:19 PM Gengliang Wang wrote: > Please vote on releasing the following candidate as > Apache Spark version 3.2.0. > > The vote is open until 1

Re: [VOTE] Release Spark 3.2.0 (RC3)

2021-09-21 Thread Sean Owen
Hm yeah I tend to agree. See https://github.com/apache/spark/pull/33912 This _is_ a test-only dependency which makes it less of an issue. I'm guessing it's not in Maven as it's a small one-off utility; we _could_ just inline the ~100 lines of code in test code instead? On Tue, Sep 21, 2021 at 12:3

Re: [VOTE] Release Spark 3.2.0 (RC5)

2021-09-27 Thread Sean Owen
Hm... it does just affect Mac OS (?) and only if you don't have JAVA_HOME set (which people often do set) and only affects build/mvn, vs built-in maven (which people often have installed). Only affects those building. I'm on the fence about whether it blocks 3.2.0, as it doesn't affect downstream u

Re: [VOTE] Release Spark 3.2.0 (RC5)

2021-09-27 Thread Sean Owen
Has anyone seen a StackOverflowError when running tests? It happens in compilation. I heard from another user who hit this earlier, and I had not, until just today testing this: [ERROR] ## Exception when compiling 495 sources to /mnt/data/testing/spark-3.2.0/sql/catalyst/target/scala-2.12/classes

Re: [VOTE] Release Spark 3.2.0 (RC5)

2021-09-27 Thread Sean Owen
Another "is anyone else seeing this?", in compiling common/network-yarn: [ERROR] [Error] /mnt/data/testing/spark-3.2.0/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java:32: package com.google.common.annotations does not exist [ERROR] [Error] /mnt/data/testing/s

Re: [VOTE] Release Spark 3.2.0 (RC5)

2021-09-27 Thread Sean Owen
n 'test'. On Mon, Sep 27, 2021 at 6:58 PM Chao Sun wrote: > Hmm it may be related to the commit. Sean: how do I reproduce this? > > On Mon, Sep 27, 2021 at 4:56 PM Sean Owen wrote: > >> Another "is anyone else seeing this"? in compiling common/yarn-network

Re: [VOTE] Release Spark 3.2.0 (RC6)

2021-09-29 Thread Sean Owen
+1 looks good to me as before, now that a few recent issues are resolved. On Tue, Sep 28, 2021 at 10:45 AM Gengliang Wang wrote: > Please vote on releasing the following candidate as > Apache Spark version 3.2.0. > > The vote is open until 11:59pm Pacific time September 30 and passes if a > maj

Re: [VOTE] Release Spark 3.2.0 (RC7)

2021-10-07 Thread Sean Owen
+1 again. Looks good in Scala 2.12, 2.13, and in Java 11. I note that the mem requirements for Java 11 tests seem to need to be increased but we're handling that separately. It doesn't really affect users. On Wed, Oct 6, 2021 at 11:49 AM Gengliang Wang wrote: > Please vote on releasing the follo

Re: [VOTE][RESULT] Release Spark 3.2.0 (RC7)

2021-10-17 Thread Sean Owen
any monetary damages >>> arising from such loss, damage or destruction. >>> >>> >>> >>> >>> >>> >>> >>> On Tue, 12 Oct 2021 at 08:15, Gengliang Wang wrote: >>> >>> The vote passes with 28 +1s (10

Re: Update Spark 3.3 release window?

2021-10-27 Thread Sean Owen
Seems fine to me - as good a placeholder as anything. Would that be about time to call 2.x end-of-life? On Wed, Oct 27, 2021 at 9:36 PM Hyukjin Kwon wrote: > Hi all, > > Spark 3.2. is out. Shall we update the release window > https://spark.apache.org/versioning-policy.html? > I am thinking of Mi

Re: Jira components cleanup

2021-11-15 Thread Sean Owen
Done. Now let's see if that generated 86 update emails! On Mon, Nov 15, 2021 at 11:03 AM Nicholas Chammas < nicholas.cham...@gmail.com> wrote: > > https://issues.apache.org/jira/projects/SPARK?selectedItem=com.atlassian.jira.jira-projects-plugin:components-page > > I think the "docs" component sh

Re: Scala 3 support approach

2021-12-03 Thread Sean Owen
fter that, we >> will see what is needed for Scala 3. >> >> Bests, >> Dongjoon. >> >> On Sun, Oct 18, 2020 at 1:33 PM Koert Kuipers wrote: >> >>> i think scala 3.0 will be able to use libraries built with Scala 2.13 >>> (as long as t

Re: Time for Spark 3.2.1?

2021-12-06 Thread Sean Owen
Always fine by me if someone wants to roll a release. It's been ~6 months since the last 3.0.x and 3.1.x releases, too; a new release of those wouldn't hurt either, if any of our release managers have the time or inclination. 3.0.x is reaching unofficial end-of-life around now anyway. On Mon, De

Re: Log4j 1.2.17 spark CVE

2021-12-12 Thread Sean Owen
Check the CVE - the log4j vulnerability appears to affect log4j 2, not 1.x. There was mention that it could affect 1.x when used with JNDI or SMS handlers, but Spark does neither. (unless anyone can think of something I'm missing, but never heard or seen that come up at all in 7 years in Spark) Th

Re: Log4j 1.2.17 spark CVE

2021-12-13 Thread Sean Owen
10:02 AM Jörn Franke wrote: > Is it in any case appropriate to use log4j 1.x which is not maintained > anymore and has other security vulnerabilities which won’t be fixed anymore > ? > > Am 13.12.2021 um 06:06 schrieb Sean Owen : > >  > Check the CVE - the log4j vulnerabil

Re: Log4j 1.2.17 spark CVE

2021-12-13 Thread Sean Owen
2021 at 6:33 PM James Yu wrote: > Question: Spark use log4j 1.2.17, if my application jar contains log4j 2.x > and gets submitted to the Spark cluster. Which version of log4j gets > actually used during the Spark session? > -- > *From:* Sean Owen > *

Re: Log4j 1.2.17 spark CVE

2021-12-14 Thread Sean Owen
FWIW here is the Databricks statement on it. Not the same as Spark but includes Spark of course. https://databricks.com/blog/2021/12/13/log4j2-vulnerability-cve-2021-44228-research-and-assessment.html Yes the question is almost surely more whether user apps are affected, not Spark itself. On Tue

Re: [MISC] Should we add .github/FUNDING.yml

2021-12-15 Thread Sean Owen
It might imply that this is a way to fund Spark alone, and it isn't. Probably no big deal either way but maybe not worth it. It won't be a mystery how to find and fund the ASF for the few orgs that want to, as compared to a small project On Wed, Dec 15, 2021, 8:34 AM Maciej wrote: > Hi All, > >

Re: Creating a memory-efficient AggregateFunction to calculate Median

2021-12-15 Thread Sean Owen
Parquet or ORC have the necessary stats to make this fast too already, but only helps if you want the median of sorted data as stored on disk, rather than the general case. Not sure you can do better than roughly what a sort entails if you want the exact median On Wed, Dec 15, 2021, 8:56 AM Pol Sa
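
For comparison, the sort-based route to an exact median looks like the following on a single machine. This is a minimal Python sketch under stated assumptions: `exact_median` is a hypothetical helper name, and nothing here reflects how Spark or an `AggregateFunction` would implement it. It only shows the "sort, then pick the middle" cost the message refers to.

```python
def exact_median(values):
    """Exact median via a full sort; averages the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty input is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2
```

For example, `exact_median([5, 1, 3])` returns `3` and `exact_median([4, 1, 3, 2])` returns `2.5`.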

Re: spark jdbc

2021-12-17 Thread Sean Owen
I'm not sure we want to do that. If you "SELECT foo AS bar", then the column name is foo but the column label is bar. We probably want to return the latter. On Fri, Dec 17, 2021 at 9:07 AM Gary Liu wrote: > In spark sql jdbc module, it's using getColumnLabel to get column names > from the remote
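
The name/label distinction can be seen in miniature with Python's built-in `sqlite3` module. This is only an analogy under stated assumptions: it uses Python's DB-API rather than JDBC, and the table `t` is made up for the illustration. JDBC's `ResultSetMetaData` exposes both `getColumnName` and `getColumnLabel`; the DB-API exposes a single name per result column, which for `sqlite3` is the alias.

```python
import sqlite3

# In-memory database with a hypothetical one-column table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (foo INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")

cursor = conn.execute("SELECT foo AS bar FROM t")
# Each entry of cursor.description is a 7-tuple; item 0 is the column name,
# which here is the alias from the SELECT list.
result_names = [column[0] for column in cursor.description]
print(result_names)  # prints: ['bar']
conn.close()
```

The caller sees `bar`, the label chosen in the query, which matches the behavior the message argues for.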

Re: ivy unit test case filing for Spark

2021-12-21 Thread Sean Owen
You would have to make it available? This doesn't seem like a spark issue. On Tue, Dec 21, 2021, 10:48 AM Pralabh Kumar wrote: > Hi Spark Team > > I am building a spark in VPN . But the unit test case below is failing. > This is pointing to ivy location which cannot be reached within VPN . Any

Re: About contribution

2022-01-05 Thread Sean Owen
(There is no project chat) See https://spark.apache.org/contributing.html On Tue, Jan 4, 2022 at 11:42 PM Dennis Jung wrote: > Hello, I hope this is not a silly question. > (I couldn't find any chat room on spark project, so asking on mail) > > It has been about a year since using spark in work,

Re: [VOTE] Release Spark 3.2.1 (RC1)

2022-01-11 Thread Sean Owen
+1 looks good to me. I ran all tests with scala 2.12 and 2.13 and had the same results as 3.2.0 testing. On Mon, Jan 10, 2022 at 12:10 PM huaxin gao wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.2.1. > > The vote is open until Jan. 13th at 12 PM PST (8 PM

Re: Spark on Oracle available as an Apache licensed open source repo

2022-01-13 Thread Sean Owen
-user Thank you for this, but just a small but important point about the use of the Spark name. Please take a look at https://spark.apache.org/trademarks.html Specifically, this should reference "Apache Spark" at least once prominently with a link to the project. It's also advisable to avoid using

Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-21 Thread Sean Owen
(Are you suggesting this is a regression, or is it a general question? here we're trying to figure out whether there are critical bugs introduced in 3.2.1 vs 3.2.0) On Fri, Jan 21, 2022 at 1:58 PM Bjørn Jørgensen wrote: > Hi, I am wondering if it's a bug or not. > > I do have a lot of json files

Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-21 Thread Sean Owen
e.org/jira/browse/SPARK-37981 for this > bug. > > > > > fre. 21. jan. 2022 kl. 21:45 skrev Sean Owen : > >> (Are you suggesting this is a regression, or is it a general question? >> here we're trying to figure out whether there are critical bugs introduced >

Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-21 Thread Sean Owen
t 5:24 PM Bjørn Jørgensen wrote: > Ok, but deleting users' data without them knowing it is never a good idea. > That's why I give this RC -1. > > lør. 22. jan. 2022 kl. 00:16 skrev Sean Owen : > >> (Bjorn - unless this is a regression, it would not block a release, eve

Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-21 Thread Sean Owen
+1 with same result as last time. On Thu, Jan 20, 2022 at 9:59 PM huaxin gao wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.2.1. The vote is open until 8:00pm Pacific time January 25 and passes if > a majority +1 PMC votes are cast, with a minimum of 3 +1 v

Re: Log likelhood in GeneralizedLinearRegression

2022-01-22 Thread Sean Owen
This exists in the evaluator MulticlassClassificationEvaluator instead (which can be used for binary), does that work? On Sat, Jan 22, 2022 at 4:36 AM Phillip Henry wrote: > Hi, > > As far as I know, there is no function to generate the log likelihood from > a GeneralizedLinearRegression model.

Re: Log4j upgrade in spark binary from 1.2.17 to 2.17.1

2022-01-31 Thread Sean Owen
(BTW you are sending to the Spark incubator list, and Spark has not been in incubation for about 7 years. Use u...@spark.apache.org) What update are you looking for? this has been discussed extensively on the Spark mailing list. Spark is not evidently vulnerable to this. 3.3.0 will include log4j 2

Re: [VOTE] Spark 3.1.3 RC3

2022-02-02 Thread Sean Owen
+1 from me, same result as the last release on my end. I think releasing 3.1.3 is fine, it's 7 months since 3.1.2. On Tue, Feb 1, 2022 at 7:12 PM Holden Karau wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.1.3. > > The vote is open until Feb. 4th at 5 PM P

Re: Problem building spark-catalyst_2.12 with Maven

2022-02-10 Thread Sean Owen
Yes I've seen this; the JVM stack size needs to be increased. I'm not sure if it's env specific (though you and I at least have hit it, I think others), or whether we need to change our build script. In the pom.xml file, find "-Xss..." settings and make them something like "-Xss4m", see if that wor

Re: Help needed to locate the csv parser (for Spark bug reporting/fixing)

2022-02-10 Thread Sean Owen
It starts in org.apache.spark.sql.execution.datasources.csv.CSVDataSource. Yes univocity is used for much of the parsing. I am not sure of the cause of the bug but it does look like one indeed. In one case the parser is asked to read all fields, in the other, to skip one. The pushdown helps efficie

Re: Problem building spark-catalyst_2.12 with Maven

2022-02-10 Thread Sean Owen
i Sean, > > On Thu, Feb 10, 2022 at 5:37 PM Sean Owen wrote: > >> Yes I've seen this; the JVM stack size needs to be increased. I'm not >> sure if it's env specific (though you and I at least have hit it, I think >> others), or whether we need to

Re: [VOTE] Spark 3.1.3 RC4

2022-02-14 Thread Sean Owen
Looks good to me, same results as last RC, +1 On Mon, Feb 14, 2022 at 2:55 PM Holden Karau wrote: > Please vote on releasing the following candidate as Apache Spark version > 3.1.3. > > The vote is open until Feb. 18th at 1 PM pacific (9 PM GMT) and passes if > a majority > +1 PMC votes are cast

Re: Which manufacturers' GPUs support Spark?

2022-02-16 Thread Sean Owen
Spark itself does not use GPUs, and is agnostic to what GPUs exist on a cluster, scheduled by the resource manager, and used by an application. In practice, virtually all GPU-related use cases (for deep learning for example) use CUDA, and this is NVIDIA-specific. Certainly, RAPIDS is from NVIDIA.

Re: Apache Spark 3.3 Release

2022-03-03 Thread Sean Owen
I think it's fine to pursue the existing plan - code freeze in two weeks and try to close off key remaining issues. Final release pending on how those go, and testing, but fine to get the ball rolling. On Thu, Mar 3, 2022 at 12:45 PM Maxim Gekk wrote: > Hello All, > > I would like to bring on th

Re: bazel and external/

2022-03-17 Thread Sean Owen
Just checking - there is no way to tell bazel to look somewhere else for whatever 'external' means to it? It's a kinda big ugly change but it's not a functional change. If anything it might break some downstream builds that rely on the current structure too. But such is life for developers? I don't

Re: bazel and external/

2022-03-17 Thread Sean Owen
`external` has been baked in bazel since the >>> beginning and there is no plan from bazel devs to attempt to fix this >>> <https://github.com/bazelbuild/bazel/issues/4508#issuecomment-724055371> >>> . >>> >>> On Thu, Mar 17, 2022 at 7:52 PM Sean Ow

Re: [DISCUSS] Migration guide on upgrading Kafka to 3.1 in Spark 3.3

2022-03-18 Thread Sean Owen
I think we can assume that someone upgrading Kafka will be responsible for thinking through the breaking changes. We can help by listing anything we know could affect Spark-Kafka usage and calling those out in a release note, for sure. I don't think we need to get into items that would affect Kafka

Re: [DISCUSS] Migration guide on upgrading Kafka to 3.1 in Spark 3.3

2022-03-23 Thread Sean Owen
ld refer to us, and then it is no longer a > matter of “help”. It is a matter of “responsibility”, as you said. > > 2022년 3월 18일 (금) 오후 10:15, Sean Owen 님이 작성: > >> I think we can assume that someone upgrading Kafka will be responsible >> for thinking through the breaking

Re: Tools for regression testing

2022-03-24 Thread Sean Owen
Hm, then what are you looking for besides all the tests in Spark? On Thu, Mar 24, 2022, 2:34 PM Mich Talebzadeh wrote: > Thanks > > I know what unit testing is. The question was not about unit testing. it > was specific to regression testing >

Re: Deluge of GitBox emails

2022-04-04 Thread Sean Owen
I think this must be related to the Gitbox migration that just happened. It does seem like I'm getting more emails - some are on PRs I'm attached to, but some I don't recognize. The thing is, I'm not yet clear if they duplicate the normal Github emails - that is if we turn them off do we have anyth

Re: Deluge of GitBox emails

2022-04-04 Thread Sean Owen
to > comments on Jira. > > Turning off these GitBox emails should not have an impact on the usual > GitHub emails we are all already familiar with. > > > On Apr 4, 2022, at 9:47 AM, Sean Owen wrote: > > I think this must be related to the Gitbox migration that just happened. >

Re: Spark 3.0.1 and spark 3.2 compatibility

2022-04-07 Thread Sean Owen
(Don't cross post please) Generally you definitely want to compile and test vs what you're running on. There shouldn't be many binary or source incompatibilities -- these are avoided in a major release where possible. So it may need no code change. But I would certainly recompile just on principle!

Re: CVE -2020-28458, How to upgrade datatables dependency

2022-04-13 Thread Sean Owen
You can see the files in core/src/main/resources/org/apache/spark/ui/static - you can try dropping in the new minified versions and see if the UI is OK. You can open a pull request if it works to update it, in case this affects Spark. It looks like the smaller upgrade to 1.10.22 is also sufficient.

Re: CVE-2021-38296: Apache Spark Key Negotiation Vulnerability - 2.4 Backport?

2022-04-14 Thread Sean Owen
It does affect 2.4.x, yes. 2.4.x was EOL a while ago, so there wouldn't be a new release of 2.4.x in any event. It's recommended to update instead, at least to 3.1.3. On Thu, Apr 14, 2022 at 12:07 PM Chris Nauroth wrote: > A fix for CVE-2021-38296 was committed and released in Apache Spark 3.1.3

Re: CVE -2020-28458, How to upgrade datatables dependency

2022-04-16 Thread Sean Owen
FWIW here's an update to 1.10.25: https://github.com/apache/spark/pull/36226 On Wed, Apr 13, 2022 at 8:28 AM Sean Owen wrote: > You can see the files in > core/src/main/resources/org/apache/spark/ui/static - you can try dropping > in the new minified versions and see if the UI is

Re: CVE-2021-22569

2022-05-04 Thread Sean Owen
Sure, did you search the JIRA? https://issues.apache.org/jira/browse/SPARK-38340 Does this affect Spark's usage of protobuf? Looks like it can't be updated to 3.x -- this is really not a dependency of Spark but underlying dependencies. Feel free to re-attempt a change that might work, at least wi

Re: CVE-2020-13936

2022-05-05 Thread Sean Owen
This is a Velocity issue. Spark doesn't use it, although it looks like Avro does. From reading the CVE, I do not believe it would impact Avro's usage - velocity templates it may use for codegen aren't exposed that I know of. Is there a known relationship to Spark here? That is the key question in s

Re: [VOTE] Release Spark 3.3.0 (RC1)

2022-05-05 Thread Sean Owen
I'm seeing test failures; is anyone seeing ones like this? This is Java 8 / Scala 2.12 / Ubuntu 22.04: - SPARK-37618: Sub dirs are group writable when removing from shuffle service enabled *** FAILED *** [OWNER_WRITE, GROUP_READ, GROUP_WRITE, GROUP_EXECUTE, OTHERS_READ, OWNER_READ, OTHERS_EXECUT

Re: [VOTE] Release Spark 3.3.0 (RC1)

2022-05-10 Thread Sean Owen
RC1 is started, could you move them out from the 3.3.0 milestone? >>> Otherwise, we cannot distinguish new real blocker issues from those >>> obsolete JIRA issues. >>> >>> Thanks, >>> Dongjoon. >>> >>> >>> On Thu, May 5, 2022 at 1

Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-16 Thread Sean Owen
I'm still seeing failures related to the function registry, like: ExpressionsSchemaSuite: - Check schemas for expression examples *** FAILED *** 396 did not equal 398 Expected 396 blocks in result file but got 398. Try regenerating the result files. (ExpressionsSchemaSuite.scala:161) - SPARK-14

Re: [VOTE] Release Spark 3.3.0 (RC3)

2022-05-25 Thread Sean Owen
+1 works for me as usual, with Java 8 + Scala 2.12, Java 11 + Scala 2.13. On Tue, May 24, 2022 at 12:14 PM Maxim Gekk wrote: > Please vote on releasing the following candidate as > Apache Spark version 3.3.0. > > The vote is open until 11:59pm Pacific time May 27th and passes if a > majority +1

Re: [VOTE] Release Spark 3.3.0 (RC4)

2022-06-03 Thread Sean Owen
In Scala 2.13, I'm getting errors like this: analyzer should replace current_timestamp with literals *** FAILED *** java.lang.ClassCastException: class scala.collection.mutable.ArrayBuffer cannot be cast to class scala.collection.immutable.Seq (scala.collection.mutable.ArrayBuffer and scala.col
