[jira] [Resolved] (SPARK-26826) Array indexing functions array_allpositions and array_select

2019-02-07 Thread Petar Zecevic (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petar Zecevic resolved SPARK-26826. --- Resolution: Won't Fix > Array indexing functions array_allpositions and array_sel

[jira] [Updated] (SPARK-26826) Array indexing functions array_allpositions and array_select

2019-02-05 Thread Petar Zecevic (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petar Zecevic updated SPARK-26826: -- Description: This ticket proposes two extra array functions: {{array_allpositions}} (named
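
The ticket description above is cut off, so the exact semantics are not visible here. As a rough sketch only, assuming array_allpositions returns the 1-based positions of every occurrence of a value (by analogy with Spark's existing array_position) and array_select returns the elements found at a given list of positions, the idea could look like this in plain Python. Both function bodies are illustrative assumptions, not the proposed Spark implementation:

```python
def array_allpositions(arr, value):
    """Return the 1-based positions of every occurrence of value in arr.

    Assumption: positions are 1-based, mirroring Spark's array_position.
    """
    return [i for i, x in enumerate(arr, start=1) if x == value]


def array_select(arr, positions):
    """Return the elements of arr found at the given 1-based positions.

    Assumption: out-of-range positions are an error; the real proposal
    may instead return null for them.
    """
    result = []
    for p in positions:
        if p < 1 or p > len(arr):
            raise IndexError(f"position {p} is outside 1..{len(arr)}")
        result.append(arr[p - 1])
    return result


if __name__ == "__main__":
    values = [3, 1, 3, 2, 3]
    hits = array_allpositions(values, 3)
    print(hits)
    print(array_select(["a", "b", "c", "d", "e"], hits))
```

Under these assumptions the two functions compose: the positions found in one array can be used to pick the matching elements out of a parallel array.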

[jira] [Created] (SPARK-26826) Array indexing functions array_allpositions and array_select

2019-02-05 Thread Petar Zecevic (JIRA)
Petar Zecevic created SPARK-26826: - Summary: Array indexing functions array_allpositions and array_select Key: SPARK-26826 URL: https://issues.apache.org/jira/browse/SPARK-26826 Project: Spark

Re: Jenkins build errors

2018-06-29 Thread petar . zecevic
be a mirror > problem, throttling, etc. But there again haven't spotted another failing > Hive test. > > On Wed, Jun 20, 2018 at 1:55 AM Petar Zecevic wrote: > > It's still dying. Back to this error (it used to be spark-2.2.0 before): > > java.io.IOException: Cannot run progra

[jira] [Updated] (SPARK-24020) Sort-merge join inner range optimization

2018-06-28 Thread Petar Zecevic (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petar Zecevic updated SPARK-24020: -- Description: The problem we are solving is the case where you have two big tables

[jira] [Updated] (SPARK-24020) Sort-merge join inner range optimization

2018-06-28 Thread Petar Zecevic (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Petar Zecevic updated SPARK-24020: -- Attachment: SMJ-innerRange-PR24020-designDoc.pdf > Sort-merge join inner range optimizat

Re: Jenkins build errors

2018-06-20 Thread Petar Zecevic
un 19, 2018, 2:53 AM Petar Zecevic <petar.zece...@gmail.com> wrote: Thanks, but unfortunately, it died again. Now at pyspark tests: =

Re: Jenkins build errors

2018-06-19 Thread Petar Zecevic
eeded): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/92038/ Test FAILed. Finished: FAILURE On 6/18/2018 at 8:05 PM, shane knapp wrote: i triggered another build against your PR, so let's see if this happens again or was a transient failure. https://amplab.cs.berkeley.edu/jenkins/jo

Jenkins build errors

2018-06-18 Thread Petar Zecevic
Hi, Jenkins build for my PR (https://github.com/apache/spark/pull/21109 ; https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/92023/testReport/org.apache.spark.sql.hive/HiveExternalCatalogVersionsSuite/_It_is_not_a_test_it_is_a_sbt_testing_SuiteSelector_/) keeps failing. First it

Re: Sort-merge join improvement

2018-05-22 Thread Petar Zecevic
, all the new code is well contained in separate classes (unless it was necessary to change existing ones). So I believe this is ready to be merged. Can some of the committers please take another look at this and accept the PR? Thank you, Petar Zecevic On 5/15/2018 at 10:55 AM, Petar Zecevic

Re: Sort-merge join improvement

2018-05-15 Thread Petar Zecevic
-optimized SMJ. Merging this would help us tremendously and I believe this can be useful in other applications, too. Can you please review (https://github.com/apache/spark/pull/21109) and merge the patch? Thank you, Petar Zecevic On 4/23/2018 at 6:28 PM, Petar Zecevic wrote: Hi, the PR

Re: Sort-merge join improvement

2018-04-23 Thread Petar Zecevic
Hi, the PR tests completed successfully (https://github.com/apache/spark/pull/21109). Can you please review the patch and merge it upstream if you think it's OK? Thanks, Petar On 4/18/2018 at 4:52 PM, Petar Zecevic wrote: As instructed offline, I opened a JIRA for this: https

[jira] [Commented] (SPARK-24020) Sort-merge join inner range optimization

2018-04-19 Thread Petar Zecevic (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444335#comment-16444335 ] Petar Zecevic commented on SPARK-24020: --- No, this implementation only applies to equi-joins

Re: Sort-merge join improvement

2018-04-18 Thread Petar Zecevic
As instructed offline, I opened a JIRA for this: https://issues.apache.org/jira/browse/SPARK-24020 I will create a pull request soon. On 4/17/2018 at 6:21 PM, Petar Zecevic wrote: Hello everybody We (at University of Zagreb and University of Washington) have implemented an optimization

[jira] [Created] (SPARK-24020) Sort-merge join inner range optimization

2018-04-18 Thread Petar Zecevic (JIRA)
Petar Zecevic created SPARK-24020: - Summary: Sort-merge join inner range optimization Key: SPARK-24020 URL: https://issues.apache.org/jira/browse/SPARK-24020 Project: Spark Issue Type

Sort-merge join improvement

2018-04-17 Thread Petar Zecevic
g the sort-merge join algorithm? 2. We believe there is a more general pattern here and that this could help in other similar situations where secondary sorting is available. Would you agree? 3. Would you like us to open a JIRA ticket and create a pull request? Thanks, Pet

[jira] [Commented] (SPARK-13313) Strongly connected components doesn't find all strongly connected components

2016-03-19 Thread Petar Zecevic (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15200028#comment-15200028 ] Petar Zecevic commented on SPARK-13313: --- Ok, thanks for reporting. I'll look

[jira] [Commented] (SPARK-13313) Strongly connected components doesn't find all strongly connected components

2016-02-14 Thread Petar Zecevic (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15147007#comment-15147007 ] Petar Zecevic commented on SPARK-13313: --- No, I don't think it's got anything to do

[jira] [Commented] (SPARK-13313) Strongly connected components doesn't find all strongly connected components

2016-02-14 Thread Petar Zecevic (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15146731#comment-15146731 ] Petar Zecevic commented on SPARK-13313: --- Yes, you need articles.tsv and links.tsv from this archive

[jira] [Created] (SPARK-13313) Strongly connected components doesn't find all strongly connected components

2016-02-14 Thread Petar Zecevic (JIRA)
Petar Zecevic created SPARK-13313: - Summary: Strongly connected components doesn't find all strongly connected components Key: SPARK-13313 URL: https://issues.apache.org/jira/browse/SPARK-13313

Re: Is spark suitable for real time query

2015-07-28 Thread Petar Zecevic
You can try out a few tricks employed by folks at Lynx Analytics... Daniel Darabos gave some details at Spark Summit: https://www.youtube.com/watch?v=zt1LdVj76LU&index=13&list=PL-x35fyliRwhP52fwDqULJLOnqnrN5nDs On 22.7.2015. 17:00, Louis Hust wrote: My code like below: Map<String,

Re: Spark - Eclipse IDE - Maven

2015-07-28 Thread Petar Zecevic
Sorry about self-promotion, but there's a really nice tutorial for setting up Eclipse for Spark in Spark in Action book: http://www.manning.com/bonaci/ On 27.7.2015. 10:22, Akhil Das wrote: You can follow this doc

Re: Spark - Eclipse IDE - Maven

2015-07-28 Thread Petar Zecevic
Sorry about self-promotion, but there's a really nice tutorial for setting up Eclipse for Spark in Spark in Action book: http://www.manning.com/bonaci/ On 24.7.2015. 7:26, Siva Reddy wrote: Hi All, I am trying to setup the Eclipse (LUNA) with Maven so that I create a maven projects

Re: Fwd: Model weights of linear regression becomes abnormal values

2015-05-29 Thread Petar Zecevic
You probably need to scale the values in the data set so that they are all of comparable ranges and translate them so that their means get to 0. You can use pyspark.mllib.feature.StandardScaler(True, True) object for that. On 28.5.2015. 6:08, Maheshakya Wijewardena wrote: Hi, I'm trying
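
To make the advice above concrete, here is a minimal plain-Python sketch of what centering and scaling does to each feature column: subtract the column mean, then divide by the column standard deviation. It does not use Spark; the helper names standardize and standardize_rows are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption about how the scaler computes it:

```python
from statistics import mean, stdev


def standardize(column):
    """Shift a column to zero mean and scale it to unit sample std."""
    mu = mean(column)
    sigma = stdev(column)  # sample standard deviation (n - 1 denominator)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def standardize_rows(rows):
    """Standardize every feature column of a list of equal-length rows."""
    columns = list(zip(*rows))
    scaled_columns = [standardize(col) for col in columns]
    return [list(row) for row in zip(*scaled_columns)]


if __name__ == "__main__":
    # Two features on very different scales.
    data = [[1.0, 1000.0], [2.0, 2000.0], [3.0, 3000.0]]
    for row in standardize_rows(data):
        print(row)
```

After this transformation both features span comparable ranges, which is what keeps gradient-based linear regression from producing runaway weights.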

[jira] [Commented] (SPARK-6646) Spark 2.0: Rearchitecting Spark for Mobile Platforms

2015-04-01 Thread Petar Zecevic (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390206#comment-14390206 ] Petar Zecevic commented on SPARK-6646: -- Good one :) Spark 2.0: Rearchitecting Spark

Re: How to configure SparkUI to use internal ec2 ip

2015-03-31 Thread Petar Zecevic
Did you try setting the SPARK_MASTER_IP parameter in spark-env.sh? On 31.3.2015. 19:19, Anny Chen wrote: Hi Akhil, I tried editing the /etc/hosts on the master and on the workers, and seems it is not working for me. I tried adding hostname internal-ip and it didn't work. I then tried

Re: Spark-submit and multiple files

2015-03-20 Thread Petar Zecevic
I tried your program in yarn-client mode and it worked with no exception. This is the command I used: spark-submit --master yarn-client --py-files work.py main.py (Spark 1.2.1) On 20.3.2015. 9:47, Guillaume Charhon wrote: Hi Davies, I am already using --py-files. The system does use the

Re: Hamburg Apache Spark Meetup

2015-02-25 Thread Petar Zecevic
Please add the Zagreb Meetup group, too. http://www.meetup.com/Apache-Spark-Zagreb-Meetup/ Thanks! On 18.2.2015. 19:46, Johan Beisser wrote: If you could also add the Hamburg Apache Spark Meetup, I'd appreciate it. http://www.meetup.com/Hamburg-Apache-Spark-Meetup/ On Tue, Feb 17, 2015 at

Re: Facing error while extending scala class with Product interface to overcome limit of 22 fields in spark-shell

2015-02-25 Thread Petar Zecevic
I believe your class needs to be defined as a case class (as I answered on SO).. On 25.2.2015. 5:15, anamika gupta wrote: Hi Akhil I guess it skipped my attention. I would definitely give it a try. While I would still like to know what is the issue with the way I have created schema?

Re: Accumulator in SparkUI for streaming

2015-02-24 Thread Petar Zecevic
Interesting. Accumulators are shown in the Web UI if you are using the ordinary SparkContext (Spark 1.2). It just has to be named (and that's what you did). scala> val acc = sc.accumulator(0, "test accumulator") acc: org.apache.spark.Accumulator[Int] = 0 scala> val rdd = sc.parallelize(1 to 1000)

Re: Posting to the list

2015-02-21 Thread Petar Zecevic
The message went through after all. Sorry for spamming. On 21.2.2015. 21:27, pzecevic wrote: Hi Spark users. Does anybody know what are the steps required to be able to post to this list by sending an email to user@spark.apache.org? I just sent a reply to Corey Nolet's mail Missing shuffle

Re: Missing shuffle files

2015-02-21 Thread Petar Zecevic
Could you try to turn on the external shuffle service? spark.shuffle.service.enabled=true On 21.2.2015. 17:50, Corey Nolet wrote: I'm experiencing the same issue. Upon closer inspection I'm noticing that executors are being lost as well. Thing is, I can't figure out how they are dying. I'm

Re: Where can I find logs set inside RDD processing functions?

2015-02-06 Thread Petar Zecevic
You can enable YARN log aggregation (set yarn.log-aggregation-enable to true) and run the command yarn logs -applicationId <your_application_id> after your application finishes. Or you can look at them directly in HDFS in /tmp/logs/<user>/logs/<applicationid>/<hostname> On 6.2.2015. 19:50, nitinkak001

Re: LeaseExpiredException while writing schemardd to hdfs

2015-02-05 Thread Petar Zecevic
Why don't you just map rdd's rows to lines and then call saveAsTextFile()? On 3.2.2015. 11:15, Hafiz Mujadid wrote: I want to write whole schemardd to single in hdfs but facing following exception

Re: Discourse: A proposed alternative to the Spark User list

2015-01-22 Thread Petar Zecevic
Ok, thanks for the clarifications. I didn't know this list has to remain as the only official list. Nabble is really not the best solution in the world, but we're stuck with it, I guess. That's it from me on this subject. Petar On 22.1.2015. 3:55, Nicholas Chammas wrote: I think a few

Re: Discourse: A proposed alternative to the Spark User list

2015-01-22 Thread Petar Zecevic
this mailing list into subproject-specific lists? That might also help tune in/out the subset of conversations of interest. On Jan 22, 2015 10:30 AM, Petar Zecevic <petar.zece...@gmail.com> wrote: Ok, thanks for the clarifications. I didn't know this list has