Re: Sorting tuples with byte key and byte value

2019-07-15 Thread Keith Chapman
Hi Supun, A couple of things with regard to your question. --executor-cores means the number of worker threads per VM. According to your requirement this should be set to 8. *repartitionAndSortWithinPartitions* is an RDD operation; RDD operations in Spark are not performant both in terms of
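The advice above can be sketched as a spark-submit invocation. This is a hypothetical example, not from the thread: the master URL, memory setting, class name, and jar name are all placeholders; only `--executor-cores 8` reflects the recommendation.

```shell
# Hypothetical invocation illustrating --executor-cores as discussed above:
# 8 worker threads per executor. Master URL, memory, class, and jar are
# placeholders, not values from the thread.
spark-submit \
  --master spark://master:7077 \
  --executor-cores 8 \
  --executor-memory 16g \
  --class com.example.SortBenchmark \
  sort-benchmark.jar
```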

Re: Release Apache Spark 2.4.4 before 3.0.0

2019-07-15 Thread Dongjoon Hyun
Hi, Apache Spark PMC members. Can we cut Apache Spark 2.4.4 next Monday (22nd July)? Bests, Dongjoon. On Fri, Jul 12, 2019 at 3:18 PM Dongjoon Hyun wrote: > Thank you, Jacek. > > BTW, I added `@private` since we need PMC's help to make an Apache Spark > release. > > Can I get more feedback

Spark 2.4 scala 2.12 Regular Expressions Approach

2019-07-15 Thread anbutech
Hi All, Could you please help me fix the below issue using Spark 2.4, Scala 2.12? How do we extract the multiple values from the given file name pattern using a Spark/Scala regular expression? Please give me some idea on the below approach. object Driver { private val filePattern =
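The original thread's pattern is truncated, so here is a minimal sketch of the general technique of extracting multiple values from a file name with a single regex and named capture groups. The file-name format, field names, and `extract_fields` helper are all hypothetical assumptions, not taken from the thread; the same idea carries over to Scala's `scala.util.matching.Regex` with an `unapply` pattern match.

```python
import re

# Hypothetical file-name convention (assumed, since the thread's pattern is
# truncated): names like "events_2019-07-15_part-001.json".
FILE_PATTERN = re.compile(
    r"(?P<dataset>\w+)_(?P<date>\d{4}-\d{2}-\d{2})_part-(?P<part>\d{3})\.json"
)

def extract_fields(filename):
    """Return all captured fields as a dict, or None when the name doesn't match."""
    m = FILE_PATTERN.fullmatch(filename)
    return m.groupdict() if m else None

print(extract_fields("events_2019-07-15_part-001.json"))
# → {'dataset': 'events', 'date': '2019-07-15', 'part': '001'}
print(extract_fields("not-a-match.txt"))
# → None
```

Named groups keep the extraction readable when several values come out of one pattern, which is usually preferable to counting positional groups.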

Sorting tuples with byte key and byte value

2019-07-15 Thread Supun Kamburugamuve
Hi all, We are trying to measure the sorting performance of Spark. We have a 16-node cluster with 48 cores and 256GB of RAM in each machine and a 10Gbps network. Let's say we are running with 128 parallel tasks and each partition generates about 1GB of data (total 128GB). We are using the method

[PySpark] [SparkR] Is it possible to invoke a PySpark function with a SparkR DataFrame?

2019-07-15 Thread Fiske, Danny
Hi all, Forgive this naïveté, I'm looking for reassurance from some experts! In the past we created a tailored Spark library for our organisation, implementing Spark functions in Scala with Python and R "wrappers" on top, but the focus on Scala has alienated our analysts/statisticians/data