How to deal with context dependent computing?

2018-08-22 Thread JF Chen
For example, I have some data with a timestamp, marked as category A or B and ordered by time. Now I want to calculate each duration from A to B. In a normal program, I can use a flag bit to record whether the previous record is A or B, and then calculate the duration. But in a Spark DataFrame, how to do
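In Spark this "remember the previous row" pattern is usually expressed with a window function (e.g. `lag` over a time-ordered window) rather than a mutable flag. As a plain-Python sketch of the flag-based logic the question describes (the column names and the A-pairs-with-next-B rule are assumptions, not from the original post):

```python
from datetime import datetime

def durations_a_to_b(rows):
    """Given (timestamp, category) rows ordered by time, return the
    duration from each 'A' event to the next 'B' event."""
    durations = []
    last_a = None  # the "flag bit": remembers the most recent 'A' timestamp
    for ts, cat in rows:
        if cat == "A":
            last_a = ts
        elif cat == "B" and last_a is not None:
            durations.append(ts - last_a)
            last_a = None  # reset so each A pairs with at most one B
    return durations

rows = [
    (datetime(2018, 8, 22, 10, 0), "A"),
    (datetime(2018, 8, 22, 10, 5), "B"),
    (datetime(2018, 8, 22, 11, 0), "A"),
    (datetime(2018, 8, 22, 11, 30), "B"),
]
print(durations_a_to_b(rows))  # durations of 5 minutes and 30 minutes
```

In a DataFrame the same effect comes from `F.lag("ts").over(Window.orderBy("ts"))` and filtering rows where the previous category was "A" and the current is "B".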

About the question of Spark Structured Streaming window output

2018-08-22 Thread z...@zjdex.com
Hi: I have some questions about Spark Structured Streaming window output in Spark 2.3.1. I wrote the application code as follows: case class DataType(time: Timestamp, value: Long) {} val spark = SparkSession .builder .appName("StructuredNetworkWordCount")
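The quoted code is truncated, but the question concerns what Spark's `window()` grouping actually computes. As a rough illustration, a tumbling-window aggregation just buckets events by fixed time intervals and aggregates per bucket; in plain Python (the 10-minute width and the sum aggregation are assumptions for illustration):

```python
from collections import defaultdict
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)  # assumed tumbling-window width
EPOCH = datetime(1970, 1, 1)

def tumbling_window_sum(events):
    """Sum `value` per 10-minute tumbling window, roughly what
    df.groupBy(window($"time", "10 minutes")).sum("value") computes."""
    buckets = defaultdict(int)
    for time, value in events:
        # floor the timestamp down to the start of its window
        start = EPOCH + ((time - EPOCH) // WINDOW) * WINDOW
        buckets[start] += value
    return dict(buckets)

events = [
    (datetime(2018, 8, 22, 9, 1), 3),
    (datetime(2018, 8, 22, 9, 8), 4),
    (datetime(2018, 8, 22, 9, 12), 5),
]
print(tumbling_window_sum(events))  # 9:00 window -> 7, 9:10 window -> 5
```

In streaming, which of these buckets is emitted and when depends on the output mode and watermark, which is where the window-output questions usually arise.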

Re: How to merge multiple rows

2018-08-22 Thread Patrick McCarthy
You didn't specify which API, but in pyspark you could do:

import pyspark.sql.functions as F
df.groupBy('ID').agg(F.sort_array(F.collect_set('DETAILS')).alias('DETAILS')).show()

+---+------------+
| ID|     DETAILS|
+---+------------+
|  1|[A1, A2, A3]|
|  3|        [B2]|
|  2|        [B1]|
+---+------------+
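For readers less familiar with these functions, the combination behaves like "collect the distinct DETAILS per ID, then sort each collected list". A plain-Python sketch of the same semantics (a rough model of `collect_set` + `sort_array`, not the Spark implementation):

```python
def merge_details(rows):
    """Group DETAILS by ID and sort each group, mirroring
    df.groupBy('ID').agg(F.sort_array(F.collect_set('DETAILS')))."""
    groups = {}
    for row_id, details in rows:
        groups.setdefault(row_id, set()).add(details)  # collect_set: dedupes
    return {row_id: sorted(vals) for row_id, vals in groups.items()}  # sort_array

rows = [(1, "A1"), (1, "A2"), (1, "A3"), (2, "B1"), (3, "B2")]
print(merge_details(rows))  # {1: ['A1', 'A2', 'A3'], 2: ['B1'], 3: ['B2']}
```

Note that `collect_set` drops duplicate DETAILS values within an ID; if duplicates must be kept, `collect_list` is the alternative.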

Re: How to merge multiple rows

2018-08-22 Thread Jean Georges Perrin
How do you do it now? You could use a withColumn(“newDetails”, ) jg

> On Aug 22, 2018, at 16:04, msbreuer wrote:
>
> A dataframe with the following contents is given:
>
> ID PART DETAILS
>  1    1      A1
>  1    2      A2
>  1    3      A3
>  2    1      B1
>  3    1      C1
>
> Target format should be as follows:
>

How to merge multiple rows

2018-08-22 Thread msbreuer
A dataframe with the following contents is given:

ID PART DETAILS
 1    1      A1
 1    2      A2
 1    3      A3
 2    1      B1
 3    1      C1

Target format should be as follows:

ID DETAILS
 1 A1+A2+A3
 2 B1
 3 C1

Note, the order of A1-A3 is important. Currently I am using this alternative:

ID DETAIL_1 DETAIL_2

Re: No space left on device

2018-08-22 Thread Gourav Sengupta
Hi, that was just one of the options, and not the first one; is there any chance of trying out the other options mentioned? For example, pointing the shuffle storage area to a location with more space? Regards, Gourav Sengupta On Wed, Aug 22, 2018 at 11:15 AM Vitaliy Pisarev <

Re: No space left on device

2018-08-22 Thread Vitaliy Pisarev
Documentation says that 'spark.shuffle.memoryFraction' was deprecated, but it doesn't say what to use instead. Any idea? On Wed, Aug 22, 2018 at 9:36 AM, Gourav Sengupta wrote: > Hi, > > The best part about Spark is that it is showing you which configuration to > tweak as well. In case you are
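For reference: since Spark 1.6 the separate shuffle and storage memory fractions were folded into the unified memory manager, so there is no direct one-to-one replacement. The closest modern knobs in spark-defaults.conf are (values shown are the defaults; tuning them is rarely needed):

```
# Unified memory management (Spark 1.6+): fraction of (heap - 300MB)
# shared by execution (shuffle) and storage.
spark.memory.fraction         0.6
# Portion of the above that storage may keep immune from eviction
# by execution.
spark.memory.storageFraction  0.5
```

For a "no space left on device" error specifically, the relevant setting is usually disk, not memory: `spark.local.dir` (where shuffle files spill), as discussed elsewhere in this thread.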

Failed to create file system watcher service: User limit of inotify instances reached or too many open files

2018-08-22 Thread Polisetti, Venkata Siva Rama Gopala Krishna
Hi, When I am doing calculations for example 700 listID's it is saving only some 50 rows and then getting some random exceptions Getting below exception when I try to do calculations on huge data and try to save huge data . Please let me know if any suggestions. Sample Code : I have some
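The inotify half of this error is an operating-system limit rather than a Spark one. A common remedy (the values below are illustrative, not prescriptive) is to raise the per-user inotify limits on the affected nodes, e.g. in a sysctl drop-in file:

```
# /etc/sysctl.d/90-inotify.conf -- raise per-user inotify limits
fs.inotify.max_user_instances = 1024
fs.inotify.max_user_watches   = 524288
```

Apply with `sudo sysctl --system`. The "too many open files" half may additionally require a higher `nofile` limit (e.g. in /etc/security/limits.conf) for the user running the executors.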

Re: No space left on device

2018-08-22 Thread Gourav Sengupta
Hi, the best part about Spark is that it shows you which configuration to tweak as well. If you are using EMR, check that "spark.local.dir" points to the right location in the cluster. If a disk is mounted across all the systems with a common path (you can do that
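A minimal sketch of this suggestion in spark-defaults.conf (the path is an assumption; use whichever large volume is mounted at the same path on every node):

```
# spark-defaults.conf
# Comma-separated scratch directories used for shuffle files and
# spilled data; point these at the largest local disks available.
spark.local.dir  /mnt/spark-scratch
```

One caveat: on YARN clusters (including EMR) this setting is typically overridden by the node manager's configured local directories, so the change may need to be made at the YARN/cluster level rather than in Spark alone.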