Re: saveAsTextFile at treeEnsembleModels.scala:447, took 2.513396 s Killed

2016-07-28 Thread Ascot Moss
Hi, thanks for your reply. Permissions (access) are not an issue in my case; this problem only happened when the bigger input file was used to generate the model, i.e. with smaller input(s) all worked well. It seems to me that ".save" cannot save a big file. Q1: Any idea about the

RE: saveAsTextFile is not writing to local fs

2016-02-01 Thread Mohammed Guller
ler Cc: spark users Subject: Re: saveAsTextFile is not writing to local fs Hi Mohammed, Thanks for your response. Data is available on the worker nodes, but I'm looking for something that writes directly to the local fs. Seems like it is not an option. Thanks, Sivakumar Bhavanari. On Mon, Feb 1, 2016 at 5

RE: saveAsTextFile is not writing to local fs

2016-02-01 Thread Mohammed Guller
Analytics with Spark<http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/> From: Siva [mailto:sbhavan...@gmail.com] Sent: Friday, January 29, 2016 5:40 PM To: Mohammed Guller Cc: spark users Subject: Re: saveAsTextFile is not writing to local fs Hi Mohammed, Thanks fo

Re: saveAsTextFile is not writing to local fs

2016-02-01 Thread Siva
t; > > Mohammed > > Author: Big Data Analytics with Spark > <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/> > > > > *From:* Siva [mailto:sbhavan...@gmail.com] > *Sent:* Friday, January 29, 2016 5:40 PM > *To:* Mohammed Guller >

RE: saveAsTextFile is not writing to local fs

2016-01-29 Thread Mohammed Guller
Is it a multi-node cluster, or are you running Spark on a single machine? You can change Spark's logging level to INFO or DEBUG to see what is going on. Mohammed Author: Big Data Analytics with Spark From: Siva

Re: saveAsTextFile is not writing to local fs

2016-01-29 Thread Siva
Hi Mohammed, Thanks for your quick response. I'm submitting the Spark job to YARN in "yarn-client" mode on a 6-node cluster. I ran the job with DEBUG mode turned on. I see the below exception, but it occurred after the saveAsTextFile function finished. 16/01/29 20:26:57 DEBUG HttpParser:

Re: saveAsTextFile creates an empty folder in HDFS

2015-10-03 Thread Ted Yu
Quoting: `val dist = sc.parallelize(l)`. Following the above, can you call, e.g., count() on dist before saving? Cheers On Fri, Oct 2, 2015 at 1:21 AM, jarias wrote: > Dear list, > > I'm experiencing a problem when trying to write any RDD to HDFS. I've > tried > with minimal
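A minimal sketch of that suggestion, with an assumed local master and placeholder data (the actual dataset and output path from the thread are unknown): forcing an action such as count() before the save makes it obvious whether the RDD is empty before any part files are written.

```scala
// Hedged sketch: placeholder data and paths; requires Spark on the classpath.
import org.apache.spark.{SparkConf, SparkContext}

object CountBeforeSave {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("count-before-save").setMaster("local[1]"))
    val l = Seq("alpha", "beta", "gamma")          // placeholder input
    val dist = sc.parallelize(l)
    // An action before the save: a count of 0 here would explain an empty output folder.
    println(s"count before save: ${dist.count()}")
    dist.saveAsTextFile("hdfs:///tmp/dist-out")    // assumed output path
    sc.stop()
  }
}
```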

Re: saveAsTextFile creates an empty folder in HDFS

2015-10-03 Thread Jacinto Arias
Yes printing the result with collect or take is working, actually this is a minimal example, but also when working with real data the actions are performed, and the resulting RDDs can be printed out without problem. The data is there and the operations are correct, they just cannot be written

Re: saveAsTextFile creates an empty folder in HDFS

2015-10-03 Thread Ajay Chander
Hi Jacinto, If I were you, the first thing I would do is write a sample Java application that writes data into HDFS and see if it works fine. Metadata is being created in HDFS, which means communication to the namenode is working fine, but not to the datanodes, since you don't see any data inside the
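That standalone check can be sketched with the Hadoop client API instead of a full Java application (a Scala sketch under assumptions: the namenode address and the test path are placeholders). Writing a file directly, without Spark, separates namenode connectivity from datanode connectivity.

```scala
// Hedged sketch: bypass Spark and write straight to HDFS with the Hadoop
// FileSystem API. If create() succeeds but the write hangs or fails, the
// problem is reaching the datanodes, not the namenode.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object HdfsWriteCheck {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    conf.set("fs.defaultFS", "hdfs://namenode:8020") // assumed namenode address
    val fs = FileSystem.get(conf)
    val out = fs.create(new Path("/tmp/hdfs-write-check.txt")) // assumed test path
    out.writeBytes("hello hdfs\n")
    out.close()
    fs.close()
  }
}
```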

Re: saveAsTextFile() part- files are missing

2015-05-21 Thread Tomasz Fruboes
Hi, it looks like you are writing to a local filesystem. Could you try writing to a location visible by all nodes (master and workers), e.g. an NFS share? HTH, Tomasz On 21.05.2015 at 17:16, rroxanaioana wrote: Hello! I just started with Spark. I have an application which counts words in a

Re: SaveAsTextFile brings down data nodes with IO Exceptions

2015-05-16 Thread Ilya Ganelin
All - this issue showed up when I was tearing down a Spark context and creating a new one. Often, I was then unable to write to HDFS due to this error. I subsequently switched to a different implementation where, instead of tearing down and re-initializing the Spark context, I'd instead submit a

Re: SaveAsTextFile brings down data nodes with IO Exceptions

2015-05-15 Thread Puneet Kapoor
Hey, Did you find any solution for this issue, we are seeing similar logs in our Data node logs. Appreciate any help. 2015-05-15 10:51:43,615 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: NttUpgradeDN1:50010:DataXceiver error processing WRITE_BLOCK operation src:

Re: SaveAsTextFile brings down data nodes with IO Exceptions

2015-05-15 Thread Puneet Kapoor
I am seeing this on Hadoop version 2.4.0. Thanks for your suggestions, I will try those and let you know if they help! On Sat, May 16, 2015 at 1:57 AM, Steve Loughran ste...@hortonworks.com wrote: What version of Hadoop are you seeing this on? On 15 May 2015, at 20:03, Puneet Kapoor

Re: saveAsTextFile() to save output of Spark program to HDFS

2015-05-05 Thread Sudarshan Murty
Another thing - could it be a permission problem? It creates the whole directory structure /tmp/wordcount/_temporary/0/_temporary/attempt_201505051439_0001_m_01_3/part-1, so I am guessing not. On Tue, May 5, 2015 at 7:27 PM, Sudarshan Murty njmu...@gmail.com wrote: You are

Re: saveAsTextFile() to save output of Spark program to HDFS

2015-05-05 Thread Sudarshan Murty
You are most probably right. I assumed others may have run into this. When I try to put the files in there, it creates a directory structure with the part-0 and part1 files but these files are of size 0 - no content. The client error and the server logs have the error message shown -

Re: saveAsTextFile() to save output of Spark program to HDFS

2015-05-05 Thread ayan guha
What happens when you try to put files into your HDFS from the local filesystem? Looks like it's an HDFS issue rather than a Spark thing. On 6 May 2015 05:04, Sudarshan njmu...@gmail.com wrote: I have searched all replies to this question and not found an answer. I am running standalone Spark 1.3.1 and

Re: saveAsTextFile

2015-04-16 Thread Vadim Bichutskiy
Thanks Sean. I want to load each batch into Redshift. What's the best/most efficient way to do that? Vadim On Apr 16, 2015, at 1:35 PM, Sean Owen so...@cloudera.com wrote: You can't, since that's how it's designed to work. Batches are saved in different files, which are really directories

RE: saveAsTextFile

2015-04-16 Thread Evo Eftimov
The reason for this is as follows:
1. You are saving data on HDFS.
2. HDFS, as a cluster/server-side service, has a Single Writer / Multiple Reader multithreading model.
3. Hence each thread of execution in Spark has to write to a separate file in HDFS.
4. Moreover the

RE: saveAsTextFile

2015-04-16 Thread Evo Eftimov
Nope Sir, it is possible - check my reply earlier -Original Message- From: Sean Owen [mailto:so...@cloudera.com] Sent: Thursday, April 16, 2015 6:35 PM To: Vadim Bichutskiy Cc: user@spark.apache.org Subject: Re: saveAsTextFile You can't, since that's how it's designed to work. Batches

RE: saveAsTextFile

2015-04-16 Thread Evo Eftimov
@spark.apache.org Subject: Re: saveAsTextFile Thanks Sean. I want to load each batch into Redshift. What's the best/most efficient way to do that? Vadim On Apr 16, 2015, at 1:35 PM, Sean Owen so...@cloudera.com wrote: You can't, since that's how it's designed to work. Batches are saved in different

Re: saveAsTextFile

2015-04-16 Thread Sean Owen
Just copy the files? It shouldn't matter that much where they are, as you can find them easily. Or consider somehow sending the batches of data straight into Redshift? No idea how that is done, but I imagine it's doable. On Thu, Apr 16, 2015 at 6:38 PM, Vadim Bichutskiy vadim.bichuts...@gmail.com

RE: saveAsTextFile

2015-04-16 Thread Evo Eftimov
files and directories From: Vadim Bichutskiy [mailto:vadim.bichuts...@gmail.com] Sent: Thursday, April 16, 2015 6:45 PM To: Evo Eftimov Cc: user@spark.apache.org Subject: Re: saveAsTextFile Thanks Evo for your detailed explanation. On Apr 16, 2015, at 1:38 PM, Evo Eftimov evo.efti

Re: saveAsTextFile

2015-04-16 Thread Vadim Bichutskiy
Copy should be doable but I'm not sure how to specify a prefix for the directory while keeping the filename (ie part-0) fixed in copy command. On Apr 16, 2015, at 1:51 PM, Sean Owen so...@cloudera.com wrote: Just copy the files? it shouldn't matter that much where they are as you can

Re: saveAsTextFile

2015-04-16 Thread Sean Owen
You can't, since that's how it's designed to work. Batches are saved in different files, which are really directories containing partitions, as is common in Hadoop. You can move them later, or just read them where they are. On Thu, Apr 16, 2015 at 6:32 PM, Vadim Bichutskiy
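The "read them where they are" option can be sketched as follows (assumptions: the streaming job wrote with a prefix like hdfs:///tmp/batches/out, so each batch lands in its own out-<timestamp> directory of part files; the paths are placeholders). Since textFile accepts globs, one path can cover every batch directory at once.

```scala
// Hedged sketch: read all batch output directories back in one RDD via a glob.
import org.apache.spark.{SparkConf, SparkContext}

object ReadBatchDirs {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("read-batches").setMaster("local[1]"))
    // Each streaming batch is a directory out-<timestamp>/ containing part-* files.
    val all = sc.textFile("hdfs:///tmp/batches/out-*/part-*") // assumed prefix
    println(s"total records across batches: ${all.count()}")
    sc.stop()
  }
}
```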

Re: saveAsTextFile

2015-04-16 Thread Vadim Bichutskiy
Thanks Evo for your detailed explanation. On Apr 16, 2015, at 1:38 PM, Evo Eftimov evo.efti...@isecc.com wrote: The reason for this is as follows: 1. You are saving data on HDFS 2. HDFS as a cluster/server side Service has a Single Writer / Multiple Reader multithreading

Re: saveAsTextFile extremely slow near finish

2015-03-11 Thread Imran Rashid
Is your data skewed? Could it be that there are a few keys with a huge number of records? You might consider outputting (recordA, count) (recordB, count) instead of recordA recordA recordA ... You could do this with: val input = sc.textFile(...); val pairsCounts = input.map { x => (x, 1) }.reduceByKey(_ + _)
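That suggestion spelled out as a sketch (assumptions: input and output paths are placeholders, and the input is a text file where duplicated records cause the skew): emitting (record, count) pairs collapses each hot key into a single output line.

```scala
// Hedged sketch: replace repeated records with (record, count) pairs before saving.
import org.apache.spark.{SparkConf, SparkContext}

object CountsInsteadOfRecords {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("record-counts").setMaster("local[1]"))
    val input = sc.textFile("hdfs:///tmp/records")               // assumed input path
    val pairsCounts = input.map(x => (x, 1)).reduceByKey(_ + _)  // (recordA, count) etc.
    pairsCounts.map { case (rec, n) => s"$rec\t$n" }             // one line per distinct record
      .saveAsTextFile("hdfs:///tmp/record-counts")               // assumed output path
    sc.stop()
  }
}
```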

Re: saveAsTextFile extremely slow near finish

2015-03-10 Thread Akhil Das
Don't you think 1000 is too low for 160GB of data? Also, you could try using KryoSerializer and enabling RDD compression. Thanks Best Regards On Mon, Mar 9, 2015 at 11:01 PM, mingweili0x m...@spokeo.com wrote: I'm basically running a sorting using spark. The spark program will read from HDFS,

Re: saveAsTextFile extremely slow near finish

2015-03-10 Thread Sean Owen
This is more of an aside, but why repartition this data instead of letting it define partitions naturally? You will end up with a similar number. On Mar 9, 2015 5:32 PM, mingweili0x m...@spokeo.com wrote: I'm basically running a sorting using spark. The spark program will read from HDFS, sort

Re: saveAsTextFile of RDD[Array[Any]]

2015-02-09 Thread Jong Wook Kim
If you have `RDD[Array[Any]]` you can do rdd.map(_.mkString("\t")), or with some other delimiter, to make it `RDD[String]`, and then call `saveAsTextFile`. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/saveAsTextFile-of-RDD-Array-Any-tp21548p21554.html
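The per-row conversion in that reply is plain Scala, so it can be tried without a cluster: mkString("\t") turns an Array[Any] into one tab-separated line, which is exactly what each element of the mapped RDD[String] would look like (the sample row below is made up for illustration).

```scala
// Runnable without Spark: the row-to-line conversion used in the map above.
object MkStringDemo {
  def toLine(row: Array[Any]): String = row.mkString("\t")

  def main(args: Array[String]): Unit = {
    val row: Array[Any] = Array(1, "abc", 2.5)
    // Each element is stringified via toString and joined with tabs.
    println(toLine(row)) // 1	abc	2.5
  }
}
```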

Re: SaveAsTextFile to S3 bucket

2015-01-26 Thread Nick Pentreath
Your output folder specifies rdd.saveAsTextFile(s3n://nexgen-software/dev/output); So it will try to write to /dev/output which is as expected. If you create the directory /dev/output upfront in your bucket, and try to save it to that (empty) directory, what is the behaviour? On Tue, Jan 27,

Re: SaveAsTextFile to S3 bucket

2015-01-26 Thread Chen, Kevin
Subject: Re: SaveAsTextFile to S3 bucket Your output folder specifies rdd.saveAsTextFile(s3n://nexgen-software/dev/output); So it will try to write to /dev/output which is as expected. If you create the directory /dev/output upfront in your bucket, and try to save it to that (empty) directory, what

Re: SaveAsTextFile to S3 bucket

2015-01-26 Thread Ashish Rangole
By default, the files will be created under the path provided as the argument to saveAsTextFile. This argument is treated as a folder in the bucket, and the actual files are created in it with the naming convention part-n, where n is the index of the output partition. On Mon, Jan 26, 2015 at

Re: saveAsTextFile

2015-01-15 Thread ankits
I have seen this happen when the RDD contains null values. Essentially, saveAsTextFile calls toString() on the elements of the RDD, so a call to null.toString will result in an NPE. -- View this message in context:
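That failure mode can be demonstrated, and guarded against, without a cluster (a sketch; the placeholder "NULL" string and sample data are made up): since saveAsTextFile stringifies each element, a null element means null.toString and an NPE, so a defensive map with Option replaces nulls before saving.

```scala
// Runnable without Spark: null-safe stringification, the local analogue of
// rdd.map(safeLine).saveAsTextFile(...) which would avoid the NPE described above.
object NullSafeLines {
  def safeLine(x: Any): String =
    Option(x).map(_.toString).getOrElse("NULL") // "NULL" placeholder is an assumption

  def main(args: Array[String]): Unit = {
    val data: Seq[Any] = Seq("a", null, 42)
    println(data.map(safeLine).mkString(",")) // a,NULL,42
  }
}
```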

Re: saveAsTextFile

2015-01-15 Thread Prannoy
Hi, Before saving the RDD, do a collect on it and print its content. Probably it's a null value. Thanks. On Sat, Jan 3, 2015 at 5:37 PM, Pankaj Narang [via Apache Spark User List] ml-node+s1001560n20953...@n3.nabble.com wrote: If you can paste the code here I can certainly

Re: saveAsTextFile just uses toString and Row@37f108

2015-01-13 Thread Reynold Xin
It is just calling RDD's saveAsTextFile. I guess we should really override the saveAsTextFile in SchemaRDD (or make Row.toString comma separated). Do you mind filing a JIRA ticket and copy me? On Tue, Jan 13, 2015 at 12:03 AM, Kevin Burton bur...@spinn3r.com wrote: This is almost funny. I

Re: saveAsTextFile

2015-01-03 Thread Pankaj Narang
If you can paste the code here I can certainly help. Also confirm the version of spark you are using Regards Pankaj Infoshore Software India -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/saveAsTextFile-tp20951p20953.html Sent from the Apache Spark

Re: saveAsTextFile

2015-01-03 Thread Sanjay Subramanian
Subject: Re: saveAsTextFile If you can paste the code here I can certainly help. Also confirm the version of spark you are using Regards Pankaj Infoshore Software India -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/saveAsTextFile-tp20951p20953.html

Re: saveAsTextFile error

2014-11-15 Thread Prannoy
Hi Niko, Have you tried running it while keeping the wordCounts.print()? Possibly the import of the package *org.apache.spark.streaming._* is missing, so during sbt package it is unable to locate the saveAsTextFile API. Go to

Re: saveAsTextFile error

2014-11-14 Thread Harold Nguyen
Hi Niko, It looks like you are calling a method on DStream, which does not exist. Check out: https://spark.apache.org/docs/1.1.0/streaming-programming-guide.html#output-operations-on-dstreams for the method saveAsTextFiles Harold On Fri, Nov 14, 2014 at 10:39 AM, Niko Gamulin

Re: saveAsTextFile makes no progress without caching RDD

2014-09-02 Thread jerryye
As an update. I'm still getting the same issue. I ended up doing a coalesce instead of a cache to get around the memory issue but saveAsTextFile still won't proceed without the coalesce or cache first. -- View this message in context:
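The workaround described there can be sketched as follows (assumptions: the input/output paths and the target partition count of 100 are placeholders taken from the "96/100 tasks" context, not from the original code): coalescing before the save reduces the number of output tasks and part files.

```scala
// Hedged sketch: coalesce to fewer partitions before saveAsTextFile, which in
// the reporter's case also let the save make progress without caching.
import org.apache.spark.{SparkConf, SparkContext}

object CoalesceBeforeSave {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("coalesce-save").setMaster("local[2]"))
    val rdd = sc.textFile("hdfs:///tmp/big-input")   // assumed input path
    rdd.coalesce(100)                                // assumed target partition count
       .saveAsTextFile("hdfs:///tmp/coalesced-out")  // one part file per partition
    sc.stop()
  }
}
```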

Re: saveAsTextFile hangs with hdfs

2014-08-26 Thread Burak Yavuz
Hi David, Your job is probably hanging on the groupByKey process. Probably GC is kicking in and the process starts to hang or the data is unbalanced and you end up with stragglers (Once GC kicks in you'll start to get the connection errors you shared). If you don't care about the list of

Re: saveAsTextFile hangs with hdfs

2014-08-19 Thread evadnoob
Update: it hangs even when not writing to HDFS. I changed the code to avoid saveAsTextFile() and instead do a foreachPartition and log the results. This time it hangs at 96/100 tasks, but it still hangs. I changed the saveAsTextFile to: stringIntegerJavaPairRDD.foreachPartition(p -> {
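The same diagnostic, sketched in Scala rather than the original Java (assumptions: placeholder paths, and a pair RDD built from a text file for illustration): replacing the save with foreachPartition plus a log line isolates whether the hang is in the computation itself or in the HDFS write.

```scala
// Hedged sketch: consume each partition on the executors without writing,
// to separate "computation hangs" from "HDFS write hangs".
import org.apache.spark.{SparkConf, SparkContext}

object LogInsteadOfSave {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("log-partitions").setMaster("local[2]"))
    val pairs = sc.textFile("hdfs:///tmp/input")  // assumed input path
      .map(line => (line, 1))
    pairs.foreachPartition { p =>
      // Runs on the executor; size forces the whole iterator without any I/O.
      println(s"partition with ${p.size} records processed")
    }
    sc.stop()
  }
}
```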

Re: saveAsTextFile hangs with hdfs

2014-08-19 Thread evadnoob
Not sure if this is helpful or not, but in one executor stderr log, I found this: 14/08/19 20:17:04 INFO CacheManager: Partition rdd_5_14 not found, computing it 14/08/19 20:17:04 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329 14/08/19

Re: saveAsTextFile

2014-08-10 Thread durin
This should work: jobs.saveAsTextFile("file:////home/hysom/testing") Note the 4 slashes, it's really 3 slashes + the absolute path. This should be mentioned in the docs though, I only remember that from having seen it somewhere else. The output folder, here testing, will be created and must therefore