Re: partitioning json data in spark

2015-12-28 Thread Նարեկ Գալստեան
On Sun, Dec 27, 2015 at 9:28 AM, Նարեկ Գալստեան <ngalsty...@gmail.com> wrote: http://spark.apache.org/docs/1.4.1/api/scala/index.html#org.apache.spark.sql.DataFrameWriter I did try but it was all in vain. It is also exp

Re: partitioning json data in spark

2015-12-27 Thread Նարեկ Գալստեան
ber...@gmail.com> wrote: have you tried to specify the format of your output? Parquet might be the default format. df.write().format("json").mode(SaveMode.Overwrite).save("/tmp/path"); On 27 December 2015 at 15:18, Նարեկ Գալստեան <ngalsty...@gmail.com> wr

partitioning json data in spark

2015-12-27 Thread Նարեկ Գալստեան
Hey all! I want to partition *json* data by a column name and store the result as a collection of json files to be loaded into another database. I could use Spark's built-in *partitionBy* function, but it only outputs in Parquet format, which is not desirable for me. Could you suggest a way
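A minimal sketch of one way to get partitioned json output: in Spark releases newer than the 1.4.1 docs linked in this thread (1.5+), DataFrameWriter.partitionBy can be combined with format("json"), so each distinct column value becomes a subdirectory of plain json part files. The input path, output path, and column name ("country") below are placeholders, not from the original thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}

object PartitionJsonSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("partition-json").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // Hypothetical input location.
    val df = sqlContext.read.json("/tmp/input-json")

    // partitionBy("country") writes directories like
    // /tmp/output-json/country=DE/part-..., each holding json lines,
    // because format("json") overrides the default Parquet output.
    df.write
      .format("json")
      .mode(SaveMode.Overwrite)
      .partitionBy("country") // assumed column name
      .save("/tmp/output-json")

    sc.stop()
  }
}
```

The resulting layout (one `column=value` directory per key) is also what a downstream loader can iterate over, one partition at a time.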

Re: Debug Spark

2015-11-29 Thread Նարեկ Գալստեան
A question regarding the topic: I am using IntelliJ to write Spark applications and then have to ship the source code to my cluster in the cloud to compile and test. Is there a way to automate the process using IntelliJ? Narek Galstyan Նարեկ Գալստյան On 29 November 2015 at 20:51, Ndjido Ardo
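One common way to automate this, sketched below under assumed host names and paths: build a fat jar locally, copy it to the cluster, and submit it over ssh. The script can be registered in IntelliJ as an External Tool (Settings > Tools > External Tools) so a single click runs the whole cycle. Everything concrete here (host, jar name, main class) is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical build-and-run helper for a Spark 1.x project built with sbt.
set -euo pipefail

# 1. Build a self-contained ("assembly") jar locally.
sbt assembly

# 2. Ship it to the cluster's master node (placeholder host/paths).
scp target/scala-2.10/myapp-assembly.jar \
    user@cluster-master:/home/user/jars/

# 3. Submit it remotely with spark-submit (placeholder class and master).
ssh user@cluster-master \
    "spark-submit --class com.example.Main --master yarn-client \
     /home/user/jars/myapp-assembly.jar"
```

This avoids shipping source code entirely: only the compiled jar travels, and the cluster never needs the project checked out.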

Re: get directory names that are affected by sc.textFile("path/to/dir/*/*/*.js")

2015-10-27 Thread Նարեկ Գալստեան
Move files matching your pattern to a staging location and then load them using sc.textFile. You should find HDFS file system calls that are equivalent to normal file system calls if command-line tools like distcp or mv don't meet your needs. On 27 Oct 2015 1:49 p.m., "Նարեկ Գալստեան

get directory names that are affected by sc.textFile("path/to/dir/*/*/*.js")

2015-10-27 Thread Նարեկ Գալստեան
Dear Spark users, I am reading a set of json files to compile them to the Parquet data format. I want to mark the folders in some way after having read their contents so that I do not read them again (e.g. I can change the name of the folder). I use the .textFile("path/to/dir/*/*/*.js") *technique*
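A sketch of the approach suggested in the reply above, using the Hadoop FileSystem API directly: expand the same glob with globStatus to learn which directories will be touched, process each one, then rename it so the next run's glob no longer matches it. The paths, the "_done" suffix, and the processing step are assumptions for illustration.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object MarkProcessedDirs {
  def main(args: Array[String]): Unit = {
    val fs = FileSystem.get(new Configuration())

    // Expand the directory part of the glob; these are the parent
    // directories whose *.js files sc.textFile would read.
    val dirs = fs.globStatus(new Path("path/to/dir/*/*"))
      .filter(_.isDirectory)
      .map(_.getPath)

    for (dir <- dirs) {
      // ... read dir + "/*.js" with sc.textFile here and write Parquet ...

      // Rename the directory afterwards so it no longer matches the glob.
      fs.rename(dir, new Path(dir.getParent, dir.getName + "_done"))
    }
  }
}
```

Renaming only after a successful write gives at-least-once semantics: a crash mid-directory leaves it unrenamed and it is simply re-read on the next run.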

Interactively search Parquet-stored data using Spark Streaming and DataFrames

2015-09-28 Thread Նարեկ Գալստեան
I have a significant amount of data stored on my Hadoop HDFS as Parquet files. I am using Spark Streaming to interactively receive queries from a web server and transform the received queries into SQL to run on my data using SparkSQL. In this process I need to run several SQL queries and then return
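The setup described above can be sketched with the Spark 1.x streaming and SQL APIs: register the Parquet data as a temp table once, then treat each streamed line as a SQL string and execute it per batch. The socket source, port, HDFS path, and table name are all assumptions; the thread does not say how queries actually arrive from the web server.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

object InteractiveParquetQueries {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("interactive-sql").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val ssc = new StreamingContext(sc, Seconds(2))

    // Expose the Parquet data to SQL once, up front (placeholder path/name).
    sqlContext.read.parquet("hdfs:///data/events").registerTempTable("events")

    // Each received line is assumed to be a complete SQL query string,
    // e.g. "SELECT count(*) FROM events".
    val queries = ssc.socketTextStream("localhost", 9999)
    queries.foreachRDD { rdd =>
      rdd.collect().foreach { q =>
        sqlContext.sql(q).show() // run on the driver; results go back to the caller
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note the queries are collected to the driver before execution: sqlContext cannot be used inside an executor-side closure, so per-batch driver-side execution is the usual pattern here.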