Re: covert local tsv file to orc file on distributed cloud storage(openstack).

2016-11-25 Thread Steve Loughran
have the first path to be something like .csv("file:///home/user/dataset/data.csv")
If you are working with files that big:
- don't use the inferSchema option, as that will trigger two scans through the data
- try with a smaller file first, say 1MB or so
Trying to use spark *or any other tool* to
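The two suggestions above (skip inferSchema, try a small file first) could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: `write_sample` and `read_tsv` are hypothetical helper names, the 1 MB cut-off is the figure from the message, and the caller is assumed to supply an existing SparkSession and a schema. The `header=True` setting is also an assumption about the file.

```python
from pathlib import Path


def write_sample(src, dst, max_bytes=1_000_000):
    """Copy roughly the first max_bytes of src to dst, stopping on a whole line.

    Hypothetical helper: makes a small trial file out of a very large one.
    """
    written = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for line in fin:
            # Always keep the first line; stop before exceeding the budget.
            if written and written + len(line) > max_bytes:
                break
            fout.write(line)
            written += len(line)
    return written


def read_tsv(spark, path, schema):
    """Read a tab-separated file with an explicit schema.

    Passing schema= (and not inferSchema) avoids the extra scan of the data.
    `path` must be absolute; Path.as_uri() turns it into a file:/// URI.
    header=True is an assumption about the file layout.
    """
    return spark.read.csv(Path(path).as_uri(), sep="\t", header=True, schema=schema)
```

A trial run would then be something like `write_sample("/home/user/dataset/data.csv", "/tmp/sample.tsv")` followed by `read_tsv(spark, "/tmp/sample.tsv", schema)`, where `/tmp/sample.tsv` is just a placeholder path.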

Re: covert local tsv file to orc file on distributed cloud storage(openstack).

2016-11-24 Thread vr spark
Hi,
The source file I have is on my local machine and it's pretty huge, about 150 GB. How do I go about it?
On Sun, Nov 20, 2016 at 8:52 AM, Steve Loughran wrote:
>
> On 19 Nov 2016, at 17:21, vr spark wrote:
>
> Hi,
> I am looking for scala or python

Re: covert local tsv file to orc file on distributed cloud storage(openstack).

2016-11-20 Thread Steve Loughran
On 19 Nov 2016, at 17:21, vr spark wrote:
Hi, I am looking for Scala or Python code samples to convert a local TSV file to an ORC file and store it on distributed cloud storage (OpenStack). So, I need these 3 samples. Please suggest.
1. read tsv
2.

covert local tsv file to orc file on distributed cloud storage(openstack).

2016-11-19 Thread vr spark
Hi,
I am looking for Scala or Python code samples to convert a local TSV file to an ORC file and store it on distributed cloud storage (OpenStack). So, I need these 3 samples. Please suggest.
1. read tsv
2. convert to orc
3. store on distributed cloud storage
thanks
VR
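The three steps asked for above might look roughly like this in PySpark. It is a minimal sketch, not a tested recipe: it assumes Spark runs on the machine that holds the file, that the build includes ORC support, and that the hadoop-openstack (Swift) connector is on the classpath with its `fs.swift.service.*` settings configured. The helper names, the container, the service name and the paths are all hypothetical.

```python
from pathlib import Path


def swift_url(container, service, key):
    """Build a swift://container.service/key URL.

    This is the URL form used by the hadoop-openstack connector; the
    container and service names passed in are placeholders.
    """
    return "swift://{}.{}/{}".format(container, service, key.lstrip("/"))


def tsv_to_orc(spark, local_tsv, schema, dest_url):
    """Read a local TSV file and write it out as ORC at dest_url."""
    # 1. read tsv (explicit schema, absolute local path as a file:/// URI)
    df = spark.read.csv(Path(local_tsv).as_uri(), sep="\t", schema=schema)
    # 2. convert to orc and 3. store on the object store
    df.write.orc(dest_url)


def main():
    # Imported here so the helpers above can be used without pyspark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("tsv-to-orc").getOrCreate()
    # Hypothetical two-column schema, given as a DDL string (Spark 2.3+);
    # on older versions a StructType would be passed instead.
    schema = "id INT, name STRING"
    dest = swift_url("mycontainer", "myprovider", "dataset/orc")
    tsv_to_orc(spark, "/home/user/dataset/data.tsv", schema, dest)
    spark.stop()
```

Calling `main()` runs the whole conversion. The write fails if the destination already exists, which is the default save mode and avoids overwriting data by accident.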