Re: Write only one output file in Spark SQL

KhajaAsmath Mohammed Fri, 11 Aug 2017 10:10:55 -0700

# assumes hc is a HiveContext and union_df is the DataFrame to be written
union_df.createOrReplaceTempView("tempRaw")

hc.sql("""CREATE TABLE IF NOT EXISTS blab.pyspark_dpprq (
    vin string, utctime timestamp, description string, descriptionuom string,
    providerdesc string, dt_map string, islocation string,
    latitude double, longitude double, speed double, value string)""")

hc.sql("INSERT OVERWRITE TABLE blab.pyspark_dpprq SELECT * FROM tempRaw")

On Fri, Aug 11, 2017 at 11:00 AM, Daniel van der Ende <daniel.vandere...@gmail.com> wrote:

> Hi Asmath,
>
> Could you share the code you're running?
>
> Daniel
>
> On Fri, 11 Aug 2017, 17:53 KhajaAsmath Mohammed, <mdkhajaasm...@gmail.com> wrote:
>
>> Hi,
>>
>> I am using Spark SQL to write data back to HDFS, and it produces
>> multiple output files.
>>
>> I tried setting spark.sql.shuffle.partitions=1, but that made
>> performance very slow.
>>
>> I also tried coalesce and repartition, but I still get the same issue.
>> Any suggestions?
>>
>> Thanks,
>>
>> Asmath
>>
>